test(go/ai/prompt): add tests for template variable substitution#3925
Conversation
3b4f001 to
bf10054
Compare
|
Hi @Zereker We are making improvements in the core which are causing merge conflicts with your contribution. Would it be possible if you address the conflicts? |
bf10054 to
64ef676
Compare
|
Hi @hugoaguirre, I've rebased on the latest main and resolved the conflicts. Ready for review! |
|
Hi @Zereker, I've tried to reproduce the issue with the latest changes in
Genkit code: func PromptFromZereker(ctx context.Context, g *genkit.Genkit) {
prompt := genkit.LoadPrompt(g, "./prompts/greeting.prompt", "greetings")
if prompt == nil {
log.Fatal("empty prompt")
}
resp, err := prompt.Execute(ctx,
ai.WithInput(map[string]any{
"name": "Alice",
"place": "Wonderland",
}))
if err != nil {
log.Fatalf("error executing prompt: %v", err)
}
fmt.Printf("request: %#v\n", resp.Request.Messages[0].Text())
log.Print(resp.Text())
}Output: Could you point me in the right direction to reproduce the issue you are reporting? You can copy paste this sample in |
|
Hi @hugoaguirre, Thanks for testing! I found that the
However, there's an issue with how this syntax is handled in How to ReproduceAdd this test to func TestHandlebarsRoleMarkers(t *testing.T) {
tempDir := t.TempDir()
mockPromptFile := filepath.Join(tempDir, "test.prompt")
content := `---
model: test/chat
---
{{role "system"}}
You are a helpful assistant.
{{role "user"}}
Hello {{name}}, welcome to {{place}}!
`
if err := os.WriteFile(mockPromptFile, []byte(content), 0644); err != nil {
t.Fatal(err)
}
prompt := LoadPrompt(registry.New(), tempDir, "test.prompt", "test")
opts, err := prompt.Render(context.Background(), map[string]any{
"name": "Alice",
"place": "Wonderland",
})
if err != nil {
t.Fatal(err)
}
// Verify messages are correctly separated
if len(opts.Messages) != 2 {
t.Errorf("Expected 2 messages, got %d", len(opts.Messages))
}
if opts.Messages[0].Role != RoleSystem {
t.Errorf("Expected first message to be system role")
}
}Expected: Actual (on main): ComparisonThe existing test
Root CauseIn dpMessages, err := dotprompt.ToMessages(parsedPrompt.Template, &dotprompt.DataArgument{})This renders the template at load time with an empty My FixMy fix defers template rendering to execution time by using
This ensures both template variables ( |
|
Hi @Zereker, |
|
Note: This is resolved by syncing to the latest version of the main branch at cd3835a but I'm leaving it here in case it's helpful I'm bumping this rather than reporting a separate issue. This should resolve another issue which I don't see explicitly reported where rendering a prompt multiple times with different inputs does not update the rendered template after the first input. How to Reproduce// TestPromptMultipleRenders tests that the prompt correctly handles
// multiple sequential renders with different inputs, ensuring each render uses
// the correct input value.
func TestPromptMultipleRenders(t *testing.T) {
ctx := context.Background()
tempDir := t.TempDir()
mockPromptFile := filepath.Join(tempDir, "test.prompt")
content := `---
model: test/chat
input:
schema:
input: string
---
Here is the input: {{input}}
`
if err := os.WriteFile(mockPromptFile, []byte(content), 0644); err != nil {
t.Fatal(err)
}
g := genkit.Init(ctx, genkit.WithPromptDir(tempDir))
prompt := genkit.LookupPrompt(g, "test")
if prompt == nil {
t.Fatal("Prompt 'test.prompt' not found")
}
// Test multiple sequential renders with different inputs
inputs := []string{
"input-test-abc-1",
"input-test-def-2",
"input-test-ghi-3",
}
for i, input := range inputs {
inputMap := map[string]any{
"input": input,
}
actionOpts, err := prompt.Render(ctx, inputMap)
if err != nil {
t.Fatalf("Failed to render prompt with input %d (%q): %v", i+1, input, err)
}
if actionOpts == nil {
t.Fatalf("Render() returned nil action options for input %d", i+1)
}
if len(actionOpts.Messages) == 0 {
t.Fatalf("Render() returned no messages for input %d", i+1)
}
// Collect all text
var renderedText strings.Builder
for _, msg := range actionOpts.Messages {
for _, part := range msg.Content {
if part.IsText() {
renderedText.WriteString(part.Text)
renderedText.WriteString(" ")
}
}
}
text := renderedText.String()
// Verify current input appears
if !strings.Contains(text, input) {
t.Errorf("Input %d (%q) not found in render %d. Text snippet: %q", i+1, input, i+1, text[:min(200, len(text))])
}
// Verify previous inputs do NOT appear
for j, prevInput := range inputs {
if j < i && strings.Contains(text, prevInput) {
t.Errorf("BUG: Previous input %d (%q) found in render %d when it should only contain input %d (%q).",
j+1, prevInput, i+1, i+1, input)
}
}
}
}Expected: Actual: |
|
Hi @mcicoria, thanks for bumping this and providing the detailed reproduction case! The issue you reported (multiple renders always using the first input) is caused by a template sharing bug in the dotprompt After updating genkit to use the latest dotprompt version ( Note: This PR (#3925) addresses a different issue — the |
64ef676 to
24977e3
Compare
Update: Rebased and SimplifiedThis PR has been rebased on the latest Why This Test is Still ValuableThe existing tests in #4035 verify that:
However, none of them test the core scenario of issue #3924:
Our // First render
actionOpts1, _ := prompt.Render(ctx, map[string]any{"name": "Alice", "place": "Wonderland"})
// Second render with DIFFERENT values
actionOpts2, _ := prompt.Render(ctx, map[string]any{"name": "Bob", "place": "Paradise"})
// Critical assertion: second render must NOT contain first render's values
if strings.Contains(text2, "Alice") {
t.Errorf("BUG: Second render contains 'Alice' from first input!")
}This test ensures that if someone accidentally regresses the fix (e.g., pre-rendering templates at load time), the test will catch it immediately. Test Coverage
I believe this regression test adds value as a safeguard against future regressions of issue #3924. |
|
@hugoaguirre Could you please approve the workflow runs when you have a chance? This PR adds a regression test for the template variable substitution fix. Thanks! |
24977e3 to
1c28004
Compare
|
Hi @Zereker thanks for taking your time on adding this regression test! I missed this PR since I was working on some plugin changes. I'll get back to you with my review 👍🏽 |
hugoaguirre
left a comment
There was a problem hiding this comment.
LGTM, we should cover the {{ role %s}} syntax as well
1c28004 to
f2ae1e7
Compare
…tution Add tests to ensure template variables are properly substituted at execution time, not load time. This is a regression test for firebase#3924 Test coverage includes: - single role: basic template variable substitution - multi role: {{role "..."}} Handlebars syntax with template variables
f2ae1e7 to
e520663
Compare
Co-authored-by: Zereker <Zereker@users.noreply.github.com>
Summary
LoadPromptwhere variables were replaced with empty values at load timeWithMessagesFnconvertDotpromptMessageshelper functionTestLoadPromptTemplateVariableSubstitutionProblem
When using
LoadPromptto load.promptfiles, the template was rendered at load time with an emptyDataArgument. This caused all template variables (like{{name}},{{topic}}, etc.) to be replaced with empty values immediately.As a result, subsequent calls to
Execute()orRender()with actual input values had no effect - the template was already "baked" with empty values.Example
Solution
Defer template rendering to execution time by using
WithMessagesFn. The closure:<<<dotprompt:role:XXX>>>markers)<<<dotprompt:history>>>markers)Test Plan
TestLoadPromptTemplateVariableSubstitutionregression testTestMultiMessagesRenderPromptstill passes (multi-role support)aipackage tests passFixes #3924