@justcodebruh justcodebruh commented Dec 19, 2025

Summary

Fixes a bug where metadata parameters from dataset records were not being passed through to prompt templates when running experiments via the Python SDK, even though they worked correctly in the UI.

Problem

When using init_function() to create a task for Eval(), dataset records containing metadata fields were not being propagated to the prompt templates. This caused mustache template variables like {{metadata.fieldName}} to render as empty strings instead of their actual values.

Example of the issue:

# Dataset record with metadata
{
  "input": {"name": "John", "topic": "AI"},
  "metadata": {"test0": 99, "test1": 42},
  "expected": "Expected output"
}

# Prompt template
"Tell me about {{metadata.test0}}. And end with a salutation to {{metadata.test1}}"

# Result BEFORE fix: "Tell me about . And end with a salutation to "
# Result AFTER fix: "Tell me about 99. And end with a salutation to 42"
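
The empty-string behavior can be reproduced without the SDK. The following is a minimal sketch that assumes a toy mustache-style renderer (the `render` helper is hypothetical and is not the SDK's template engine). It only illustrates that a missing `metadata` key in the render context yields empty substitutions.

```python
import re

def render(template: str, context: dict) -> str:
    """Toy renderer (hypothetical): replaces {{dotted.path}} tags.

    Paths that cannot be resolved render as empty strings,
    mirroring how mustache treats missing variables.
    """
    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                return ""
            value = value[key]
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)

template = "Tell me about {{metadata.test0}}. And end with a salutation to {{metadata.test1}}"
record = {
    "input": {"name": "John", "topic": "AI"},
    "metadata": {"test0": 99, "test1": 42},
    "expected": "Expected output",
}

# Context without metadata, as in the behavior described before the fix.
before = render(template, {"input": record["input"]})
# Context with metadata forwarded, as in the behavior described after the fix.
after = render(template, {"input": record["input"], "metadata": record["metadata"]})

print(repr(before))
print(repr(after))
```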

Root Cause

The init_function() wrapper was not extracting metadata from the hooks parameter that the evaluation framework provides when calling task functions. The framework passes a DictEvalHooks object containing the metadata, but init_function() was only forwarding the input data to invoke().

Solution

Modified init_function() to:

  1. Accept exactly 2 parameters (input, hooks) to match framework requirements for passing hooks
  2. Extract metadata from hooks.metadata when available
  3. Pass the metadata to the invoke() call for proper template substitution
  4. Maintain backward compatibility for both task and scorer modes
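
The task-mode path of these steps can be sketched as follows. This is an illustration under stated assumptions, not the actual diff: `invoke` here is a hypothetical stand-in that echoes its arguments, `init_function_sketch` is an invented name, and scorer-mode handling is omitted.

```python
from typing import Any, Callable, Optional

def invoke(*, project_name: str, slug: str, version: Optional[str] = None,
           input: Any = None, metadata: Optional[dict] = None) -> dict:
    """Hypothetical stand-in for the SDK's invoke(); it only echoes its arguments."""
    return {
        "project_name": project_name,
        "slug": slug,
        "version": version,
        "input": input,
        "metadata": metadata,
    }

def init_function_sketch(project_name: str, slug: str,
                         version: Optional[str] = None) -> Callable[..., Any]:
    """Hypothetical wrapper showing only the task-mode path."""
    def f(input: Any, hooks: Any = None) -> Any:
        # hooks may be absent or may lack a metadata attribute; fall back to None.
        metadata = getattr(hooks, "metadata", None)
        return invoke(project_name=project_name, slug=slug, version=version,
                      input=input, metadata=metadata)
    return f

class FakeHooks:
    """Minimal hooks object exposing only a metadata attribute."""
    def __init__(self) -> None:
        self.metadata = {"test0": 99, "test1": 42}

task = init_function_sketch("project", "prompt-with-metadata")
with_hooks = task({"name": "User", "topic": "Topic"}, FakeHooks())
without_hooks = task({"name": "User", "topic": "Topic"})
```

Using `getattr` with a default keeps calls without hooks working, which is the backward-compatibility case listed in step 4.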

Changes

  • Modified function signature from f(*args, **kwargs) to f(input, hooks=None)
  • Added logic to detect and handle the hooks object vs other parameter types
  • Extracted metadata and passed it to invoke() for both task and scorer modes

Test Plan

  1. Created test script that reproduces the issue:
# Test with a prompt that uses metadata variables
prompt_ref = init_function("project", "prompt-with-metadata")
dataset_ref = init_dataset(project="project", name="dataset-with-metadata")

# Run eval
await braintrust.EvalAsync(
    name="test",
    task=prompt_ref,
    data=dataset_ref,
    scores=[scorer_ref]
)
  2. Verified the fix with direct invocation:
class FakeHooks:
    def __init__(self):
        self.metadata = {"test0": 99, "test1": 42}

hooks = FakeHooks()
result = prompt_ref({"name": "User", "topic": "Topic"}, hooks)
# Result now correctly shows "99" and "42" in output
  3. Confirmed backward compatibility:
  • Tasks without hooks still work
  • Scorer functions still receive metadata via input dict
  • Existing code without metadata continues to function

Impact

This fix ensures that SDK users can properly use metadata parameters in their prompt templates, achieving feature parity with the UI. This is especially important for users who rely on metadata for dynamic prompt generation or context injection.

🤖 Generated with Claude Code

When running experiments via SDK, metadata from dataset records was not
being passed through to prompt templates. This fix modifies init_function()
to accept and extract metadata from the hooks parameter when called as a
task, and properly pass it to the invoke() call.

The function now:
- Accepts exactly 2 parameters (input, hooks) to match framework requirements
- Extracts metadata from hooks.metadata when present
- Passes metadata to invoke() for proper template substitution
- Maintains backward compatibility for both task and scorer modes

This ensures mustache template variables like {{metadata.fieldName}} are
correctly rendered with dataset metadata values during SDK experiments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
if len(args) > 0:
    # Task.
    return invoke(project_name=project_name, slug=slug, version=version, input=args[0])
def f(input, hooks=None):
Collaborator

I'm not sure if input / hooks are the only possible args, but if other things were passed in, e.g. f(foo=123), this will be backwards incompatible.
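
The concern can be illustrated with plain Python. In this minimal sketch, `old_f` and `new_f` are hypothetical functions that copy only the two signatures under discussion; their bodies are placeholders and do not reflect the SDK code.

```python
def old_f(*args, **kwargs):
    """Shape of the previous signature: accepts any positional or keyword arguments."""
    return ("old", args, kwargs)

def new_f(input, hooks=None):
    """Shape of the new signature: only input and hooks are accepted."""
    return ("new", input, hooks)

# A keyword-only call is accepted by the old signature.
old_result = old_f(foo=123)

# The same call is rejected by the new signature.
try:
    new_f(foo=123)
    new_error = None
except TypeError as exc:
    new_error = exc

print(old_result)
print(type(new_error).__name__)
```

Whether any existing caller actually passes arbitrary keyword arguments is not established in this thread.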
