
Feature Request: Streaming Token Generation with Mid-Generation Tool Execution #113

@Mr-Ye-Cao

Description

Summary

Request for streaming token generation that allows pausing generation mid-stream to execute tools and append results before continuing. This would enable proper agentic tool-use patterns where models expect inline tool results.

Problem Statement

Current Behavior

In the current Tinker architecture, model generation is atomic:

# tinker_cookbook/rl/rollouts.py
ac_with_logprobs = await policy(ob, stop_condition)  # Complete generation
step_result = await env.step(ac_with_logprobs.tokens)  # Process AFTER generation

SamplingClient.sample_async() returns only the final, complete token sequence; intermediate tokens are never exposed to the caller, so there is no point at which the rollout can intervene.
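A minimal sketch of what a streaming interface could look like, purely for illustration. Nothing here is an existing Tinker API: sample_stream_async is a hypothetical async generator standing in for a token-streaming sampler, with a scripted fake model so the loop is runnable.

```python
import asyncio

async def sample_stream_async(prompt_tokens):
    """Hypothetical streaming sampler: yields tokens one at a time
    instead of returning a single final sequence (fake, scripted output)."""
    for token in [101, 102, 103]:  # placeholder token ids
        yield token

async def consume():
    seen = []
    async for token in sample_stream_async([1, 2, 3]):
        seen.append(token)
        if token == 102:  # caller can pause/stop mid-stream on any condition
            break
    return seen

print(asyncio.run(consume()))  # → [101, 102]
```

With an interface shaped like this, the rollout loop could watch the stream for a tool-call delimiter and stop generation the moment one appears, rather than receiving everything after the fact.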

The Issue

Models trained with tool-use (e.g., GPT-OSS, function-calling models) expect a specific interaction pattern:

Model: <analysis>Let me check the file</analysis>
Model: <tool_call>{"command": "cat file.txt"}</tool_call>
System: [Tool result appended inline] file contents here...
Model: <analysis>I see the file contains...</analysis>
Model: <tool_call>{"command": "echo 'fixed' > file.txt"}</tool_call>
System: [Tool result appended inline]
Model: <final_answer>Done!</final_answer>

But with atomic generation, we get:

Model: <analysis>Let me check the file</analysis>
Model: <tool_call>{"command": "cat file.txt"}</tool_call>
Model: [HALLUCINATED] The file probably contains...  <-- Model guesses without seeing result
Model: <tool_call>{"command": "echo 'fixed' > file.txt"}</tool_call>
Model: [HALLUCINATED] Command executed successfully
Model: <final_answer>Done!</final_answer>

The model hallucinates tool results because it never receives the actual tool output inline before continuing to generate.
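The desired pattern above amounts to a generate-pause-execute-resume loop. The sketch below is hypothetical (no such loop exists in Tinker today): sample_until stands in for a sampling call that halts at the tool-call delimiter, and run_tool for the environment's tool executor; both are scripted fakes so the control flow is runnable.

```python
import json
import re

# Scripted "model output": first a tool call, then a final answer.
SCRIPT = [
    '<analysis>Let me check the file</analysis>'
    '<tool_call>{"command": "cat file.txt"}</tool_call>',
    '<final_answer>Done!</final_answer>',
]

def sample_until(context, step, stop="</tool_call>"):
    """Hypothetical sampler that stops at a tool-call delimiter
    (here it just replays the script)."""
    return SCRIPT[step]

def run_tool(call_json):
    """Hypothetical tool executor; a real env would dispatch the command."""
    cmd = json.loads(call_json)["command"]
    return f"[tool output of {cmd!r}]"

def rollout(prompt, max_steps=8):
    transcript = prompt
    for step in range(max_steps):
        chunk = sample_until(transcript, step)
        transcript += chunk
        match = re.search(r"<tool_call>(.*?)</tool_call>", chunk, re.DOTALL)
        if match is None:
            break  # no tool call in this chunk -> generation is finished
        # Pause generation, execute the tool, append its result inline,
        # then resume sampling from the extended context.
        transcript += f"<tool_result>{run_tool(match.group(1))}</tool_result>"
    return transcript
```

The key point is that the tool result is appended to the context *before* the next sampling call, so the model conditions on real output instead of guessing it.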
