Fixes for Issue #112 concerning errors with tool calling with jinja templates and MLX #140
Open
mikedoise wants to merge 4 commits into mattt:main from
Persist KV caches across respond()/streamResponse() calls within the same LanguageModelSession. On subsequent turns only the new tokens are prefilled instead of re-encoding the entire conversation history, dramatically reducing time to first token.

- Add maxKVSize, kvBits, kvGroupSize to GenerationOptions
- Add SessionCacheEntry store with NSMapTable weak keys
- Implement incremental prefill in streamResponse() and respond()
- Enhance prewarm() to prefill system prompt into KV cache

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
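The incremental prefill idea in this commit can be illustrated with a minimal sketch. This is not the code from the PR: `CachedPrefix` and `tokensToPrefill` are hypothetical names, and token IDs are modeled as plain `Int` values. The sketch assumes the cache remembers which tokens it has already encoded, so a new turn only needs to prefill the part of the prompt after the longest shared prefix.

```swift
// Hypothetical types for illustration; not part of AnyLanguageModel or mlx-swift-lm.
struct CachedPrefix {
    /// Token IDs whose keys/values are assumed to be in the KV cache already.
    var tokens: [Int] = []
}

/// Splits `prompt` into the part the cache already covers and the part
/// that still has to be prefilled.
func tokensToPrefill(cached: CachedPrefix, prompt: [Int]) -> (reused: Int, suffix: [Int]) {
    // Length of the longest common prefix of the cached tokens and the new prompt.
    var common = 0
    let limit = min(cached.tokens.count, prompt.count)
    while common < limit && cached.tokens[common] == prompt[common] {
        common += 1
    }
    // If the whole prompt is already cached, still feed the last token
    // so the model has an input from which to produce logits.
    if common == prompt.count && common > 0 {
        common -= 1
    }
    return (reused: common, suffix: Array(prompt[common...]))
}

// Usage: the second turn extends the first, so only the new tokens are prefilled.
let firstTurn = CachedPrefix(tokens: [1, 2, 3])
let plan = tokensToPrefill(cached: firstTurn, prompt: [1, 2, 3, 4, 5])
print("reused \(plan.reused) cached tokens, prefilling \(plan.suffix.count) new tokens")
```

If the new prompt diverges from the cached tokens (for example after the history was edited), the shared prefix is shorter and a real implementation would also need to trim the cache back to that point before prefilling.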
- Add GPUMemoryConfiguration struct with .automatic (RAM-scaled) and .unconstrained presets for controlling Metal buffer pool limits
- Add GPUMemoryManager singleton with reference-counted active/idle toggling: the cache limit stays high during concurrent generations and drops to the idle limit only when all sessions complete
- Wrap respond(), streamResponse(), and prewarm() with markActive/markIdle
- Call evict() on removeFromCache/removeAllFromCache to reclaim GPU buffers
- Upgrade mlx-swift from 0.29.1 to 0.30.6 (fast SDPA, cache race fix, Memory API, wired memory, iPhone 16 Pro NAX fix)
- Upgrade mlx-swift-lm from 2.29.3 to 2.30.6 (Gemma3n per-layer intermediate_size, model loading perf, chat rehydration, tool calling)
- Migrate deprecated GPU.set(cacheLimit:)/GPU.clearCache() to Memory.*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
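The reference-counted active/idle toggling described above could look roughly like the following sketch. It is an assumption-laden illustration, not the PR's GPUMemoryManager: `ActivityCounter`, `onActive`, `onIdle`, and `withActivity` are hypothetical names, and the closures stand in for whatever raises or lowers the Metal cache limit.

```swift
import Foundation

// Hypothetical stand-in for the PR's GPUMemoryManager; the callbacks replace
// the real calls that would raise or lower the GPU cache limit.
final class ActivityCounter {
    private let lock = NSLock()
    private var activeCount = 0
    private let onActive: () -> Void
    private let onIdle: () -> Void

    init(onActive: @escaping () -> Void, onIdle: @escaping () -> Void) {
        self.onActive = onActive
        self.onIdle = onIdle
    }

    /// Called when a generation starts. Only the first concurrent caller
    /// switches to the active (high) limit.
    func markActive() {
        lock.lock()
        defer { lock.unlock() }
        activeCount += 1
        if activeCount == 1 { onActive() }
    }

    /// Called when a generation ends. Only the last caller to finish
    /// switches back to the idle (low) limit.
    func markIdle() {
        lock.lock()
        defer { lock.unlock() }
        guard activeCount > 0 else { return }  // ignore unbalanced calls
        activeCount -= 1
        if activeCount == 0 { onIdle() }
    }

    /// Convenience wrapper that keeps markActive/markIdle balanced,
    /// even if `body` throws.
    func withActivity<T>(_ body: () throws -> T) rethrows -> T {
        markActive()
        defer { markIdle() }
        return try body()
    }
}
```

The callbacks run while the lock is held, which keeps the sketch short but means they must not call back into the counter. Wrapping each entry point in something like `withActivity` is one way to guarantee that every markActive is paired with a markIdle.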
# Conflicts:
#	Sources/AnyLanguageModel/Models/MLXLanguageModel.swift
Gemma 3's Jinja chat template has no tool role support, causing tool result messages to crash the template engine during chat history replay. This fixes the issue by folding tool outputs into the preceding assistant message instead of using a separate .tool() role.

Changes:
- Fold tool results into assistant messages with [Tool result]: prefix to maintain strict user/assistant alternation required by Gemma 3
- Add max tool iteration guard (5) to prevent infinite tool-call loops
- Fix convertToSendableJSONValue to return NSNull() instead of JSONValue.null so Jinja's Value(any:) can handle it
- Check Bool before NSNumber to prevent booleans becoming integers
- Record assistant text before tool calls in transcript for accurate chat replay and KV cache consistency
- Move final text accumulation to after tool loop exit so only the final response is returned

Fixes mattt#112

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
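A minimal sketch of the folding step described in this commit is shown below. It is illustrative only: `Role`, `Message`, and `foldToolResults` are hypothetical names rather than the types used in MLXLanguageModel.swift, and the choice of a newline separator is an assumption. Only the `[Tool result]:` prefix comes from the commit message.

```swift
// Hypothetical chat types for illustration; the PR works on the library's own message types.
enum Role { case system, user, assistant, tool }

struct Message: Equatable {
    var role: Role
    var content: String
}

/// Rewrites a chat history so that it contains no `.tool` messages.
/// Each tool result is appended to the preceding assistant message with a
/// "[Tool result]: " prefix, so a template without a tool role never sees one.
func foldToolResults(_ messages: [Message]) -> [Message] {
    var result: [Message] = []
    for message in messages {
        guard message.role == .tool else {
            result.append(message)
            continue
        }
        let folded = "[Tool result]: \(message.content)"
        if let last = result.last, last.role == .assistant {
            // Merge into the assistant turn that issued the tool call.
            let separator = last.content.isEmpty ? "" : "\n"
            result[result.count - 1].content = last.content + separator + folded
        } else {
            // No assistant turn to merge into: emit the result as an assistant message.
            result.append(Message(role: .assistant, content: folded))
        }
    }
    return result
}

// Usage: the tool message disappears and its output rides along with the assistant turn.
let history = [
    Message(role: .user, content: "What is the weather in Paris?"),
    Message(role: .assistant, content: "Calling getWeather."),
    Message(role: .tool, content: "18°C, cloudy"),
]
let folded = foldToolResults(history)
print(folded.map { $0.role })
```

Because consecutive tool results all merge into the same assistant message, the output keeps the user/assistant alternation as long as the input alternated before the tool messages were added.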
Hello, this PR hopefully addresses the issue with Jinja templates and tool calls. There may be more here than is needed, but it did get tool calling working in my app. Let me know if you have any questions, or if we can improve the quality of the code.