Add local voice input (Whisper) with push-to-talk and visual feedback #178
Merged: pluginslab merged 10 commits into dev on Mar 22, 2026
Adds a self-contained Web Worker that runs Xenova/whisper-tiny (~40 MB ONNX) entirely in the browser via @huggingface/transformers. No audio ever leaves the device, keeping the plugin privacy-first.

- WASM backend (not WebGPU) avoids q8 precision issues on some drivers
- Language forced via an ISO→name mapping plus an explicit task: 'transcribe' to prevent multilingual hallucinations and accidental translation
- Warmup message pre-loads the model on mount so first use is instant
- Webpack entry configured as a self-contained bundle (no code splitting)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
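A minimal sketch of how the worker's message handling could combine the warmup message, a lazily cached pipeline and the ISO→name language mapping. The message shapes (`warmup` / `transcribe`), the `ISO_TO_WHISPER` subset and the `createWorkerHandler` / `whisperLanguageFor` names are hypothetical, not the PR's code. The model loader is injected so the logic runs without downloading Whisper; omitting `language` for unmapped locales (letting Whisper auto-detect) is also an assumption.

```javascript
// Hypothetical sketch of whisper-worker.js message handling (not the PR's code).

// Partial ISO 639-1 -> Whisper language-name table (illustrative subset).
const ISO_TO_WHISPER = {
  en: 'english',
  de: 'german',
  fr: 'french',
  es: 'spanish',
  it: 'italian',
  pt: 'portuguese',
  nl: 'dutch',
  ja: 'japanese',
};

// Map a locale such as "de-DE" or "de_DE" to a Whisper language name,
// or null when the language is not in the table.
function whisperLanguageFor(locale) {
  if (!locale) return null;
  const iso = String(locale).toLowerCase().split(/[-_]/)[0];
  return ISO_TO_WHISPER[iso] ?? null;
}

// loadPipeline is injected; in a real worker it might be something like
//   () => pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny',
//                  { device: 'wasm', dtype: 'q8' })
function createWorkerHandler(loadPipeline) {
  let transcriberPromise = null;

  // Load the model once; warmup and later transcriptions share the promise.
  const getTranscriber = () => {
    if (!transcriberPromise) transcriberPromise = loadPipeline();
    return transcriberPromise;
  };

  return async function handleMessage(msg) {
    if (msg.type === 'warmup') {
      await getTranscriber();
      return { type: 'ready' };
    }
    if (msg.type === 'transcribe') {
      const transcriber = await getTranscriber();
      // Always transcribe (never translate); pin the language when known.
      const options = { task: 'transcribe' };
      const language = whisperLanguageFor(msg.locale);
      if (language) options.language = language;
      const output = await transcriber(msg.audio, options);
      return { type: 'result', text: output.text.trim() };
    }
    return { type: 'error', error: `Unknown message type: ${msg.type}` };
  };
}

// Wiring inside the worker (commented out so the sketch also runs in Node):
// const handle = createWorkerHandler(loadWhisper);
// self.onmessage = async (e) => self.postMessage(await handle(e.data));
```

Caching the promise rather than the resolved pipeline means a `transcribe` message that arrives while the warmup is still loading waits on the same load instead of starting a second one.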
Microphone button for speech-to-text input with three states: idle → recording → transcribing.

- Stop icon + live countdown badge during recording (flex-column, stays within the button bounds, so there are no overflow clipping issues)
- Hard auto-stop at 30 s, matching Whisper's 30 s context window
- Warning state (≤ 10 s remaining): deeper red + faster pulse
- Subtle recording pulse animation to show the mic is active
- Pre-warms the Whisper worker on mount for instant first use
- Renders null on browsers without MediaRecorder/getUserMedia

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
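The countdown rules above reduce to a small pure function of elapsed time. This is a sketch: the 30 s limit and the ≤ 10 s warning threshold come from the commit message, while `countdownState` and its return shape are hypothetical names for illustration.

```javascript
// Hypothetical helper; values from the commit message, names illustrative.
const MAX_RECORDING_MS = 30000;
const WARNING_THRESHOLD_S = 10;

// Derive the badge state from the time elapsed since recording started.
function countdownState(elapsedMs) {
  const remainingMs = Math.max(0, MAX_RECORDING_MS - elapsedMs);
  // Round up so the badge shows "30" at the start and "1" in the final second.
  const remainingS = Math.ceil(remainingMs / 1000);
  return {
    remainingS,
    warning: remainingS <= WARNING_THRESHOLD_S,
    shouldStop: remainingMs === 0,
  };
}
```

A component could call this from a short `setInterval` with `Date.now() - startedAt`, render `remainingS`, toggle the warning class from `warning`, and call its stop routine (then clear the interval) once `shouldStop` is true.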
Hold Space on an empty textarea to record; release to stop and transcribe. Enter sends the message after transcription (Shift+Enter for a newline).

- VoiceButton converted to forwardRef with useImperativeHandle exposing start()/stop() so ChatInput can drive recording from keyboard events
- Imperative-handle refs are populated after the early-return check, keeping hook calls unconditional while startRecording/stopRecording stay after the guard
- e.repeat guard prevents holding Space from firing multiple starts
- isDisabled guard prevents Space from starting a recording during model load
- Placeholder updated to hint at both shortcuts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
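The keyboard rules above can be sketched as a small controller, independent of React. This is a hypothetical illustration, not the PR's ChatInput code: `voice` stands in for the VoiceButton imperative handle, and `isDisabled`, `getValue` and `onSend` are assumed callbacks. The 200 ms threshold mentioned later in the PR is not modeled here.

```javascript
// Hypothetical push-to-talk controller; `voice` mimics the imperative
// handle ({ start(), stop() }), the other callbacks are assumptions.
function createPushToTalk({ voice, isDisabled, getValue, onSend }) {
  let holding = false; // true between the Space keydown that started recording and its keyup

  function onKeyDown(e) {
    if (e.key === 'Enter' && !e.shiftKey) {
      // Enter sends; Shift+Enter falls through and inserts a newline.
      e.preventDefault();
      onSend();
      return;
    }
    if (e.key !== ' ') return;
    if (holding) {
      // Auto-repeat keydowns while Space is held: swallow them so no
      // spaces are typed into the textarea.
      e.preventDefault();
      return;
    }
    // Only start on a fresh press, when enabled (e.g. model loaded) and
    // the textarea is empty; otherwise Space types a normal space.
    if (e.repeat || isDisabled() || getValue() !== '') return;
    e.preventDefault();
    holding = true;
    voice.start();
  }

  function onKeyUp(e) {
    if (e.key !== ' ' || !holding) return;
    e.preventDefault();
    holding = false;
    voice.stop();
  }

  return { onKeyDown, onKeyUp };
}
```

In React, the two handlers would go on the textarea's onKeyDown/onKeyUp, with `voice` built as `{ start: () => ref.current?.start(), stop: () => ref.current?.stop() }` so it always reads the current forwarded ref.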
LLM occasionally returns {"action":"tool_call"} without a "tool" field,
causing a TypeError crash at toolId.includes('/').
- Skip the iteration gracefully when toolName is falsy, pushing a
synthetic error observation so the loop can summarize and continue.
- Add early-return guard in executeTool() as a belt-and-suspenders
defence against the same case at the execution layer.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
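A minimal sketch of the two guards described above. The step shape (`{ action, tool, args }`), the observation objects, the `tools` registry and the `executeTool(toolId, args, tools)` signature are assumptions for illustration; react-agent.js may structure these differently.

```javascript
// Hypothetical sketch of both guards; shapes and signatures are assumptions.

async function executeTool(toolId, args, tools) {
  // Execution-layer guard: never reach string methods such as
  // toolId.includes('/') with a missing or non-string id.
  if (typeof toolId !== 'string' || toolId === '') {
    return { error: 'executeTool called without a tool id' };
  }
  if (!Object.prototype.hasOwnProperty.call(tools, toolId)) {
    return { error: `Unknown tool: ${toolId}` };
  }
  return tools[toolId](args);
}

// One iteration of the ReAct loop for a parsed LLM step.
async function runStep(step, observations, tools) {
  if (step.action !== 'tool_call') return;
  const toolName = step.tool;
  if (!toolName) {
    // Loop-level guard: record a synthetic error observation and move on,
    // so the agent can summarize or pick a tool on the next turn.
    observations.push({
      type: 'observation',
      content: 'Error: tool_call action had no "tool" field.',
    });
    return;
  }
  const result = await executeTool(toolName, step.args ?? {}, tools);
  observations.push({ type: 'observation', content: JSON.stringify(result) });
}
```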
ChatContainer was overriding ChatInput's default placeholder with a hardcoded string that omitted the hint. Updated to match the agreed wording: '… (hold Space to speak)'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
While push-to-talk is active, the input wrapper gains a --recording modifier that:

- Shifts the border to a muted red (rgba of #d63638 at 40% opacity)
- Applies a very faint warm tint to the background (2.5% opacity)
- Pulses the box-shadow between 30% and 8% opacity at 1.8 s per cycle

The blue :focus-within highlight is suppressed during recording so it does not compete with the red glow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
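For reference, the arithmetic behind "rgba of #d63638 at 40% opacity": the hex channels d6, 36 and 38 are 214, 54 and 56. In SCSS the same value can be written `rgba(#d63638, 0.4)`. The hypothetical helper below just makes the conversion explicit.

```javascript
// Hypothetical helper: convert a 6-digit hex colour plus alpha to CSS rgba().
function hexToRgba(hex, alpha) {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error(`Expected a 6-digit hex colour, got ${hex}`);
  const n = parseInt(m[1], 16);
  const r = (n >> 16) & 0xff; // first byte
  const g = (n >> 8) & 0xff;  // second byte
  const b = n & 0xff;         // third byte
  return `rgba(${r}, ${g}, ${b}, ${alpha})`;
}

console.log(hexToRgba('#d63638', 0.4)); // rgba(214, 54, 56, 0.4)
```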
The transcribing overlay now provides the visual feedback while Whisper runs, so setting placeholder text in the textarea is redundant.

- Remove handlePartialTranscript and the onPartialTranscript prop from ChatInput; the overlay is the sole indicator during transcription.
- Remove onPartialTranscriptRef and all three of its call sites from VoiceButton (auto-stop timeout, stopRecording, and the error catch path).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Whitespace-only changes produced by --fix during earlier lint runs. No functional changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The --ours conflict resolutions during the rebase accidentally kept an intermediate (pre-voice) version of ChatInput.jsx. This commit restores the correct final state: VoiceButton integration, voiceState tracking, Space push-to-talk with a 200 ms threshold, transcribing overlay, and recording glow modifier.

Also incorporates the upstream fix/issue-147 change: the textarea placeholder now uses the prop directly (the isDisabled ternary moved to ChatContainer).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pluginslab (Owner) approved these changes on Mar 22, 2026:
Privacy-first voice input — Whisper Tiny in a Web Worker, push-to-talk with Space bar, iOS Safari fallback, 30s countdown, visual feedback. Plus ReAct crash fix for missing tool name. No conflicts with dev. LGTM!
Summary
- ReAct agent: handle a `tool_call` action without a `tool` field (`toolId.includes` TypeError)

Changes
- `src/extensions/services/whisper-worker.js`: new Web Worker with the Whisper pipeline (WASM/q8), ISO→Whisper language map, warmup message, and debug logging
- `src/extensions/components/VoiceButton.jsx`: new component using forwardRef + useImperativeHandle to expose start/stop, 30 s countdown with auto-stop, warning state at ≤ 10 s, transcribing dots
- `src/extensions/components/ChatInput.jsx`: VoiceButton integration, voiceState tracking, Space push-to-talk with a 200 ms timer, Enter-to-send, transcribing overlay, recording glow modifier
- `src/extensions/components/ChatContainer.jsx`: placeholder updated to hint at the Space shortcut; incorporates the upstream "Thinking…" state from fix/issue-147
- `src/extensions/styles/main.scss`: voice button styles, recording/transcribing animations, input wrapper `--recording` glow keyframes
- `src/extensions/services/react-agent.js`: guard against an undefined toolId at the call site and in executeTool
- `webpack.config.js`: whisper-worker entry as a self-contained bundle (no code splitting)
- `package.json` / `package-lock.json`: adds `@huggingface/transformers`

Testing
- `npm test`
- `npm run test:abilities -- --file tests/abilities/core-abilities.test.js`
- `npm run lint:js`
- `composer lint`

Notes
- iOS Safari fallback: `audio/mp4` (no WebM support) and the WASM backend (no WebGPU)
- `console.log` statements remain in whisper-worker.js intentionally for now, to aid in diagnosing transcription issues in the field
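One way to get the container fallback described in the notes is to probe candidate MIME types in order of preference and only render the mic when recording APIs exist. This is a sketch: the candidate list and the `supportsRecording` / `pickMimeType` names are hypothetical, while `MediaRecorder.isTypeSupported` and the `MediaRecorder(stream, options)` constructor are standard browser APIs.

```javascript
// Hypothetical helpers; candidate order and names are assumptions.
// Prefer WebM/Opus, fall back to MP4 (e.g. iOS Safari).
const MIME_CANDIDATES = ['audio/webm;codecs=opus', 'audio/webm', 'audio/mp4'];

// True when the environment exposes MediaRecorder and getUserMedia.
function supportsRecording(env = globalThis) {
  return (
    typeof env.MediaRecorder === 'function' &&
    typeof env.navigator?.mediaDevices?.getUserMedia === 'function'
  );
}

// Return the first supported candidate, or '' to let the browser choose.
function pickMimeType(isTypeSupported) {
  for (const type of MIME_CANDIDATES) {
    if (isTypeSupported(type)) return type;
  }
  return '';
}

// Browser usage:
// if (supportsRecording()) {
//   const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
// }
```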