Add local voice input (Whisper) with push-to-talk and visual feedback by moritzbappert · Pull Request #178 · pluginslab/wp-agentic-admin

moritzbappert · 2026-03-22T10:01:14Z

Summary

Adds on-device speech-to-text via Whisper Tiny (Xenova/whisper-tiny, ~40 MB ONNX) running entirely in a Web Worker — no audio ever leaves the browser
Space bar hold-to-record (push-to-talk) with a 200 ms threshold to distinguish a quick tap (inserts space) from a deliberate hold (starts recording); OS key-repeat events suppressed during recording to prevent ghost spaces
Pulsing red border glow on the input wrapper during recording; a transcribing wave overlay, centred over the textarea, is shown while Whisper runs
Fixes ReAct agent crash when the LLM returns a tool_call action without a tool field (toolId.includes TypeError)

Changes

src/extensions/services/whisper-worker.js — new Web Worker: Whisper pipeline (WASM/q8), ISO→Whisper language map, warmup message, debug logging
src/extensions/components/VoiceButton.jsx — new component: forwardRef + useImperativeHandle exposing start/stop, 30 s countdown with auto-stop, warning state ≤10 s, transcribing dots
src/extensions/components/ChatInput.jsx — VoiceButton integration, voiceState tracking, Space push-to-talk with 200 ms timer, Enter-to-send, transcribing overlay, recording glow modifier
src/extensions/components/ChatContainer.jsx — placeholder updated to hint at Space shortcut; incorporates upstream Thinking… state from fix/issue-147
src/extensions/styles/main.scss — voice button styles, recording/transcribing animations, input wrapper --recording glow keyframe
src/extensions/services/react-agent.js — guard against undefined toolId at call site and in executeTool
webpack.config.js — whisper-worker entry as self-contained bundle (no code splitting)
package.json / package-lock.json — adds @huggingface/transformers

Testing

Unit tests pass (npm test)
Ability tests pass (npm run test:abilities -- --file tests/abilities/core-abilities.test.js)
JS lint clean (npm run lint:js)
PHP lint clean (composer lint)
Manually tested in browser (if UI changes)

Notes

First use downloads the ~40 MB Whisper ONNX model from the Hugging Face Hub and caches it in Cache Storage; subsequent uses load the model from that cache with no further download
Model is pre-warmed on component mount (warmup message to worker) so first recording is fast
iOS Safari falls back to audio/mp4 (no WebM support) and WASM backend (no WebGPU)
Debug console.log statements remain in whisper-worker.js intentionally for now to aid diagnosing transcription issues in the field
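
The container fallback noted above for iOS Safari could be sketched roughly as follows. This is not the PR's code: `pickRecordingMimeType` and its candidate list are hypothetical, and the support check is passed in as a function so the logic can run outside a browser. In a page it would typically be `(type) => MediaRecorder.isTypeSupported(type)`.

```javascript
// Hypothetical helper (not from the PR): return the first container the
// browser reports it can record. `isSupported` is injected for testability.
function pickRecordingMimeType(isSupported) {
  // Assumed preference order: WebM/Opus first, audio/mp4 as the Safari fallback.
  const candidates = ['audio/webm;codecs=opus', 'audio/webm', 'audio/mp4'];
  for (const type of candidates) {
    if (isSupported(type)) {
      return type;
    }
  }
  // Nothing matched: an empty string leaves the choice to the browser.
  return '';
}
```

A caller would pass the result as the `mimeType` option when constructing the `MediaRecorder`, and decode the resulting blob accordingly before handing samples to the worker.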

Adds a self-contained Web Worker that runs Xenova/whisper-tiny (~40 MB ONNX) entirely in the browser via @huggingface/transformers. No audio ever leaves the device, keeping the plugin privacy-first.

- WASM backend (not WebGPU) avoids q8 precision issues on some drivers
- Language forced via ISO→name mapping + explicit task: 'transcribe' to prevent multilingual hallucinations and accidental translation
- Warmup message pre-loads the model on mount so first use is instant
- Webpack entry configured as self-contained bundle (no code splitting)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
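
To illustrate the language handling described in this commit, here is a minimal sketch of turning a site locale into Whisper generation options. The map contents, the `buildTranscribeOptions` name, and the fallback to English are assumptions for illustration; the worker's real map is not shown on this page.

```javascript
// Illustrative subset only; the actual ISO -> Whisper map in the PR may differ.
const ISO_TO_WHISPER = new Map([
  ['en', 'english'],
  ['de', 'german'],
  ['fr', 'french'],
  ['es', 'spanish'],
]);

// Hypothetical helper: build the options object passed to the ASR pipeline call.
function buildTranscribeOptions(locale) {
  // Reduce 'de-DE' or 'de_DE' to the two-letter code 'de'.
  const iso = String(locale || '').toLowerCase().split(/[-_]/)[0];
  // Assumed fallback: English when the locale is unknown or missing.
  const language = ISO_TO_WHISPER.get(iso) || 'english';
  // An explicit task keeps Whisper from translating instead of transcribing.
  return { language, task: 'transcribe' };
}
```

Pinning both `language` and `task` means the model never has to guess them from the audio, which is where the hallucinated languages and accidental translations mentioned above tend to come from.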

Microphone button for speech-to-text input with three states: idle → recording → transcribing.

- Stop icon + live countdown badge during recording (flex-column, stays within button bounds, so no overflow clipping issues)
- Hard auto-stop at 30 s matching Whisper's context window
- Warning state (≤ 10 s remaining): deeper red + faster pulse
- Subtle recording pulse animation to show the mic is active
- Pre-warms the Whisper worker on mount for instant first use
- Falls back to null on browsers without MediaRecorder/getUserMedia

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
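
The countdown rules listed above (30 s hard stop, warning at 10 s or less) can be expressed as a small pure function. This is a sketch under assumptions: the constant and function names are hypothetical, and rounding the badge up to whole seconds is a guess at the display behaviour.

```javascript
const MAX_RECORDING_MS = 30000; // hard auto-stop, matching the 30 s limit
const WARNING_THRESHOLD_S = 10; // warning state at or below this many seconds

// Hypothetical helper: derive the badge state from elapsed recording time.
function countdownState(elapsedMs) {
  const remainingMs = Math.max(0, MAX_RECORDING_MS - elapsedMs);
  // Round up so the badge shows 30 at the start and reaches 0 only at the limit.
  const remainingSeconds = Math.ceil(remainingMs / 1000);
  return {
    remainingSeconds,
    isWarning: remainingSeconds <= WARNING_THRESHOLD_S,
    shouldStop: remainingMs === 0,
  };
}
```

A component could call this from an interval tick, apply the warning class when `isWarning` is true, and stop the recorder when `shouldStop` is true.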

Hold Space on an empty textarea to record; release to stop and transcribe. Enter sends the message after transcription (Shift+Enter for newline).

- VoiceButton converted to forwardRef with useImperativeHandle exposing start()/stop() so ChatInput can drive recording from keyboard events
- Imperative-handle refs populated after the early-return check to keep hooks unconditional while startRecording/stopRecording stay post-guard
- e.repeat guard prevents holding Space from firing multiple starts
- isDisabled guard prevents Space from triggering during model load
- Placeholder updated to hint at both shortcuts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
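
As a rough illustration of the hold-to-record logic, including the 200 ms tap/hold threshold mentioned in the PR summary, the keyboard handling could be modelled as a small controller. Everything here is hypothetical (`createPushToTalk`, the callback names, the injected timers); the real component also checks for an empty textarea and the disabled state, which this sketch leaves out.

```javascript
// Hypothetical controller (not the PR's code). Timer functions are injectable
// so the logic can be exercised without a DOM or real delays.
function createPushToTalk({
  holdMs = 200,
  onTap,
  onStart,
  onStop,
  setTimer = setTimeout,
  clearTimer = clearTimeout,
}) {
  let timer = null; // pending hold timer, if any
  let recording = false;

  return {
    keyDown(event) {
      // Ignore OS key-repeat and any keydown while a hold is pending or active.
      if (event.repeat || timer !== null || recording) return;
      timer = setTimer(() => {
        timer = null;
        recording = true;
        onStart();
      }, holdMs);
    },
    keyUp() {
      if (timer !== null) {
        // Released before the threshold: treat it as an ordinary space.
        clearTimer(timer);
        timer = null;
        onTap();
      } else if (recording) {
        recording = false;
        onStop();
      }
    },
  };
}
```

In a React component, `keyDown` and `keyUp` would be called from the textarea's Space key handlers, with `onStart` and `onStop` forwarding to the `start()`/`stop()` methods exposed by VoiceButton.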

LLM occasionally returns {"action":"tool_call"} without a "tool" field, causing a TypeError crash at toolId.includes('/').

- Skip the iteration gracefully when toolName is falsy, pushing a synthetic error observation so the loop can summarize and continue.
- Add early-return guard in executeTool() as a belt-and-suspenders defence against the same case at the execution layer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
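
The two guards described in this commit might look roughly like the sketch below. The data shapes, the `resolveToolCall` helper, the result objects, and the `core/` namespacing rule are all made up for illustration; only the idea (skip on a missing tool name, and return early inside `executeTool`) comes from the commit message.

```javascript
// Hypothetical call-site guard: `action` is the parsed LLM reply and
// `observations` is the agent loop's running history.
function resolveToolCall(action, observations) {
  const toolName = action && action.tool;
  if (!toolName) {
    // Synthetic error observation so the loop can summarize and continue.
    observations.push({ error: 'tool_call action was missing a "tool" field' });
    return null; // caller skips this iteration
  }
  return toolName;
}

// Hypothetical execution-layer guard against the same case.
function executeTool(toolId, tools) {
  if (typeof toolId !== 'string' || toolId === '') {
    return { ok: false, error: 'No tool specified' };
  }
  // Safe now: toolId is a non-empty string, so .includes cannot throw.
  // The namespacing rule below is invented for this example.
  const key = toolId.includes('/') ? toolId : `core/${toolId}`;
  const tool = tools.get(key);
  return tool
    ? { ok: true, result: tool() }
    : { ok: false, error: `Unknown tool: ${key}` };
}
```

Guarding at both layers is redundant by design: the call-site check keeps the loop's history informative, while the `executeTool` check protects any other caller that reaches it with an undefined id.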

ChatContainer was overriding ChatInput's default placeholder with a hardcoded string that omitted the hint. Updated to match the agreed wording: '… (hold Space to speak)'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

While push-to-talk is active the input wrapper gains a --recording modifier that:

- Shifts the border to a muted red (rgba of #d63638 at 40% opacity)
- Applies a very faint warm tint to the background (2.5% opacity)
- Pulses the box-shadow between 30% and 8% opacity at 1.8s per cycle

The blue :focus-within highlight is suppressed during recording so that it does not compete with the red glow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The transcribing overlay now provides the visual feedback while Whisper runs, so setting placeholder text in the textarea is redundant.

- Remove handlePartialTranscript and the onPartialTranscript prop from ChatInput; the overlay is the sole indicator during transcription.
- Remove onPartialTranscriptRef and all three call sites from VoiceButton (auto-stop timeout, stopRecording, and the error catch path).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Whitespace-only changes produced by --fix during earlier lint runs. No functional changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The --ours conflict resolutions during rebase accidentally kept an intermediate (pre-voice) version of ChatInput.jsx. This commit restores the correct final state: VoiceButton integration, voiceState tracking, Space push-to-talk with 200 ms threshold, transcribing overlay, and recording glow modifier.

Also incorporates the upstream fix/issue-147 change: simplified the textarea placeholder to use the prop directly (the isDisabled ternary moved to ChatContainer).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

pluginslab

Privacy-first voice input — Whisper Tiny in a Web Worker, push-to-talk with Space bar, iOS Safari fallback, 30s countdown, visual feedback. Plus ReAct crash fix for missing tool name. No conflicts with dev. LGTM!

moritzbappert and others added 10 commits March 22, 2026 10:54

fix: apply Space-to-speak placeholder in ChatContainer

975a587

chore: apply Prettier formatting to incidentally touched files

81b17cc

style: fix Prettier formatting in ChatContainer placeholder ternary

d37584b

moritzbappert added the enhancement New feature or request label Mar 22, 2026

moritzbappert requested a review from pluginslab March 22, 2026 10:01

pluginslab merged commit 4e72071 into dev Mar 22, 2026
1 of 4 checks passed

pluginslab approved these changes Mar 22, 2026

pluginslab merged 10 commits into dev from feature/voice-mode
