feat(cli): add `tts` command for local text-to-speech via Kokoro-82M #201
Merged
jrusso1020 merged 9 commits into main on Apr 2, 2026
Adds `hyperframes tts`: generate speech audio locally using Kokoro-82M (ONNX), no API key needed. Mirrors the transcribe command architecture.
- New command: `hyperframes tts "text" --voice af_heart --output speech.wav`
- 54 voices across 8 languages, ~5x realtime on CPU
- Auto-downloads model (~311 MB) + voices (~27 MB) to ~/.cache/hyperframes/tts/
- Requires Python 3.8+ with kokoro-onnx installed
- Extracted shared `downloadFile` utility from whisper/manager.ts with atomic .tmp→rename to prevent partial download corruption
- Added hyperframes-tts skill with voice selection guide
- Updated CLAUDE.md with TTS docs, voice table, and skill reference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
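The atomic `.tmp`→rename idea mentioned above can be sketched as follows. This is a minimal illustration, not the code from `packages/cli/src/utils/download.ts`: the helper name `saveAtomically`, the signature of `downloadFile`, and the choice to buffer the whole response in memory are assumptions made for brevity.

```typescript
import { existsSync, mkdirSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

/** Write bytes to `<dest>.tmp`, then rename onto `dest` so a reader never sees a partial file. */
export function saveAtomically(dest: string, bytes: Uint8Array): void {
  const tmp = `${dest}.tmp`;
  mkdirSync(dirname(dest), { recursive: true });
  try {
    writeFileSync(tmp, bytes);
    // A rename within one filesystem replaces the target in a single step on POSIX systems.
    renameSync(tmp, dest);
  } catch (err) {
    rmSync(tmp, { force: true }); // drop the partial file before re-throwing
    throw err;
  }
}

/** Hypothetical wrapper: skip the download when the file is already cached. */
export async function downloadFile(url: string, dest: string): Promise<void> {
  if (existsSync(dest)) return;
  const res = await fetch(url); // global fetch, available in Node 18+
  if (!res.ok) throw new Error(`Download failed: ${res.status} ${url}`);
  saveAtomically(dest, new Uint8Array(await res.arrayBuffer()));
}
```

Because the final path only appears after a complete write, an interrupted download leaves at most a stray `.tmp` file, and the `existsSync(dest)` check stays a reliable cache test. For a ~311 MB model a real implementation would more likely stream to the temporary file than buffer it.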
- Move trigger info from body to frontmatter description
- Remove `trigger` field (not a valid frontmatter field)
- Remove CLI flag docs Claude can derive from --help
- Remove redundant voice tables (keep content-to-voice mapping)
- Fix composition audio example to use actual <audio> element pattern
- Keep non-obvious workflows: TTS+transcribe for captions, long scripts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Help users understand when to use cloud TTS (voice cloning, broader languages, higher quality) vs the built-in Kokoro model, and how external audio integrates into the same composition workflow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Citty treats positional args as required by default unless explicitly set to required: false. Without this, `hyperframes tts --list` fails with "Missing required positional argument". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
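To make the rule in this commit concrete, here is a small self-contained model of it. The `ArgDef` type mirrors the shape of the `args` object passed to citty's `defineCommand`, but it is declared locally so the sketch runs without citty installed. The `missingPositionals` function and the descriptions are illustrative assumptions, not citty's actual parser, which also honours `default` values.

```typescript
// Local stand-in for the argument definitions citty accepts (an assumption for this sketch).
type ArgDef =
  | { type: "positional"; description?: string; required?: boolean }
  | { type: "boolean" | "string"; description?: string };

// The fix: `input` is explicitly optional, so flag-only invocations are valid.
const ttsArgs: Record<string, ArgDef> = {
  input: { type: "positional", description: "Text to speak", required: false },
  voice: { type: "string", description: "Voice ID" },
  list: { type: "boolean", description: "List available voices" },
};

/** Simplified rule: a positional is required unless it sets `required: false`. */
function missingPositionals(defs: Record<string, ArgDef>, positionals: string[]): string[] {
  const missing: string[] = [];
  let next = 0;
  for (const [name, def] of Object.entries(defs)) {
    if (def.type !== "positional") continue;
    if (positionals[next] !== undefined) {
      next++; // this positional was supplied on the command line
      continue;
    }
    if (def.required !== false) missing.push(name);
  }
  return missing;
}
```

With the definitions above, `missingPositionals(ttsArgs, [])` reports nothing missing, which corresponds to `hyperframes tts --list` succeeding. Dropping `required: false` from `input` would make the same call report `input` as missing.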
Add examples section to `tts --help` matching the pattern from other commands (transcribe, render, etc.). Fix citty positional arg requiring explicit `required: false` for --list to work standalone. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensure new commands always get --help examples in help.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vanceingalls approved these changes on Apr 2, 2026
# Conflicts: # CLAUDE.md
miguel-heygen pushed a commit that referenced this pull request on Apr 3, 2026
* feat(cli): add `tts` command for local text-to-speech via Kokoro-82M

  Adds `hyperframes tts`: generate speech audio locally using Kokoro-82M (ONNX), no API key needed. Mirrors the transcribe command architecture.
  - New command: `hyperframes tts "text" --voice af_heart --output speech.wav`
  - 54 voices across 8 languages, ~5x realtime on CPU
  - Auto-downloads model (~311 MB) + voices (~27 MB) to ~/.cache/hyperframes/tts/
  - Requires Python 3.8+ with kokoro-onnx installed
  - Extracted shared `downloadFile` utility from whisper/manager.ts with atomic .tmp→rename to prevent partial download corruption
  - Added hyperframes-tts skill with voice selection guide
  - Updated CLAUDE.md with TTS docs, voice table, and skill reference
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(tts): improve skill per skill-creator guidelines

  - Move trigger info from body to frontmatter description
  - Remove `trigger` field (not a valid frontmatter field)
  - Remove CLI flag docs Claude can derive from --help
  - Remove redundant voice tables (keep content-to-voice mapping)
  - Fix composition audio example to use actual <audio> element pattern
  - Keep non-obvious workflows: TTS+transcribe for captions, long scripts
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(tts): add guidance for using external TTS sources

  Help users understand when to use cloud TTS (voice cloning, broader languages, higher quality) vs the built-in Kokoro model, and how external audio integrates into the same composition workflow.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(tts): prioritize HeyGen API as recommended cloud TTS

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(tts): remove external TTS section for now

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): set required: false on input arg so --list works standalone

  Citty treats positional args as required by default unless explicitly set to required: false. Without this, `hyperframes tts --list` fails with "Missing required positional argument".
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): add --help examples and fix required:false for --list

  Add examples section to `tts --help` matching the pattern from other commands (transcribe, render, etc.). Fix citty positional arg requiring explicit `required: false` for --list to work standalone.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add CLI command checklist to CLAUDE.md

  Ensure new commands always get --help examples in help.ts.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jrusso1020 added a commit that referenced this pull request on Apr 10, 2026

…checklist

The tts command was implemented (PR #201) but never added to the root-level help display or documentation. This adds it to:
- help.ts GROUPS (AI & Integrations) so it appears in `hyperframes --help`
- docs/packages/cli.mdx with usage examples and flag reference
- CLAUDE.md "Adding CLI Commands" checklist: new steps 4-5 require adding commands to help.ts groups and docs, preventing future omissions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jrusso1020 added a commit that referenced this pull request on Apr 10, 2026

…checklist (#240)

The tts command was implemented (PR #201) but never added to the root-level help display or documentation. This adds it to:
- help.ts GROUPS (AI & Integrations) so it appears in `hyperframes --help`
- docs/packages/cli.mdx with usage examples and flag reference
- CLAUDE.md "Adding CLI Commands" checklist: new steps 4-5 require adding commands to help.ts groups and docs, preventing future omissions
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
What

Adds `hyperframes tts`, a new CLI command for generating speech audio locally using Kokoro-82M (ONNX). No API key needed, runs entirely on CPU.

Why

Users creating compositions with narration/voiceovers currently need external TTS services. This brings TTS in-house, mirroring the local whisper-based `transcribe` command. Kokoro-82M was selected after evaluating 9 TTS models: it ranked #1 on TTS Arena with the best quality-to-size ratio (54 voices, 8 languages, ~5x realtime on CPU).

How

Follows the same architecture as the `transcribe` command:
- `packages/cli/src/commands/tts.ts`: Command definition with `--voice`, `--speed`, `--output`, `--list`, `--json` flags
- `packages/cli/src/tts/manager.ts`: Model + voice download/caching to `~/.cache/hyperframes/tts/`
- `packages/cli/src/tts/synthesize.ts`: Python subprocess invoking `kokoro-onnx` for synthesis
- `packages/cli/src/utils/download.ts`: Extracted shared download utility (was duplicated in whisper/manager.ts) with atomic `.tmp` → rename to prevent partial download corruption
- `skills/hyperframes-tts/SKILL.md`: Skill docs with voice selection guide, speed control, composition integration
- `CLAUDE.md`: Updated with TTS section, voice table, and skill reference

Notable design decisions:
- Users install `kokoro-onnx` themselves (`pip install kokoro-onnx soundfile`) rather than the CLI silently running `pip install`, which avoids polluting system Python
- The Python script is cached at `~/.cache/hyperframes/tts/synth.py` (not rewritten per invocation)
- `input` positional arg is optional so `--list` works without a dummy argument

Test plan
- `--list` flag works without positional argument
- `--json` output mode returns structured result
- `transcribe` command still works after `downloadFile` extraction
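The "not rewritten per invocation" decision for `synth.py` and the Python subprocess call can be sketched roughly like this. It is an illustration under assumptions, not the contents of `packages/cli/src/tts/synthesize.ts`: the function names, the `python3` executable name, and the argument list passed to the script are all hypothetical.

```typescript
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { spawnSync } from "node:child_process";
import { join } from "node:path";

/** Write the helper script only when it is not cached yet. */
function ensureScript(cacheDir: string, source: string): { scriptPath: string; written: boolean } {
  const scriptPath = join(cacheDir, "synth.py");
  if (existsSync(scriptPath)) return { scriptPath, written: false };
  mkdirSync(cacheDir, { recursive: true });
  writeFileSync(scriptPath, source);
  return { scriptPath, written: true };
}

/** Run the cached script with Python; the arguments are placeholders for this sketch. */
function runSynth(scriptPath: string, args: string[]): string {
  const result = spawnSync("python3", [scriptPath, ...args], { encoding: "utf8" });
  if (result.error) throw result.error; // e.g. python3 is not on PATH
  if (result.status !== 0) {
    throw new Error(result.stderr || `python3 exited with status ${result.status}`);
  }
  return result.stdout;
}
```

A caller would run `ensureScript` once with the cache directory and the script source, then pass the returned `scriptPath` to `runSynth` on every synthesis. One trade-off of caching this way is that an updated script body is not picked up until the cached file is removed.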