feat(cli): add tts command for local text-to-speech via Kokoro-82M by jrusso1020 · Pull Request #201 · heygen-com/hyperframes

jrusso1020 · 2026-04-02T20:22:13Z

What

Adds hyperframes tts — a new CLI command for generating speech audio locally using Kokoro-82M (ONNX). No API key needed, runs entirely on CPU.

Why

Users creating compositions with narration/voiceovers currently need external TTS services. This brings TTS in-house, mirroring the local whisper-based transcribe command. Kokoro-82M was selected after evaluating 9 TTS models — it ranked #1 on TTS Arena with the best quality-to-size ratio (54 voices, 8 languages, ~5x realtime on CPU).

How

Follows the same architecture as the transcribe command:

- packages/cli/src/commands/tts.ts: Command definition with --voice, --speed, --output, --list, --json flags
- packages/cli/src/tts/manager.ts: Model + voice download/caching to ~/.cache/hyperframes/tts/
- packages/cli/src/tts/synthesize.ts: Python subprocess invoking kokoro-onnx for synthesis
- packages/cli/src/utils/download.ts: Extracted shared download utility (was duplicated in whisper/manager.ts) with atomic .tmp → rename to prevent partial download corruption
- skills/hyperframes-tts/SKILL.md: Skill docs with voice selection guide, speed control, composition integration
- CLAUDE.md: Updated with TTS section, voice table, and skill reference
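The atomic .tmp → rename step of the shared download utility can be sketched roughly as follows. This is an illustration under assumptions, not the code in utils/download.ts: the helper name `commitAtomically`, the cache check in `downloadFile`, the return value, and buffering the whole response in memory are all hypothetical choices made to keep the example short.

```typescript
import { existsSync, mkdirSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Write bytes to "<dest>.tmp", then rename into place. On POSIX systems a
// rename within one filesystem is atomic, so an interrupted download never
// leaves a truncated file at the final path.
export function commitAtomically(dest: string, bytes: Uint8Array): void {
  const tmp = `${dest}.tmp`;
  mkdirSync(dirname(dest), { recursive: true });
  try {
    writeFileSync(tmp, bytes);
    renameSync(tmp, dest);
  } catch (err) {
    // Remove the partial temp file so a retry starts clean.
    rmSync(tmp, { force: true });
    throw err;
  }
}

// Hypothetical wrapper: skip the download when the file is already cached.
export async function downloadFile(url: string, dest: string): Promise<string> {
  if (existsSync(dest)) return dest;
  const res = await fetch(url); // global fetch, available in Node 18+
  if (!res.ok) {
    throw new Error(`Download failed (${res.status}) for ${url}`);
  }
  commitAtomically(dest, new Uint8Array(await res.arrayBuffer()));
  return dest;
}
```

Because only the rename publishes the file, a cache check like `existsSync(dest)` can treat any file found at the final path as complete. A real implementation for a ~311 MB model would more likely stream to the temp file than buffer it.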

Notable design decisions:

- Requires users to install kokoro-onnx themselves (pip install kokoro-onnx soundfile) rather than silently running pip install, which avoids polluting the system Python
- Model (~311 MB) and voices (~27 MB) download in parallel on first run
- Synthesis script is cached in ~/.cache/hyperframes/tts/synth.py (not rewritten per invocation)
- input positional arg is optional so --list works without a dummy argument
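Making the positional argument optional moves the "is there any text?" check into the command itself. The sketch below models only that runtime decision and does not use citty. The `TtsArgs` shape, the `resolveTtsAction` name, the error wording, and the default voice, speed, and output path are assumptions for illustration, not values confirmed by this PR.

```typescript
// Hypothetical shape of the parsed arguments; field names mirror the CLI flags.
interface TtsArgs {
  input?: string;
  list?: boolean;
  voice?: string;
  speed?: number;
  output?: string;
}

type TtsAction =
  | { kind: "list" }
  | { kind: "synthesize"; text: string; voice: string; speed: number; output: string };

// Decide what the command should do once the positional arg is optional.
export function resolveTtsAction(args: TtsArgs): TtsAction {
  // --list needs no input text, so handle it before validating `input`.
  if (args.list) return { kind: "list" };

  const text = args.input?.trim();
  if (!text) {
    throw new Error('Missing input text. Usage: hyperframes tts "text" [--voice <id>]');
  }
  return {
    kind: "synthesize",
    text,
    voice: args.voice ?? "af_heart", // assumed default
    speed: args.speed ?? 1, // assumed default
    output: args.output ?? "speech.wav", // assumed default
  };
}
```

With this ordering, `resolveTtsAction({ list: true })` returns the list action, while a call with neither `list` nor `input` fails with a usage error.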

Test plan

- Generated audio samples with all 4 tested voices on CPU (af_heart, af_nova, am_adam, bf_emma); all produced valid WAV files at ~5x realtime
- --list flag works without positional argument
- --json output mode returns structured result
- Lint (oxlint), format (oxfmt), and typecheck (tsc) all pass
- Pre-commit hooks pass (lint + format + typecheck + commitlint)
- Whisper transcribe command still works after downloadFile extraction
- Documentation updated (CLAUDE.md, SKILL.md, package.json build:copy)


* feat(cli): add `tts` command for local text-to-speech via Kokoro-82M

  Adds `hyperframes tts`: generate speech audio locally using Kokoro-82M (ONNX), no API key needed. Mirrors the transcribe command architecture.

  - New command: `hyperframes tts "text" --voice af_heart --output speech.wav`
  - 54 voices across 8 languages, ~5x realtime on CPU
  - Auto-downloads model (~311 MB) + voices (~27 MB) to ~/.cache/hyperframes/tts/
  - Requires Python 3.8+ with kokoro-onnx installed
  - Extracted shared `downloadFile` utility from whisper/manager.ts with atomic .tmp→rename to prevent partial download corruption
  - Added hyperframes-tts skill with voice selection guide
  - Updated CLAUDE.md with TTS docs, voice table, and skill reference

* docs(tts): improve skill per skill-creator guidelines

  - Move trigger info from body to frontmatter description
  - Remove `trigger` field (not a valid frontmatter field)
  - Remove CLI flag docs Claude can derive from --help
  - Remove redundant voice tables (keep content-to-voice mapping)
  - Fix composition audio example to use actual <audio> element pattern
  - Keep non-obvious workflows: TTS+transcribe for captions, long scripts

* docs(tts): add guidance for using external TTS sources

  Help users understand when to use cloud TTS (voice cloning, broader languages, higher quality) vs the built-in Kokoro model, and how external audio integrates into the same composition workflow.

* docs(tts): prioritize HeyGen API as recommended cloud TTS

* docs(tts): remove external TTS section for now

* fix(tts): set required: false on input arg so --list works standalone

  Citty treats positional args as required by default unless explicitly set to required: false. Without this, `hyperframes tts --list` fails with "Missing required positional argument".

* feat(tts): add --help examples and fix required:false for --list

  Add examples section to `tts --help` matching the pattern from other commands (transcribe, render, etc.). Fix citty positional arg requiring explicit `required: false` for --list to work standalone.

* docs: add CLI command checklist to CLAUDE.md

  Ensure new commands always get --help examples in help.ts.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


…checklist (#240)

The tts command was implemented (PR #201) but never added to the root-level help display or documentation. This adds it to:

- help.ts GROUPS (AI & Integrations) so it appears in `hyperframes --help`
- docs/packages/cli.mdx with usage examples and flag reference
- CLAUDE.md "Adding CLI Commands" checklist: new steps 4-5 require adding commands to help.ts groups and docs, preventing future omissions

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jrusso1020 and others added 8 commits April 2, 2026 20:21

docs(tts): prioritize HeyGen API as recommended cloud TTS

544a895

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(tts): remove external TTS section for now

e550b1a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add CLI command checklist to CLAUDE.md

7b02cc8

Ensure new commands always get --help examples in help.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vanceingalls approved these changes Apr 2, 2026


chore: merge main into feat/tts-command

52e8821

# Conflicts: # CLAUDE.md

jrusso1020 merged commit 7389c0c into main Apr 2, 2026
14 checks passed

jrusso1020 deleted the feat/tts-command branch April 2, 2026 21:09

jrusso1020 mentioned this pull request Apr 10, 2026

docs(cli): add tts command to --help and CLI docs #240

Merged

3 tasks

jrusso1020 merged 9 commits into main from feat/tts-command
