feat(cli): add tts command for local text-to-speech via Kokoro-82M #201

Merged
jrusso1020 merged 9 commits into main from feat/tts-command
Apr 2, 2026

jrusso1020 commented Apr 2, 2026

What

Adds hyperframes tts — a new CLI command for generating speech audio locally using Kokoro-82M (ONNX). No API key needed, runs entirely on CPU.

Why

Users creating compositions with narration/voiceovers currently need external TTS services. This brings TTS in-house, mirroring the local whisper-based transcribe command. Kokoro-82M was selected after evaluating 9 TTS models — it ranked #1 on TTS Arena with the best quality-to-size ratio (54 voices, 8 languages, ~5x realtime on CPU).

How

Follows the same architecture as the transcribe command:

  • packages/cli/src/commands/tts.ts — Command definition with --voice, --speed, --output, --list, --json flags
  • packages/cli/src/tts/manager.ts — Model + voice download/caching to ~/.cache/hyperframes/tts/
  • packages/cli/src/tts/synthesize.ts — Python subprocess invoking kokoro-onnx for synthesis
  • packages/cli/src/utils/download.ts — Extracted shared download utility (was duplicated in whisper/manager.ts) with atomic .tmp → rename to prevent partial download corruption
  • skills/hyperframes-tts/SKILL.md — Skill docs with voice selection guide, speed control, composition integration
  • CLAUDE.md — Updated with TTS section, voice table, and skill reference
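
The atomic .tmp → rename step mentioned for download.ts can be sketched roughly as below. This is a minimal illustration, not the PR's actual `downloadFile`: the helper name `writeFileAtomic` is hypothetical, and it takes bytes already in memory instead of streaming from the network. The point it shows is that the final path only ever appears after the write has fully succeeded.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical helper (not from the PR): write to `<dest>.tmp`, then rename.
function writeFileAtomic(dest: string, data: Uint8Array): void {
  const tmp = `${dest}.tmp`;
  mkdirSync(dirname(dest), { recursive: true });
  try {
    writeFileSync(tmp, data);
    // The temp file sits next to dest, so the rename stays on one filesystem
    // and an interrupted write can never leave a truncated file at dest.
    renameSync(tmp, dest);
  } catch (err) {
    // Clean up the partial temp file before reporting the failure.
    rmSync(tmp, { force: true });
    throw err;
  }
}

// Usage: write a stand-in "model" file into a scratch directory.
const scratch = mkdtempSync(join(tmpdir(), "tts-cache-"));
const target = join(scratch, "tts", "model.onnx");
writeFileAtomic(target, new TextEncoder().encode("fake model bytes"));
console.log(readFileSync(target, "utf8")); // prints "fake model bytes"
console.log(existsSync(`${target}.tmp`)); // prints false
```

A cache check that only tests whether the final file exists can then treat its presence as proof of a complete download.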

Notable design decisions:

  • Requires users to install kokoro-onnx themselves (pip install kokoro-onnx soundfile) rather than silently running pip install — avoids polluting system Python
  • Model (~311 MB) and voices (~27 MB) download in parallel on first run
  • Synthesis script is cached in ~/.cache/hyperframes/tts/synth.py (not rewritten per invocation)
  • input positional arg is optional so --list works without a dummy argument
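
Making the positional argument optional at the parser level means the command has to decide for itself when missing input is an error. A minimal sketch of that decision follows. The names `TtsArgs`, `TtsAction`, and `resolveTtsAction` are hypothetical, and the error message is an assumption; the PR's actual handler is not shown here.

```typescript
// Hypothetical argument shape for illustration only.
interface TtsArgs {
  input?: string; // optional positional: the text to synthesize
  list?: boolean; // --list: print available voices and exit
}

type TtsAction =
  | { kind: "list" }
  | { kind: "speak"; text: string }
  | { kind: "error"; message: string };

// Decide what the command should do once the parser no longer enforces `input`.
function resolveTtsAction(args: TtsArgs): TtsAction {
  // --list needs no input, so it is checked first.
  if (args.list) return { kind: "list" };

  const text = args.input?.trim();
  if (!text) {
    return { kind: "error", message: "Provide text to synthesize, or pass --list to see voices." };
  }
  return { kind: "speak", text };
}
```

With this shape, `resolveTtsAction({ list: true })` yields the list action without any dummy argument, while `resolveTtsAction({})` yields an error instead of silently synthesizing nothing.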

Test plan

  • Generated audio samples with all 4 tested voices on CPU (af_heart, af_nova, am_adam, bf_emma) — all produced valid WAV files at ~5x realtime
  • --list flag works without positional argument
  • --json output mode returns structured result
  • Lint (oxlint), format (oxfmt), and typecheck (tsc) all pass
  • Pre-commit hooks pass (lint + format + typecheck + commitlint)
  • Whisper transcribe command still works after downloadFile extraction
  • Documentation updated (CLAUDE.md, SKILL.md, package.json build:copy)

jrusso1020 and others added 8 commits April 2, 2026 20:21
feat(cli): add `tts` command for local text-to-speech via Kokoro-82M

Adds `hyperframes tts`: generate speech audio locally using Kokoro-82M
(ONNX), no API key needed. Mirrors the transcribe command architecture.

- New command: `hyperframes tts "text" --voice af_heart --output speech.wav`
- 54 voices across 8 languages, ~5x realtime on CPU
- Auto-downloads model (~311 MB) + voices (~27 MB) to ~/.cache/hyperframes/tts/
- Requires Python 3.8+ with kokoro-onnx installed
- Extracted shared `downloadFile` utility from whisper/manager.ts with
  atomic .tmp→rename to prevent partial download corruption
- Added hyperframes-tts skill with voice selection guide
- Updated CLAUDE.md with TTS docs, voice table, and skill reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(tts): improve skill per skill-creator guidelines

- Move trigger info from body to frontmatter description
- Remove `trigger` field (not a valid frontmatter field)
- Remove CLI flag docs Claude can derive from --help
- Remove redundant voice tables (keep content-to-voice mapping)
- Fix composition audio example to use actual <audio> element pattern
- Keep non-obvious workflows: TTS+transcribe for captions, long scripts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(tts): add guidance for using external TTS sources

Help users understand when to use cloud TTS (voice cloning, broader
languages, higher quality) vs the built-in Kokoro model, and how
external audio integrates into the same composition workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(tts): prioritize HeyGen API as recommended cloud TTS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(tts): remove external TTS section for now

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(tts): set required: false on input arg so --list works standalone

Citty treats positional args as required by default unless explicitly
set to required: false. Without this, `hyperframes tts --list` fails
with "Missing required positional argument".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(tts): add --help examples and fix required:false for --list

Add examples section to `tts --help` matching the pattern from other
commands (transcribe, render, etc.). Fix citty positional arg requiring
explicit `required: false` for --list to work standalone.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add CLI command checklist to CLAUDE.md

Ensure new commands always get --help examples in help.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jrusso1020 merged commit 7389c0c into main Apr 2, 2026
14 checks passed
jrusso1020 deleted the feat/tts-command branch April 2, 2026 21:09
miguel-heygen pushed a commit that referenced this pull request Apr 3, 2026
)

* feat(cli): add `tts` command for local text-to-speech via Kokoro-82M

Adds `hyperframes tts` — generate speech audio locally using Kokoro-82M
(ONNX), no API key needed. Mirrors the transcribe command architecture.

- New command: `hyperframes tts "text" --voice af_heart --output speech.wav`
- 54 voices across 8 languages, ~5x realtime on CPU
- Auto-downloads model (~311 MB) + voices (~27 MB) to ~/.cache/hyperframes/tts/
- Requires Python 3.8+ with kokoro-onnx installed
- Extracted shared `downloadFile` utility from whisper/manager.ts with
  atomic .tmp→rename to prevent partial download corruption
- Added hyperframes-tts skill with voice selection guide
- Updated CLAUDE.md with TTS docs, voice table, and skill reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(tts): improve skill per skill-creator guidelines

- Move trigger info from body to frontmatter description
- Remove `trigger` field (not a valid frontmatter field)
- Remove CLI flag docs Claude can derive from --help
- Remove redundant voice tables (keep content-to-voice mapping)
- Fix composition audio example to use actual <audio> element pattern
- Keep non-obvious workflows: TTS+transcribe for captions, long scripts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(tts): add guidance for using external TTS sources

Help users understand when to use cloud TTS (voice cloning, broader
languages, higher quality) vs the built-in Kokoro model, and how
external audio integrates into the same composition workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(tts): prioritize HeyGen API as recommended cloud TTS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(tts): remove external TTS section for now

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): set required: false on input arg so --list works standalone

Citty treats positional args as required by default unless explicitly
set to required: false. Without this, `hyperframes tts --list` fails
with "Missing required positional argument".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): add --help examples and fix required:false for --list

Add examples section to `tts --help` matching the pattern from other
commands (transcribe, render, etc.). Fix citty positional arg requiring
explicit `required: false` for --list to work standalone.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add CLI command checklist to CLAUDE.md

Ensure new commands always get --help examples in help.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jrusso1020 added a commit that referenced this pull request Apr 10, 2026
…checklist

The tts command was implemented (PR #201) but never added to the root-level
help display or documentation. This adds it to:

- help.ts GROUPS (AI & Integrations) so it appears in `hyperframes --help`
- docs/packages/cli.mdx with usage examples and flag reference
- CLAUDE.md "Adding CLI Commands" checklist: new steps 4-5 require adding
  commands to help.ts groups and docs, preventing future omissions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jrusso1020 added a commit that referenced this pull request Apr 10, 2026
…checklist (#240)

The tts command was implemented (PR #201) but never added to the root-level
help display or documentation. This adds it to:

- help.ts GROUPS (AI & Integrations) so it appears in `hyperframes --help`
- docs/packages/cli.mdx with usage examples and flag reference
- CLAUDE.md "Adding CLI Commands" checklist: new steps 4-5 require adding
  commands to help.ts groups and docs, preventing future omissions

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2 participants