diff --git a/CLAUDE.md b/CLAUDE.md
index 96dd0760..27d20b6b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -72,7 +72,9 @@ When adding a new CLI command:
 1. Define the command in `packages/cli/src/commands/<name>.ts` using `defineCommand` from citty
 2. **Export `examples`** in the same file — `export const examples: Example[] = [...]` (import `Example` from `./_examples.js`). These are displayed by `--help`.
 3. Register it in `packages/cli/src/cli.ts` under `subCommands` (lazy-loaded)
-4. Validate by running `npx tsx packages/cli/src/cli.ts <command> --help` and verifying the examples section appears
+4. **Add to help groups** in `packages/cli/src/help.ts` — add the command name and description to the appropriate `GROUPS` entry. Without this, the command won't appear in `hyperframes --help` even though it works.
+5. **Document it** in `docs/packages/cli.mdx` — add a section with usage examples and flags.
+6. Validate by running `npx tsx packages/cli/src/cli.ts --help` (command appears in the list) and `npx tsx packages/cli/src/cli.ts <command> --help` (examples appear).
 
 ## Key Concepts
 
diff --git a/docs/packages/cli.mdx b/docs/packages/cli.mdx
index 38bf3603..2c5c343d 100644
--- a/docs/packages/cli.mdx
+++ b/docs/packages/cli.mdx
@@ -226,6 +226,42 @@
 This is suppressed in CI environments, non-TTY shells, and when `HYPERFRAMES_NO_
 
 For music or noisy audio, use `--model medium.en` for better accuracy. For the best results with production content, transcribe via the OpenAI or Groq Whisper API and import the JSON.
+
+### `tts`
+
+Generate speech audio from text using a local AI model (Kokoro-82M). No API key required — runs entirely on-device.
+
+```bash
+# Generate speech from text
+npx hyperframes tts "Welcome to HyperFrames"
+
+# Choose a voice
+npx hyperframes tts "Hello world" --voice am_adam
+
+# Save to a specific file
+npx hyperframes tts "Intro" --voice bf_emma --output narration.wav
+
+# Adjust speech speed
+npx hyperframes tts "Slow and clear" --speed 0.8
+
+# Read text from a file
+npx hyperframes tts script.txt
+
+# List available voices
+npx hyperframes tts --list
+```
+
+| Flag | Description |
+|------|-------------|
+| `--output, -o` | Output file path (default: `speech.wav` in current directory) |
+| `--voice, -v` | Voice ID (run `--list` to see options) |
+| `--speed, -s` | Speech speed multiplier (default: 1.0) |
+| `--list` | List available voices and exit |
+| `--json` | Output result as JSON |
+
+Combine `tts` with `transcribe` to get narration and word-level timestamps for captions in a single workflow: generate the audio with `tts`, then run `transcribe` on the output to get word-level timing.
+
 
 ### `preview`
 
diff --git a/packages/cli/src/help.ts b/packages/cli/src/help.ts
index f13d8533..760abdb7 100644
--- a/packages/cli/src/help.ts
+++ b/packages/cli/src/help.ts
@@ -53,6 +53,7 @@ const GROUPS: Group[] = [
     [
       "transcribe",
       "Transcribe audio/video to word-level timestamps, or import an existing transcript",
     ],
+    ["tts", "Generate speech audio from text using a local AI model (Kokoro-82M)"],
   ],
 },
 {
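The `help.ts` hunk above shows why the "Add to help groups" step matters: a command registered only in `cli.ts` still runs, but the help screen is driven by the `GROUPS` data. A minimal TypeScript sketch of that shape, assuming `[name, description]` tuples as in the diff (the names `CommandEntry`, `commands`, and `isListed` are illustrative, not the real `help.ts` API):

```typescript
// Illustrative shape of one GROUPS command list: each entry is a
// [name, description] tuple, mirroring the entries shown in the diff.
type CommandEntry = [name: string, description: string];

const commands: CommandEntry[] = [
  [
    "transcribe",
    "Transcribe audio/video to word-level timestamps, or import an existing transcript",
  ],
  ["tts", "Generate speech audio from text using a local AI model (Kokoro-82M)"],
];

// A command missing from this list would still execute, but `--help`
// only renders entries found here -- hence the new checklist step.
function isListed(name: string): boolean {
  return commands.some(([n]) => n === name);
}

console.log(isListed("tts")); // true
console.log(isListed("not-registered")); // false
```

This is also why the new step 6 validates with `--help` rather than by running the command: the failure mode being guarded against is a working but invisible command.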