OpenType is a macOS menu bar app for AI-powered voice input. It supports both fully local on-device inference and remote LLM APIs. Press a hotkey to start recording, release to transcribe, and the result is typed directly into whatever app you're using.
Three output modes are available:
- Verbatim — raw transcription, lowest latency
- Smart Format — transcription cleaned up by an LLM (contextual filler removal, grammar fixes, structured formatting)
- Voice Command — speak a command and get an AI-generated response based on screen context
| Feature | Description |
|---|---|
| Dual Speech Engines | Apple Speech (built-in, zero download) or WhisperKit (offline Whisper models) |
| Smart Text Processing | Local MLX Qwen2.5/Qwen3 or remote LLM — contextual cleanup, self-correction handling, structured list formatting |
| Remote LLM Support | OpenAI, Claude (Anthropic format), Gemini, OpenRouter, SiliconFlow, Doubao, Bailian, MiniMax (CN & Global) |
| Global Hotkey | Configurable key (Fn/Ctrl/Shift/Option) with long-press, double-tap, or single-tap activation |
| Screen Context OCR | Captures on-screen text via ScreenCaptureKit + Vision to help the LLM correct homophones |
| Voice Command Mode | Screen-aware voice assistant — summarize, reply, translate based on what's on screen |
| Input Memory | Recent input history injected as LLM context for better continuity |
| Edit Rules | Personal text replacement rules applied on every output |
| Language Style Presets | Concise / Formal / Casual / Custom prompt per language |
| Input History & Stats | Full history with raw vs. processed comparison, word count stats, configurable retention |
| Bilingual UI | Chinese and English interface, independent of recognition language |
| Sound Feedback | Audio cues on recording start and stop |
| Guided Onboarding | Step-by-step setup: permissions, model download, and first use |
- OS: macOS 26 (Tahoe) or later
- Chip: Apple Silicon (M1 / M2 / M3 / M4)
- Disk: ~400 MB minimum (Apple Speech + Qwen3-0.6B), up to ~4 GB with larger models
Grab the latest .dmg from Releases, open it, and drag OpenType.app to Applications.
"Cannot verify the developer" on first launch? The app is not notarized by Apple. Before first run, execute in Terminal:
xattr -cr /Applications/OpenType.appOr go to System Settings → Privacy & Security and click "Open Anyway".
# Build .app bundle + .dmg installer
bash scripts/build-app.sh
# Or for development
swift build
swift run OpenType
# Or open in Xcode
open Package.swift- Launch OpenType — it appears as a waveform icon in the menu bar
- The onboarding wizard guides you through permissions and model setup
- Grant Microphone and Accessibility permissions (required)
- Wait for the LLM model to download (~335 MB, one-time)
- Hold Fn to start dictating, release to stop and insert text
| Permission | Purpose | Required |
|---|---|---|
| Microphone | Audio capture | Yes |
| Accessibility | Global hotkey + text injection (simulated paste) | Yes |
| Speech Recognition | Apple on-device ASR engine | Only if using Apple Speech |
| Screen Recording | OCR for screen context and Voice Command mode | Optional |
| Network | Model downloads; remote LLM API calls | First run / remote mode |
OpenType supports both OpenAI-compatible and Anthropic API formats:
| Provider | API Format | Base URL |
|---|---|---|
| OpenAI | OpenAI | https://api.openai.com/v1 |
| Anthropic Claude | Anthropic | https://api.anthropic.com/v1 |
| Google Gemini | OpenAI | https://generativelanguage.googleapis.com/v1beta/openai |
| OpenRouter | OpenAI | https://openrouter.ai/api/v1 |
| SiliconFlow | OpenAI | https://api.siliconflow.cn/v1 |
| Volcengine Doubao | OpenAI | https://ark.cn-beijing.volces.com/api/v3 |
| Alibaba Bailian | OpenAI | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| MiniMax (China) | OpenAI | https://api.minimax.chat/v1 |
| MiniMax (Global) | OpenAI | https://api.minimaxi.chat/v1 |
Sources/
├── App/ # Entry point, AppDelegate, AppState, VoicePipeline, AppIcon
├── Audio/ # Microphone capture (AVAudioEngine), sound playback
├── Config/ # AppSettings, ModelCatalog, RemoteModelConfig, Localization
├── Hotkey/ # Global hotkey via CGEvent tap
├── LLM/ # LLMEngine (MLX), RemoteLLMClient (OpenAI/Anthropic), PromptBuilder
├── Output/ # Text injection (Accessibility API + clipboard paste)
├── Processing/ # TextProcessor, InputHistory, MemoryStore, PersonalDictionary
├── Screen/ # Screen OCR (ScreenCaptureKit + Vision)
├── Speech/ # SpeechEngine protocol, WhisperKit engine, Apple Speech engine
├── UI/ # SwiftUI: MenuBar, Settings, Onboarding, Overlay, History, Models
└── Resources/ # Localization strings (en/zh-Hans), sounds, app icon
scripts/
├── build-app.sh # Build .app bundle and .dmg installer
├── generate-icon.swift # Generate AppIcon.icns from source PNG
└── create-signing-cert.sh # Generate self-signed code signing certificate
- WhisperKit — offline Whisper speech recognition
- mlx-swift-lm — local LLM inference on Apple Silicon (Qwen2.5 / Qwen3)
- SwiftUI + AppKit — native macOS UI
- ScreenCaptureKit + Vision — screen OCR
- AVAudioEngine — low-latency microphone capture
- Apple Speech Framework — on-device speech recognition
Made with care for Apple Silicon