
OpenType

Local AI-powered voice input for the macOS menu bar



中文文档 (Chinese documentation)


Overview

OpenType is a macOS menu bar app for AI-powered voice input. It supports both fully local on-device inference and remote LLM APIs. Press a hotkey to start recording, release to transcribe, and the result is typed directly into whatever app you're using.

Three output modes are available:

  • Verbatim — raw transcription, lowest latency
  • Smart Format — transcription cleaned up by an LLM (contextual filler removal, grammar fixes, structured formatting)
  • Voice Command — speak a command and get an AI-generated response based on screen context
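
To make the press-record, release-insert flow above concrete, here is a minimal sketch of how recording, transcription, and insertion could be wired together. Every name in it (SpeechEngine, TextProcessor, VoicePipeline, OutputMode) is an illustrative assumption, not OpenType's actual API:

```swift
import Foundation

// Hypothetical protocols standing in for the app's speech and LLM layers.
protocol SpeechEngine {
    func transcribe(_ audio: Data) async throws -> String
}

protocol TextProcessor {
    func process(_ raw: String, mode: OutputMode) async throws -> String
}

enum OutputMode { case verbatim, smartFormat, voiceCommand }

struct VoicePipeline {
    let engine: SpeechEngine
    let processor: TextProcessor
    let inject: (String) -> Void   // types the result into the frontmost app

    // Called when the hotkey is released, with the captured audio.
    func run(audio: Data, mode: OutputMode) async throws {
        let raw = try await engine.transcribe(audio)   // speech-to-text
        if mode == .verbatim {
            inject(raw)                                // no LLM pass, hence the lowest latency
        } else {
            inject(try await processor.process(raw, mode: mode))
        }
    }
}
```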

Features

| Feature | Description |
| --- | --- |
| Dual Speech Engines | Apple Speech (built-in, zero download) or WhisperKit (offline Whisper models) |
| Smart Text Processing | Local MLX Qwen2.5/Qwen3 or remote LLM — contextual cleanup, self-correction handling, structured list formatting |
| Remote LLM Support | OpenAI, Claude (Anthropic format), Gemini, OpenRouter, SiliconFlow, Doubao, Bailian, MiniMax (CN & Global) |
| Global Hotkey | Configurable key (Fn/Ctrl/Shift/Option) with long-press, double-tap, or single-tap activation |
| Screen Context OCR | Captures on-screen text via ScreenCaptureKit + Vision to help the LLM correct homophones |
| Voice Command Mode | Screen-aware voice assistant — summarize, reply, translate based on what's on screen |
| Input Memory | Recent input history injected as LLM context for better continuity |
| Edit Rules | Personal text replacement rules applied on every output (sketch below) |
| Language Style Presets | Concise / Formal / Casual / Custom prompt per language |
| Input History & Stats | Full history with raw vs. processed comparison, word count stats, configurable retention |
| Bilingual UI | Chinese and English interface, independent of recognition language |
| Sound Feedback | Audio cues on recording start and stop |
| Guided Onboarding | Step-by-step setup: permissions, model download, and first use |
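
Edit Rules, for example, amount to user-defined find/replace pairs run over every output. A minimal sketch, with all names assumed for illustration rather than taken from the app:

```swift
import Foundation

// Hypothetical rule type: a literal find/replace pair applied to each output.
struct EditRule {
    let find: String
    let replace: String
}

func applyEditRules(_ text: String, rules: [EditRule]) -> String {
    rules.reduce(text) { $0.replacingOccurrences(of: $1.find, with: $1.replace) }
}

// Example: always restore a product name's capitalization.
let rules = [EditRule(find: "opentype", replace: "OpenType")]
print(applyEditRules("opentype is a menu bar app", rules))
// "OpenType is a menu bar app"
```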

System Requirements

  • OS: macOS 26 (Tahoe) or later
  • Chip: Apple Silicon (M1 / M2 / M3 / M4)
  • Disk: ~400 MB minimum (Apple Speech + Qwen3-0.6B), up to ~4 GB with larger models

Installation

Download

Grab the latest .dmg from Releases, open it, and drag OpenType.app to Applications.

"Cannot verify the developer" on first launch? The app is not notarized by Apple. Before first run, execute in Terminal:

xattr -cr /Applications/OpenType.app

Or go to System Settings → Privacy & Security and click "Open Anyway".

Build from Source

# Build .app bundle + .dmg installer
bash scripts/build-app.sh

# Or for development
swift build
swift run OpenType

# Or open in Xcode
open Package.swift

First Run

  1. Launch OpenType — it appears as a waveform icon in the menu bar
  2. The onboarding wizard guides you through permissions and model setup
  3. Grant Microphone and Accessibility permissions (required)
  4. Wait for the LLM model to download (~335 MB, one-time)
  5. Hold Fn to start dictating, release to stop and insert text

Permissions

| Permission | Purpose | Required |
| --- | --- | --- |
| Microphone | Audio capture | Yes |
| Accessibility | Global hotkey + text injection (simulated paste; sketch below) | Yes |
| Speech Recognition | Apple on-device ASR engine | Only if using Apple Speech |
| Screen Recording | OCR for screen context and Voice Command mode | Optional |
| Network | Model downloads; remote LLM API calls | First run / remote mode |
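
The Accessibility requirement comes from the simulated paste used for text injection. Below is a minimal sketch of that technique, assuming a clipboard-plus-⌘V approach (virtual key 0x09 is 'V' on ANSI layouts); it is an illustration, not the app's actual code:

```swift
import AppKit

// Put the text on the pasteboard, then synthesize Command-V so the
// frontmost app pastes it. Posting keyboard events requires the
// Accessibility permission.
func injectViaPaste(_ text: String) {
    let pasteboard = NSPasteboard.general
    pasteboard.clearContents()
    pasteboard.setString(text, forType: .string)

    let source = CGEventSource(stateID: .combinedSessionState)
    let vDown = CGEvent(keyboardEventSource: source, virtualKey: 0x09, keyDown: true)
    let vUp   = CGEvent(keyboardEventSource: source, virtualKey: 0x09, keyDown: false)
    vDown?.flags = .maskCommand   // hold Command while pressing V
    vUp?.flags = .maskCommand
    vDown?.post(tap: .cghidEventTap)
    vUp?.post(tap: .cghidEventTap)
}
```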

Remote LLM Providers

OpenType supports both OpenAI-compatible and Anthropic API formats:

| Provider | API Format | Base URL |
| --- | --- | --- |
| OpenAI | OpenAI | https://api.openai.com/v1 |
| Anthropic Claude | Anthropic | https://api.anthropic.com/v1 |
| Google Gemini | OpenAI | https://generativelanguage.googleapis.com/v1beta/openai |
| OpenRouter | OpenAI | https://openrouter.ai/api/v1 |
| SiliconFlow | OpenAI | https://api.siliconflow.cn/v1 |
| Volcengine Doubao | OpenAI | https://ark.cn-beijing.volces.com/api/v3 |
| Alibaba Bailian | OpenAI | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| MiniMax (China) | OpenAI | https://api.minimax.chat/v1 |
| MiniMax (Global) | OpenAI | https://api.minimaxi.chat/v1 |
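
Every OpenAI-format provider above accepts the same chat-completions request shape; only the base URL changes. A minimal sketch of such a call, with a placeholder API key and model name (not anything OpenType ships with):

```swift
import Foundation

// Hedged sketch: endpoint path and JSON shape follow the OpenAI
// chat-completions convention; key and model name are placeholders.
func cleanUpTranscript(_ raw: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer YOUR_API_KEY", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": "gpt-4o-mini",   // placeholder model name
        "messages": [
            ["role": "system", "content": "Clean up this voice transcript."],
            ["role": "user", "content": raw],
        ],
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let choices = json?["choices"] as? [[String: Any]]
    let message = choices?.first?["message"] as? [String: Any]
    return message?["content"] as? String ?? raw
}
```

Anthropic's Messages API uses a different request shape (an x-api-key header, among other differences), which is why Claude is listed under its own API format.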

Project Structure

Sources/
├── App/          # Entry point, AppDelegate, AppState, VoicePipeline, AppIcon
├── Audio/        # Microphone capture (AVAudioEngine), sound playback
├── Config/       # AppSettings, ModelCatalog, RemoteModelConfig, Localization
├── Hotkey/       # Global hotkey via CGEvent tap
├── LLM/          # LLMEngine (MLX), RemoteLLMClient (OpenAI/Anthropic), PromptBuilder
├── Output/       # Text injection (Accessibility API + clipboard paste)
├── Processing/   # TextProcessor, InputHistory, MemoryStore, PersonalDictionary
├── Screen/       # Screen OCR (ScreenCaptureKit + Vision)
├── Speech/       # SpeechEngine protocol, WhisperKit engine, Apple Speech engine
├── UI/           # SwiftUI: MenuBar, Settings, Onboarding, Overlay, History, Models
└── Resources/    # Localization strings (en/zh-Hans), sounds, app icon
scripts/
├── build-app.sh            # Build .app bundle and .dmg installer
├── generate-icon.swift     # Generate AppIcon.icns from source PNG
└── create-signing-cert.sh  # Generate self-signed code signing certificate
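
The Hotkey module above relies on a CGEvent tap. A minimal sketch of that mechanism, reduced to detecting the Fn key via flagsChanged events (real code would also handle tap re-enabling and the configurable keys and activation styles mentioned earlier):

```swift
import CoreGraphics

// Listen-only tap on modifier-flag changes; requires Accessibility access.
let mask = CGEventMask(1 << CGEventType.flagsChanged.rawValue)
let tap = CGEvent.tapCreate(
    tap: .cgSessionEventTap,
    place: .headInsertEventTap,
    options: .listenOnly,
    eventsOfInterest: mask,
    callback: { _, _, event, _ in
        let fnHeld = event.flags.contains(.maskSecondaryFn)
        print(fnHeld ? "Fn down: start recording" : "Fn up: stop and insert")
        return Unmanaged.passUnretained(event)
    },
    userInfo: nil
)
if let tap {
    let source = CFMachPortCreateRunLoopSource(kCFAllocatorDefault, tap, 0)
    CFRunLoopAddSource(CFRunLoopGetCurrent(), source, .commonModes)
    CFRunLoopRun()
}
```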

Tech Stack

  • WhisperKit — offline Whisper speech recognition
  • mlx-swift-lm — local LLM inference on Apple Silicon (Qwen2.5 / Qwen3)
  • SwiftUI + AppKit — native macOS UI
  • ScreenCaptureKit + Vision — screen OCR (see the sketch after this list)
  • AVAudioEngine — low-latency microphone capture
  • Apple Speech Framework — on-device speech recognition
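
A minimal sketch of the Vision half of the OCR step, assuming a CGImage has already been captured with ScreenCaptureKit (the capture setup is omitted):

```swift
import Vision

// Run Vision's text recognizer over a captured frame and collect the
// top candidate string from each detected text region.
func recognizeText(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
}
```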

License

MIT


Made with care for Apple Silicon
