
OpenType

Local AI-powered voice input for the macOS menu bar



中文文档 (Chinese documentation)


Overview

OpenType is a macOS menu bar app for AI-powered voice input. It supports both fully local on-device inference and remote LLM APIs. Press a hotkey to start recording, release to transcribe, and the result is typed directly into whatever app you're using.

Three output modes are available:

  • Verbatim — raw transcription, lowest latency
  • Smart Format — transcription cleaned up by an LLM (contextual filler removal, grammar fixes, structured formatting)
  • Voice Command — speak a command and get an AI-generated response based on screen context
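
To make the press-record, release-insert flow above concrete, here is a minimal sketch of how recording, transcription, and insertion could be wired together. Every name in it (SpeechEngine, TextProcessor, VoicePipeline, OutputMode) is an illustrative assumption, not OpenType's actual API:

```swift
import Foundation

// Hypothetical protocols standing in for the app's speech and LLM layers.
protocol SpeechEngine {
    func transcribe(_ audio: Data) async throws -> String
}

protocol TextProcessor {
    func process(_ raw: String, mode: OutputMode) async throws -> String
}

enum OutputMode { case verbatim, smartFormat, voiceCommand }

struct VoicePipeline {
    let engine: SpeechEngine
    let processor: TextProcessor
    let inject: (String) -> Void   // types the result into the frontmost app

    // Called when the hotkey is released, with the captured audio.
    func run(audio: Data, mode: OutputMode) async throws {
        let raw = try await engine.transcribe(audio)   // speech-to-text
        if mode == .verbatim {
            inject(raw)                                // no LLM pass, hence the lowest latency
        } else {
            inject(try await processor.process(raw, mode: mode))
        }
    }
}
```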

Features

| Feature | Description |
| --- | --- |
| Dual Speech Engines | Apple Speech (built-in, zero download) or WhisperKit (offline Whisper models) |
| Smart Text Processing | Local MLX Qwen2.5/Qwen3 or remote LLM — contextual cleanup, self-correction handling, structured list formatting |
| Remote LLM Support | OpenAI, Claude (Anthropic format), Gemini, OpenRouter, SiliconFlow, Doubao, Bailian, MiniMax (CN & Global) |
| Global Hotkey | Configurable key (Fn/Ctrl/Shift/Option) with long-press, double-tap, or single-tap activation |
| Screen Context OCR | Captures on-screen text via ScreenCaptureKit + Vision to help the LLM correct homophones |
| Voice Command Mode | Screen-aware voice assistant — summarize, reply, translate based on what's on screen |
| Input Memory | Recent input history injected as LLM context for better continuity |
| Edit Rules | Personal text replacement rules applied on every output (sketch below) |
| Language Style Presets | Concise / Formal / Casual / Custom prompt per language |
| Input History & Stats | Full history with raw vs. processed comparison, word count stats, configurable retention |
| Bilingual UI | Chinese and English interface, independent of recognition language |
| Sound Feedback | Audio cues on recording start and stop |
| Guided Onboarding | Step-by-step setup: permissions, model download, and first use |
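
Edit Rules, for example, amount to user-defined find/replace pairs run over every output. A minimal sketch, with all names assumed for illustration rather than taken from the app:

```swift
import Foundation

// Hypothetical rule type: a literal find/replace pair applied to each output.
struct EditRule {
    let find: String
    let replace: String
}

func applyEditRules(_ text: String, rules: [EditRule]) -> String {
    rules.reduce(text) { $0.replacingOccurrences(of: $1.find, with: $1.replace) }
}

// Example: always restore a product name's capitalization.
let rules = [EditRule(find: "opentype", replace: "OpenType")]
print(applyEditRules("opentype is a menu bar app", rules))
// "OpenType is a menu bar app"
```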

System Requirements

  • OS: macOS 26 (Tahoe) or later
  • Chip: Apple Silicon (M1 / M2 / M3 / M4)
  • Disk: ~400 MB minimum (Apple Speech + Qwen3-0.6B), up to ~4 GB with larger models

Installation

Download

Grab the latest .dmg from Releases, open it, and drag OpenType.app to Applications.

"Cannot verify the developer" on first launch? The app is not notarized by Apple. Before first run, execute in Terminal:

xattr -cr /Applications/OpenType.app

Or go to System Settings → Privacy & Security and click "Open Anyway".

Build from Source

# Build .app bundle + .dmg installer
bash scripts/build-app.sh

# Or for development
swift build
swift run OpenType

# Or open in Xcode
open Package.swift

First Run

  1. Launch OpenType — it appears as a waveform icon in the menu bar
  2. The onboarding wizard guides you through permissions and model setup
  3. Grant Microphone and Accessibility permissions (required)
  4. Wait for the LLM model to download (~335 MB, one-time)
  5. Hold Fn to start dictating, release to stop and insert text

Permissions

| Permission | Purpose | Required |
| --- | --- | --- |
| Microphone | Audio capture | Yes |
| Accessibility | Global hotkey + text injection (simulated paste; sketch below) | Yes |
| Speech Recognition | Apple on-device ASR engine | Only if using Apple Speech |
| Screen Recording | OCR for screen context and Voice Command mode | Optional |
| Network | Model downloads; remote LLM API calls | First run / remote mode |
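
The Accessibility requirement comes from the simulated paste used for text injection. Below is a minimal sketch of that technique, assuming a clipboard-plus-⌘V approach (virtual key 0x09 is 'V' on ANSI layouts); it is an illustration, not the app's actual code:

```swift
import AppKit

// Put the text on the pasteboard, then synthesize Command-V so the
// frontmost app pastes it. Posting keyboard events requires the
// Accessibility permission.
func injectViaPaste(_ text: String) {
    let pasteboard = NSPasteboard.general
    pasteboard.clearContents()
    pasteboard.setString(text, forType: .string)

    let source = CGEventSource(stateID: .combinedSessionState)
    let vDown = CGEvent(keyboardEventSource: source, virtualKey: 0x09, keyDown: true)
    let vUp   = CGEvent(keyboardEventSource: source, virtualKey: 0x09, keyDown: false)
    vDown?.flags = .maskCommand   // hold Command while pressing V
    vUp?.flags = .maskCommand
    vDown?.post(tap: .cghidEventTap)
    vUp?.post(tap: .cghidEventTap)
}
```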

Remote LLM Providers

OpenType supports both OpenAI-compatible and Anthropic API formats:

| Provider | API Format | Base URL |
| --- | --- | --- |
| OpenAI | OpenAI | https://api.openai.com/v1 |
| Anthropic Claude | Anthropic | https://api.anthropic.com/v1 |
| Google Gemini | OpenAI | https://generativelanguage.googleapis.com/v1beta/openai |
| OpenRouter | OpenAI | https://openrouter.ai/api/v1 |
| SiliconFlow | OpenAI | https://api.siliconflow.cn/v1 |
| Volcengine Doubao | OpenAI | https://ark.cn-beijing.volces.com/api/v3 |
| Alibaba Bailian | OpenAI | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| MiniMax (China) | OpenAI | https://api.minimax.chat/v1 |
| MiniMax (Global) | OpenAI | https://api.minimaxi.chat/v1 |
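
Every OpenAI-format provider above accepts the same chat-completions request shape; only the base URL changes. A minimal sketch of such a call, with a placeholder API key and model name (not anything OpenType ships with):

```swift
import Foundation

// Hedged sketch: endpoint path and JSON shape follow the OpenAI
// chat-completions convention; key and model name are placeholders.
func cleanUpTranscript(_ raw: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer YOUR_API_KEY", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": "gpt-4o-mini",   // placeholder model name
        "messages": [
            ["role": "system", "content": "Clean up this voice transcript."],
            ["role": "user", "content": raw],
        ],
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let choices = json?["choices"] as? [[String: Any]]
    let message = choices?.first?["message"] as? [String: Any]
    return message?["content"] as? String ?? raw
}
```

Anthropic's Messages API uses a different request shape (an x-api-key header, among other differences), which is why Claude is listed under its own API format.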

Project Structure

Sources/
├── App/          # Entry point, AppDelegate, AppState, VoicePipeline, AppIcon
├── Audio/        # Microphone capture (AVAudioEngine), sound playback
├── Config/       # AppSettings, ModelCatalog, RemoteModelConfig, Localization
├── Hotkey/       # Global hotkey via CGEvent tap
├── LLM/          # LLMEngine (MLX), RemoteLLMClient (OpenAI/Anthropic), PromptBuilder
├── Output/       # Text injection (Accessibility API + clipboard paste)
├── Processing/   # TextProcessor, InputHistory, MemoryStore, PersonalDictionary
├── Screen/       # Screen OCR (ScreenCaptureKit + Vision)
├── Speech/       # SpeechEngine protocol, WhisperKit engine, Apple Speech engine
├── UI/           # SwiftUI: MenuBar, Settings, Onboarding, Overlay, History, Models
└── Resources/    # Localization strings (en/zh-Hans), sounds, app icon
scripts/
├── build-app.sh            # Build .app bundle and .dmg installer
├── generate-icon.swift     # Generate AppIcon.icns from source PNG
└── create-signing-cert.sh  # Generate self-signed code signing certificate
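
The Hotkey module above relies on a CGEvent tap. A minimal sketch of that mechanism, reduced to detecting the Fn key via flagsChanged events (real code would also handle tap re-enabling and the configurable keys and activation styles mentioned earlier):

```swift
import CoreGraphics

// Listen-only tap on modifier-flag changes; requires Accessibility access.
let mask = CGEventMask(1 << CGEventType.flagsChanged.rawValue)
let tap = CGEvent.tapCreate(
    tap: .cgSessionEventTap,
    place: .headInsertEventTap,
    options: .listenOnly,
    eventsOfInterest: mask,
    callback: { _, _, event, _ in
        let fnHeld = event.flags.contains(.maskSecondaryFn)
        print(fnHeld ? "Fn down: start recording" : "Fn up: stop and insert")
        return Unmanaged.passUnretained(event)
    },
    userInfo: nil
)
if let tap {
    let source = CFMachPortCreateRunLoopSource(kCFAllocatorDefault, tap, 0)
    CFRunLoopAddSource(CFRunLoopGetCurrent(), source, .commonModes)
    CFRunLoopRun()
}
```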

Tech Stack

  • WhisperKit — offline Whisper speech recognition
  • mlx-swift-lm — local LLM inference on Apple Silicon (Qwen2.5 / Qwen3)
  • SwiftUI + AppKit — native macOS UI
  • ScreenCaptureKit + Vision — screen OCR (see the sketch after this list)
  • AVAudioEngine — low-latency microphone capture
  • Apple Speech Framework — on-device speech recognition
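
A minimal sketch of the Vision half of the OCR step, assuming a CGImage has already been captured with ScreenCaptureKit (the capture setup is omitted):

```swift
import Vision

// Run Vision's text recognizer over a captured frame and collect the
// top candidate string from each detected text region.
func recognizeText(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
}
```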

License

MIT


Made with care for Apple Silicon
