A command-line tool to profile CoreML models — showing per-operation compute device assignments (CPU/GPU/ANE), compilation time, and prediction latency across all MLComputeUnits configurations.
Replicates what Xcode's CoreML Performance Report does, but from the terminal and designed for programmatic use by coding agents.
$ coreml-cli test_models/160ms/
Device: Apple M4 Pro (arm64)
RAM: 48GB
OS: macOS 26.3.1
── decoder ────────────────────────────────────────────────────────────────────
Parakeet EOU decoder (RNNT prediction network) (Fluid Inference)
Mixed (Float16, Float32, Int16, Int32) | torch==2.4.0 | coremltools 8.3.0
inputs: targets(Int32 1×1), target_length(Int32 1), h_in(Float32 1×1×640),
c_in(Float32 1×1×640)
outputs: decoder(Float32 1×640×1), h_out(Float32 1×1×640),
c_out(Float32 1×1×640)
cold compile: 128ms
Compute Unit               CPU     GPU     ANE  Compile  Predict
────────────────────────────────────────────────────────────────
all                     100.0%    0.0%    0.0%      7ms   0.22ms
cpu_only                100.0%    0.0%    0.0%      6ms   0.22ms
cpu_and_gpu             100.0%    0.0%    0.0%      6ms   0.23ms
cpu_and_neural_engine   100.0%    0.0%    0.0%      5ms   0.26ms
── streaming_encoder ──────────────────────────────────────────────────────────
Mixed (Float16, Float32, Int32) | torch==2.4.0 | coremltools 8.3.0
...
cold compile: 3512ms
Compute Unit               CPU     GPU     ANE  Compile  Predict
────────────────────────────────────────────────────────────────
all                       0.0%  100.0%    0.0%     46ms   6.78ms
cpu_only                100.0%    0.0%    0.0%     47ms   5.45ms
cpu_and_gpu               0.0%  100.0%    0.0%     49ms   6.67ms
cpu_and_neural_engine     1.2%    0.0%   98.8%     51ms   2.82ms
Requires macOS 14+ and uv.
git clone https://github.com/yourusername/coreml-cli
cd coreml-cli
uv sync

# Profile a single model (all compute unit configs)
uv run coreml-cli model.mlmodelc
# Profile all models in a directory
uv run coreml-cli path/to/models/
# Specific compute unit config
uv run coreml-cli model.mlmodelc --units cpu_and_neural_engine
# JSON output (for programmatic use)
uv run coreml-cli model.mlmodelc --json
# Include per-operation breakdown
uv run coreml-cli model.mlmodelc --ops
# Per-op data with private API details (backend support, estimated runtimes)
uv run coreml-cli model.mlmodelc --detailed
# ANE fallback analysis — show CPU ops grouped by rejection reason
uv run coreml-cli model.mlmodelc --fallback
# Fallback analysis as JSON (for agent consumption)
uv run coreml-cli model.mlmodelc --fallback --json
# Control benchmark iterations
uv run coreml-cli model.mlmodelc --iterations 50
# Debug logging to stderr
uv run coreml-cli model.mlmodelc --debug

For each model:
- Cold compile time — measured once per model by bypassing the E5 compilation cache (private API: setExperimentalMLProgramEncryptedCacheUsage_(0)). Reflects what users experience the first time the model runs on their device — if this is too high, the model may not be usable. For a true first-launch measurement, restart ANECompilerService before benchmarking: sudo killall ANECompilerService
For each compute unit configuration (all, cpu_only, cpu_and_gpu, cpu_and_neural_engine):
- Device assignment — % of operations on CPU, GPU, and ANE (Neural Engine)
- Compile time — cached load time (E5 bundle cache populated). This is the cost paid on every app launch.
- Predict latency — median prediction time (5 warmup + 10 timed iterations)
- Model metadata — precision, I/O shapes, author, description, coremltools version
- Per-op breakdown (--ops) — each operation's name, type, assigned device, and cost weight
- Private API data (--detailed) — selected backend, all supported backends, estimated runtime per backend, validation messages explaining why backends were rejected
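As a rough illustration of two of the per-configuration numbers listed above, the following Python sketch computes device shares from a list of per-op device labels and a median latency from 5 warmup and 10 timed calls. The function names and the input shape (a plain list of "CPU"/"GPU"/"ANE" strings and a zero-argument callable) are assumptions made for this example; they are not taken from the tool's source.

```python
import statistics
import time
from collections import Counter


def device_shares(devices):
    """Return the percentage of ops assigned to each device, to one decimal."""
    counts = Counter(devices)
    total = len(devices)
    if total == 0:
        return {"CPU": 0.0, "GPU": 0.0, "ANE": 0.0}
    return {
        name: round(100.0 * counts.get(name, 0) / total, 1)
        for name in ("CPU", "GPU", "ANE")
    }


def median_latency_ms(predict, warmup=5, iterations=10):
    """Run predict() untimed `warmup` times, then return the median timed run in ms."""
    for _ in range(warmup):
        predict()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        predict()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

With a hypothetical plan of one CPU op and 82 ANE ops, `device_shares(["CPU"] + ["ANE"] * 82)` gives 1.2% CPU and 98.8% ANE. Using the median rather than the mean keeps a single slow iteration from skewing the reported latency.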
With --fallback, the tool shows only ops that are not on the ANE, grouped by rejection reason. It is designed for the ANE optimization loop: change conversion → reconvert → --fallback → identify blockers → fix → repeat.
For each CPU-fallback op, it reports:
- Why ANE rejected it — e.g., "Unsupported tensor data type: int32", "Unsupported MIL operation"
- How many ops — grouped by rejection reason with op type counts
- Estimated CPU cost — how much latency the fallback adds
- Which ops — names for tracing back to the conversion script
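For an agent driving this loop, one possible way to call the tool from Python is sketched below. It uses only flags shown in the usage list (--json, --fallback, --units, --iterations). The helper names are hypothetical, and two things are assumed rather than documented here: that these flags can be combined freely, and that stdout contains only the JSON document. The JSON structure itself is not assumed; the result is returned as parsed.

```python
import json
import subprocess


def build_command(model_path, units=None, iterations=None, fallback=False):
    """Assemble a coreml-cli invocation that emits JSON."""
    cmd = ["uv", "run", "coreml-cli", str(model_path), "--json"]
    if fallback:
        cmd.append("--fallback")
    if units is not None:
        cmd += ["--units", units]
    if iterations is not None:
        cmd += ["--iterations", str(iterations)]
    return cmd


def run_report(model_path, **options):
    """Run the CLI and parse its stdout as JSON (raises if the command fails)."""
    result = subprocess.run(
        build_command(model_path, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Calling `run_report("model.mlmodelc", fallback=True)` after each reconversion would give the agent a fresh fallback report to compare against the previous one.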
Common ANE rejection reasons and fixes:
- Unsupported tensor data type: int32 — cast to float16 before these operations
- Unsupported MIL operation "lstm" — decompose into supported ops (matmul, sigmoid, tanh)
- Unsupported MIL operation "logical_and" — replace with float multiply workaround
- Unable to resolve operation input — cascading from another CPU op; fix the upstream op first
- ANE supported but scheduler chose CPU — data transfer overhead; often not worth fixing
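The reasons and fixes listed above can be turned into a small lookup that an agent consults for each rejection reason. This is only a sketch: the table entries are copied from the list, while the function name and the prefix-matching rule are assumptions, since the exact message format in the tool's output may differ.

```python
# Rejection-reason prefixes and suggested fixes, copied from the list above.
KNOWN_FIXES = [
    ("Unsupported tensor data type: int32",
     "cast to float16 before these operations"),
    ('Unsupported MIL operation "lstm"',
     "decompose into supported ops (matmul, sigmoid, tanh)"),
    ('Unsupported MIL operation "logical_and"',
     "replace with float multiply workaround"),
    ("Unable to resolve operation input",
     "cascading from another CPU op; fix the upstream op first"),
    ("ANE supported but scheduler chose CPU",
     "data transfer overhead; often not worth fixing"),
]


def suggest_fix(reason):
    """Return the suggested fix for a rejection reason, or None if unknown."""
    for prefix, fix in KNOWN_FIXES:
        if reason.startswith(prefix):
            return fix
    return None
```

Returning None for unrecognized reasons lets the caller surface new messages to a human instead of guessing at a fix.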
Uses PyObjC to call macOS CoreML framework APIs directly from Python:
- Public API — MLComputePlan (macOS 14+) for per-operation device assignment and cost weights
- Private API — MLE5Engine.segmentationAnalyticsAndReturnError: for richer data including backend support matrices and estimated runtimes per backend
Heavily inspired by:
- maderix/ANE — reverse-engineered private _ANEClient/_ANECompiler APIs for direct Neural Engine access. Their runtime introspection approach (objc_msgSend, NSClassFromString) informed how we navigate CoreML's internal object graph.
- freedomtan/coreml_modelc_profling — per-operation profiling using both public MLComputePlan and undocumented MLE5Engine APIs. Their Objective-C implementation was the direct reference for our private profiler.
Note that this was a weekend project, built with Claude Code.
- Hardware-specific — compute plans and compilation are tied to the local chip. Results on an M4 Pro will differ from an M1 or A17 Pro.
- Private APIs may break — the MLE5Engine path (--detailed) uses undocumented APIs that may change across macOS versions.
- macOS 26 tested — CoreML enum values changed in macOS 26 (Tahoe). The tool uses framework constants to stay portable, but has only been tested on macOS 26.
MIT