Releases: EricLBuehler/mistral.rs

v0.6.0

10 Jun 23:28

🔥 Highlights from v0.6.0

🚀 Major Features

  • Llama 4 support and Qwen 3 / MoE / VL models, including DeepSeek and DeepCoder integrations
  • Multimodal prefix caching, paged attention scheduler improvements, and faster Metal/CUDA backends
  • Web chat app with chat history, file uploads, speech generation, and revamped tool-calling/search
  • Fast sampler and CPU FlashAttention with improved performance and accuracy
  • Metal and CUDA: major improvements in quantization (AFQ, ISQ), UQFF handling, and memory optimizations
  • MCP (Model Context Protocol): new server endpoints, docs, and integrated client
  • Vision and audio expansion: support for SigLIP, Dia 1.6b TTS, conformer backbone (Phi-4MM), auto loaders, and vision tool prefixes

🧠 Inference Optimizations

  • Lightning-fast AFQ on CPU, optimized Qwen 3 MoE on Metal, and paged attention fixes
  • Unified FlashAttention backend and automatic method selection for ISQ
  • Metal precompilation support and reduced autorelease thrashing

🧰 Dev Improvements

  • Refactored engine architecture, KV cache, attention backends, and device mapping logic
  • Centralized dependency management and cleaner internal abstractions
  • Streamlined and faster LoRA support

🎉 Other

  • Revamped README, AGENTS.md, and new benchmarking scripts
  • Interactive mode now shows throughput, supports Gumbel sampling, and offers better runtime sampling controls
  • Expanded quant and GGUF support: AWQ, Qwen3 GGUF, and prequantized MLX compatibility

v0.5.0

24 Mar 04:16
7c086a9

Highlights

Blog post: https://huggingface.co/blog/EricB/mistralrs-v0-5-0

Thank you to all contributors for this release! It includes the highlights below, along with many other improvements, fixes, and optimizations.

  • Support for many more models:
    • Gemma 3
    • Qwen 2.5 VL
    • Mistral Small 3.1
    • Phi 4 Multimodal (image only)
  • Native tool calling support for:
    • Llama 3.1/3.2/3.3
    • Mistral Small 3
    • Mistral Nemo
    • Hermes 2 Pro
    • Hermes 3
  • Tensor Parallelism support (NCCL)!
  • FlashAttention V3 support and integration in PagedAttention
  • 30x reduction in ISQ times on Metal!
  • Revamped prefix cacher system

v0.4.0

22 Jan 19:39

New features

  • 🔥 New models!
    • DeepSeek V2
    • DeepSeek V3 and R1
    • MiniCPM-O 2.6
  • 🧮 Imatrix quantization
  • ⚙️ Automatic device mapping
  • BNB quantization
  • Support blockwise FP8 dequantization and FP8 on Metal
  • Integrate the llguidance library (@mmoskal)
  • Metal PagedAttention
  • Many fixes and improvements from contributors!

Breaking changes

  • The Rust device mapping API has changed.

MSRV

The MSRV of this release is 1.83.0.
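
To check whether a local toolchain meets this MSRV before building from source, something like the following sketch may help. The `version_ge` helper and the `MSRV` variable are hypothetical names introduced here for illustration, not part of mistral.rs; the script assumes `rustc --version` prints the version number as its second field.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if dotted version $1 >= dotted version $2.
# Components are compared numerically; missing components count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# MSRV stated in these release notes for v0.4.0.
MSRV="1.83.0"

if command -v rustc >/dev/null 2>&1; then
    # Assumption: output looks like "rustc 1.83.0 (...)", so field 2 is the version.
    installed=$(rustc --version | awk '{print $2}')
    if version_ge "$installed" "$MSRV"; then
        echo "rustc $installed satisfies MSRV $MSRV"
    else
        echo "rustc $installed is older than MSRV $MSRV"
    fi
else
    echo "rustc not found"
fi
```

The comparison is numeric per component, so 1.100.0 is correctly treated as newer than 1.83.0, which a plain string comparison would get wrong.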

v0.3.4

28 Nov 19:27
68c078f

New features

  • Qwen2-VL support
  • Idefics 3/SmolVLM support
  • 🔥 6x prompt performance boost (all benchmarks faster than or comparable to MLX, llama.cpp)!
  • 🗂️ More efficient non-PagedAttention KV cache implementation!
  • Public tokenization API

Python wheels

The wheels now include support for Windows, Linux, and macOS on x86_64 and aarch64.

MSRV

1.79.0

Full Changelog: v0.3.2...v0.3.4

v0.3.2

28 Oct 15:44
57a8b03

Key changes

  • General improvements and fixes
  • ISQ FP8
  • GPTQ Marlin
  • 26% performance boost on Metal
  • Python package wheels are available. See below and the various PyPI packages.

Full Changelog: v0.3.1...v0.3.2

v0.3.1

29 Sep 15:39
1caf83a

Highlights

  • UQFF
  • FLUX model
  • Llama 3.2 Vision model

MSRV

The MSRV of this release is 1.79.0.

Full Changelog: v0.3.0...v0.3.1

v0.3.0

02 Sep 17:27
ae71578

Highlights

  • New model topology feature: ISQ and device mapping
  • 🔥Faster FlashAttention support when batching
  • Removed plotly and associated JS dependencies
  • φ³ Support Phi 3.5, Phi 3.5 vision, Phi 3.5 MoE
  • Improved Rust API ergonomics
  • Support multiple (sharded) GGUF files

MSRV

The Rust MSRV of this version is 1.79.0.

Full Changelog: v0.2.5...v0.3.0

v0.2.5

16 Aug 01:10
e64a71a

Full Changelog: v0.2.4...v0.2.5

Install mistralrs-server 0.2.5

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/EricLBuehler/mistral.rs/releases/download/v0.2.5/mistralrs-server-installer.sh | sh
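
For readers who would rather inspect the installer before running it, the one-liner above can be split into a download step and a run step. This is a minimal sketch: the URL pattern is taken from the command above, while `installer_url` and `fetch_installer` are hypothetical helper names introduced for illustration.

```shell
#!/bin/sh
# Build the installer URL for a given release version.
# The pattern mirrors the URL used in the install command above.
installer_url() {
    printf 'https://github.com/EricLBuehler/mistral.rs/releases/download/v%s/mistralrs-server-installer.sh\n' "$1"
}

# Hypothetical helper: download the installer to a local file instead of
# piping it straight into sh, so it can be read before it is executed.
#   $1 = release version (e.g. 0.2.5)
#   $2 = output path (optional, defaults to mistralrs-server-installer.sh)
fetch_installer() {
    version="$1"
    out="${2:-mistralrs-server-installer.sh}"
    curl --proto '=https' --tlsv1.2 -LsSf "$(installer_url "$version")" -o "$out"
}

# Nothing is downloaded when this file is sourced; call the helper explicitly:
#   fetch_installer 0.2.5
#   less mistralrs-server-installer.sh
#   sh mistralrs-server-installer.sh
```

The same curl flags as the original command are kept, so the download still requires HTTPS with TLS 1.2 or newer and fails on HTTP errors.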

Download mistralrs-server 0.2.5

File                                              Platform             Checksum
mistralrs-server-aarch64-apple-darwin.tar.xz      Apple Silicon macOS  checksum
mistralrs-server-x86_64-apple-darwin.tar.xz       Intel macOS          checksum
mistralrs-server-x86_64-unknown-linux-gnu.tar.xz  x64 Linux            checksum

v0.2.4

01 Aug 12:22
8a84d05

Full Changelog: v0.2.3...v0.2.4

MSRV

MSRV is 1.75

Install mistralrs-server 0.2.4

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/EricLBuehler/mistral.rs/releases/download/v0.2.4/mistralrs-server-installer.sh | sh

Download mistralrs-server 0.2.4

File                                              Platform             Checksum
mistralrs-server-aarch64-apple-darwin.tar.xz      Apple Silicon macOS  checksum
mistralrs-server-x86_64-apple-darwin.tar.xz       Intel macOS          checksum
mistralrs-server-x86_64-unknown-linux-gnu.tar.xz  x64 Linux            checksum

v0.2.3

28 Jul 18:52
9b898ee

Full Changelog: v0.2.2...v0.2.3

Install mistralrs-server 0.2.3

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/EricLBuehler/mistral.rs/releases/download/v0.2.3/mistralrs-server-installer.sh | sh

Download mistralrs-server 0.2.3

File                                              Platform             Checksum
mistralrs-server-aarch64-apple-darwin.tar.xz      Apple Silicon macOS  checksum
mistralrs-server-x86_64-apple-darwin.tar.xz       Intel macOS          checksum
mistralrs-server-x86_64-unknown-linux-gnu.tar.xz  x64 Linux            checksum