Add Qwen3-VL-2B and Qwen2.5-VL-3B builtin optimization recipes#254

Open
hanbitmyths wants to merge 16 commits into microsoft:main from hanbitmyths:sunghcho/qwen3-vl
Conversation

@hanbitmyths

@hanbitmyths hanbitmyths commented Mar 4, 2026

This PR adds end-to-end optimization recipes for Qwen3-VL-2B-Instruct and Qwen2.5-VL-3B-Instruct vision-language models, with CPU/mobile and CUDA configurations, evaluation scripts, and inference examples.

  • Qwen3-VL-2B-Instruct/Qwen3-VL-4B-Instruct/Qwen3-VL-8B-Instruct
- optimize.py — Export and optimize the model via Olive's Model Builder with pass configurations for vision, text, and embedding ONNX subgraphs.
- eval.py — Evaluate ONNX model accuracy against PyTorch reference on the DocVQA dataset, comparing exact match and ANLS scores.
- inference.py — Run interactive or batch multimodal inference with the optimized model.
- user_script.py — Custom model loading with patched Qwen3VLForConditionalGeneration for ONNX-compatible export.
- codes/modeling_qwen3_vl.py — Modified Qwen3-VL model code for ONNX export compatibility.
- CPU/mobile and CUDA Olive pass configs (cpu_and_mobile/, cuda/) for vision, text, and embedding components.
  • Qwen2.5-VL-3B-Instruct (Qwen-Qwen2.5-VL-3B-Instruct/builtin/)
- Restructured from olive to builtin directory layout matching the Qwen3-VL recipe structure.
- Added eval.py, optimize.py, CUDA configs, and embedding export configs.
- Updated codes/modeling_qwen2_5_vl.py for improved ONNX export compatibility.
- Added .gitignore for model output artifacts.
  • Shared infrastructure
- Both recipes include info.yml for AI Toolkit integration, requirements.txt, and a sample cat.jpeg test image.
- Evaluation script supports both ONNX (via onnxruntime-genai) and PyTorch reference models.
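As context for the eval.py scripts above: the PR description mentions comparing exact match and ANLS scores on DocVQA. ANLS (Average Normalized Levenshtein Similarity) fits in a few lines; this is a generic sketch of the metric itself, not the code in this PR:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def anls(predictions, references, threshold=0.5):
    # Average Normalized Levenshtein Similarity, as used for DocVQA:
    # per sample, take the best similarity over all accepted answers,
    # zeroing out scores below the threshold.
    total = 0.0
    for pred, answers in zip(predictions, references):
        best = 0.0
        for ans in answers:
            p, a = pred.strip().lower(), ans.strip().lower()
            denom = max(len(p), len(a)) or 1
            best = max(best, 1.0 - levenshtein(p, a) / denom)
        total += best if best >= threshold else 0.0
    return total / len(predictions)
```

An exact-match comparison is the degenerate case (score 1.0 only when the normalized strings are identical); ANLS additionally gives partial credit for near-miss OCR-style answers.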

This PR depends on an Olive PR.

- Restructure from olive/ subdir to recipe root
- Export pipeline with Olive: embedding + text (ModelBuilder INT4) + vision (Dynamo->INT4)
- Add CUDA configs alongside cpu_and_mobile configs
- eval.py: AI2D benchmark evaluation vs PyTorch baseline
- Multi-image support via QwenVisionState in ort-genai runtime
- Requirements aligned: transformers>=4.57.0,<6.0, torch>=2.10.0
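The export-pipeline bullet above amounts to running Olive once per component config (embedding, text, vision) under the chosen target directory. A minimal orchestration sketch; the directory and config names follow the PR's layout, but the helpers themselves are illustrative, not the actual optimize.py:

```python
import subprocess
from pathlib import Path

# One Olive JSON per ONNX subgraph, under a per-target directory
# such as cpu_and_mobile/ or cuda/ (layout from this PR; helper names
# here are assumptions).
COMPONENTS = ["embedding", "text", "vision"]

def build_commands(config_dir: Path, components=COMPONENTS):
    # One `olive run` invocation per component config.
    return [["olive", "run", "--config", str(config_dir / f"{name}.json")]
            for name in components]

def run_pipeline(config_dir: Path) -> None:
    for cmd in build_commands(config_dir):
        subprocess.run(cmd, check=True)  # fail fast if any export step errors
```

Running the components as separate Olive passes is what lets the text subgraph go through ModelBuilder INT4 while the vision subgraph takes the Dynamo-export-then-INT4 path.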

Results (AI2D, 100 samples, CPU):
  Qwen3-VL-2B:   ONNX INT4 69% vs PyTorch FP32 74% (-5pp), 1.45x faster
  Qwen2.5-VL-3B: ONNX INT4 83% vs PyTorch FP32 81% (+2pp), 1.24x faster
…ripts

- Move all files (except LICENSE) under builtin/ for Qwen3-VL and Qwen2.5-VL
- Consolidate optimize.py and user_script.py to parent level with --config-dir flag
- Remove duplicate export.py, optimize.py, user_script.py from cpu_and_mobile/ and cuda/
- Update JSON configs to reference ../user_script.py
- Update READMEs with new structure and commands
…gs and eval

- Qwen3-VL-2B-Instruct: cpu_and_mobile + cuda configs, eval.py, user_script.py, modeling code
- Qwen2.5-VL-3B-Instruct: cpu_and_mobile + cuda configs, eval.py, user_script.py, modeling code
- Vision/text/embedding JSON configs for both models and both targets
Copilot AI review requested due to automatic review settings March 4, 2026 06:48
Contributor

Copilot AI left a comment


Pull request overview

This PR adds two new “builtin” optimization recipes to the repo for Qwen3-VL-2B-Instruct and Qwen2.5-VL-3B-Instruct, including Olive export/optimization/quantization pipelines (CPU/mobile + CUDA), ONNX Runtime GenAI config generation, and runnable eval/inference scripts.

Changes:

  • Added a new builtin recipe for Qwen3-VL-2B-Instruct (custom exportable modeling code, Olive JSON pipelines, optimize/eval/inference scripts, docs).
  • Restructured Qwen2.5-VL-3B-Instruct from the old olive/ layout into a matching builtin/ layout and added CPU/mobile + CUDA pipelines, eval, and updated modeling code for export.
  • Added runtime config generation (genai_config.json patch + processor_config.json creation) for both models.

Reviewed changes

Copilot reviewed 32 out of 38 changed files in this pull request and generated 5 comments.

File Description
Qwen-Qwen3-VL-2B-Instruct/builtin/user_script.py Olive callbacks for loading/exporting embedding + vision submodels.
Qwen-Qwen3-VL-2B-Instruct/builtin/requirements.txt Recipe Python dependencies for Qwen3-VL builtin flow.
Qwen-Qwen3-VL-2B-Instruct/builtin/optimize.py Orchestrates Olive runs and generates ORT GenAI configs.
Qwen-Qwen3-VL-2B-Instruct/builtin/info.yml AI Toolkit metadata (keywords/EP/device/name).
Qwen-Qwen3-VL-2B-Instruct/builtin/inference.py ORT GenAI inference entrypoint (text + image, interactive mode).
Qwen-Qwen3-VL-2B-Instruct/builtin/eval.py AI2D evaluation comparing ONNX vs PyTorch baseline.
Qwen-Qwen3-VL-2B-Instruct/builtin/cuda/vision.json CUDA vision export + graph surgeries + FP16 flow.
Qwen-Qwen3-VL-2B-Instruct/builtin/cuda/text.json CUDA text export via ModelBuilder (INT4).
Qwen-Qwen3-VL-2B-Instruct/builtin/cuda/embedding.json CUDA embedding export + ORT opt + FP16 flow.
Qwen-Qwen3-VL-2B-Instruct/builtin/cpu_and_mobile/vision.json CPU/mobile vision export + graph surgeries + INT4 quantization.
Qwen-Qwen3-VL-2B-Instruct/builtin/cpu_and_mobile/text.json CPU/mobile text export via ModelBuilder (INT4).
Qwen-Qwen3-VL-2B-Instruct/builtin/cpu_and_mobile/embedding.json CPU/mobile embedding export + ORT opt + INT4 quantization.
Qwen-Qwen3-VL-2B-Instruct/builtin/codes/modeling_qwen3_vl.py Custom Qwen3-VL modeling code adapted for ONNX export + Olive surgeries.
Qwen-Qwen3-VL-2B-Instruct/builtin/codes/__init__.py Package marker for custom modeling code.
Qwen-Qwen3-VL-2B-Instruct/builtin/README.md End-to-end usage docs for export, inference, and evaluation.
Qwen-Qwen3-VL-2B-Instruct/builtin/.gitignore Ignores generated models/logs/cache artifacts.
Qwen-Qwen2.5-VL-3B-Instruct/olive/optimize.py Removed old Olive optimize script (superseded by builtin layout).
Qwen-Qwen2.5-VL-3B-Instruct/olive/embedding.json Removed old embedding config (superseded by builtin configs).
Qwen-Qwen2.5-VL-3B-Instruct/olive/README.md Removed old docs (replaced by builtin README).
Qwen-Qwen2.5-VL-3B-Instruct/builtin/user_script.py Updated Olive callbacks; fixes export issues (inv_freq, IO config, dtype).
Qwen-Qwen2.5-VL-3B-Instruct/builtin/requirements.txt Updated recipe Python dependencies.
Qwen-Qwen2.5-VL-3B-Instruct/builtin/optimize.py New builtin optimize orchestration + GenAI config generation.
Qwen-Qwen2.5-VL-3B-Instruct/builtin/info.yml Normalized formatting (same metadata fields).
Qwen-Qwen2.5-VL-3B-Instruct/builtin/inference.py Adjusted default model output path to cpu_and_mobile layout.
Qwen-Qwen2.5-VL-3B-Instruct/builtin/eval.py Added AI2D evaluation comparing ONNX vs PyTorch baseline.
Qwen-Qwen2.5-VL-3B-Instruct/builtin/cuda/vision.json CUDA vision export + surgeries + FP16 flow.
Qwen-Qwen2.5-VL-3B-Instruct/builtin/cuda/text.json CUDA text export via ModelBuilder (INT4).
Qwen-Qwen2.5-VL-3B-Instruct/builtin/cuda/embedding.json CUDA embedding export + ORT opt + FP16 flow.
Qwen-Qwen2.5-VL-3B-Instruct/builtin/cpu_and_mobile/vision.json CPU/mobile vision export + ORT opt + INT4 quantization.
Qwen-Qwen2.5-VL-3B-Instruct/builtin/cpu_and_mobile/text.json CPU/mobile text export switched to INT4 + new output path.
Qwen-Qwen2.5-VL-3B-Instruct/builtin/cpu_and_mobile/embedding.json CPU/mobile embedding export + ORT opt + INT4 quantization.
Qwen-Qwen2.5-VL-3B-Instruct/builtin/codes/modeling_qwen2_5_vl.py Export-compat fixes (rope param compatibility, buffers persistent, window index rewrite).
Qwen-Qwen2.5-VL-3B-Instruct/builtin/codes/__init__.py Package marker for custom modeling code.
Qwen-Qwen2.5-VL-3B-Instruct/builtin/README.md End-to-end usage docs for export, inference, and evaluation.
Qwen-Qwen2.5-VL-3B-Instruct/builtin/.gitignore Ignores generated models/logs/cache artifacts.
Comments suppressed due to low confidence (1)

Qwen-Qwen2.5-VL-3B-Instruct/builtin/requirements.txt:3

  • torch>=2.10.0 is very likely a typo (PyTorch versions in this repo are pinned around 2.6–2.7). As written, this requirement can make the recipe environment impossible to resolve. Consider aligning to the repo’s supported torch range (e.g., torch>=2.6,<2.9), or pin to the minimum version you actually validated for torch.export/torch.cond support.
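For context on the version-pin comment above: release versions compare component-wise as integers, so 2.10 is newer than 2.9 even though the string "2.10.0" sorts first lexicographically. A quick illustration with plain tuples, not a full PEP 440 parser:

```python
def version_key(v: str) -> tuple:
    # Compare release versions component-wise as integers. This ignores
    # pre-release segments; a real resolver follows PEP 440 (see the
    # `packaging` library).
    return tuple(int(part) for part in v.split("."))

# Lexicographic string comparison is misleading here:
assert "2.10.0" < "2.9.0"                            # string order
assert version_key("2.10.0") > version_key("2.9.0")  # release order
# Hence torch>=2.10.0 rejects every released 2.x up through 2.9.*:
assert not version_key("2.7.1") >= version_key("2.10.0")
```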

