Conversation

@isLinXu (Contributor) commented on Dec 15, 2025

VibeThinker-1.5B

🚨 We recommend using this model for competition-style math and algorithmic coding problems (such as LeetCode, Codeforces, etc.). It works better to ask questions in English. We do not advise using it for other tasks, as this is an experimental release aimed at exploring the reasoning capabilities of small models.

📁 GitHub   |   🤖 ModelScope   |   📄 Technical Report

Introduction

VibeThinker-1.5B is a 1.5-billion parameter dense language model. With a total training cost of only $7,800 USD, it achieves reasoning performance comparable to larger models like GPT OSS-20B Medium.


Key Performance Data

💡 Mathematical Reasoning: On the three major math benchmarks AIME24, AIME25, and HMMT25, its scores (80.3, 74.4, and 50.4, respectively) all surpass those of the original DeepSeek R1 model, which has over 400 times as many parameters (79.8, 70.0, and 41.7, respectively).

🌱 Code Generation: It achieved scores of 55.9 on LiveCodeBench v5 and 51.1 on v6. Its v6 score slightly leads Magistral Medium (50.3), underscoring its strong reasoning performance.


🔁 On the AIME 25 benchmark, VibeThinker-1.5B significantly extends the Pareto frontier of reasoning accuracy versus model scale, demonstrating that exceptional performance can be achieved with extreme parameter efficiency.


Training Pipeline


VibeThinker-1.5B's core innovation lies in the "Spectrum-to-Signal Principle" (SSP) training framework: it first explores solution diversity during the Supervised Fine-Tuning (SFT) stage, and then optimizes its policy to reinforce correct signals in the Reinforcement Learning (RL) stage. By systematically integrating these two phases, our approach establishes diversity as the central technical design principle, enabling VibeThinker-1.5B to achieve robust performance that surpasses conventional training paradigms.
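The diversity explored during the SFT stage is commonly quantified with the pass@k metric: the probability that at least one of k sampled solutions is correct. As a worked illustration of that metric (our addition, not code from the technical report), the standard unbiased pass@k estimator can be computed as follows:

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn from n generations of which c are correct, is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 200 samples with 13 correct gives pass@1 ≈ 0.065,
# while pass@64 is already close to 1 — diversity pays off at large k.
print(pass_at_k(200, 13, 1), pass_at_k(200, 13, 64))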


Usage

Create a new file examples/train_lora/vibethinker_sft.yaml with the following content:

### model
model_name_or_path: WeiboAI/VibeThinker-1.5B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
# dataset: identity,alpaca_en_demo
dataset: identity

template: vibethinker

cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/WeiboAI/vibethinker/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500

Training

SFT

# single GPU
DISABLE_VERSION_CHECK=1 llamafactory-cli train examples/train_lora/vibethinker_sft.yaml
# single node (multi-GPU via torchrun)
DISABLE_VERSION_CHECK=1 FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/vibethinker_sft.yaml
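
After SFT, you can smoke-test the adapter outside LLaMA-Factory. Below is a minimal sketch (our addition, assuming transformers and peft are installed; the adapter path matches output_dir above, and the prompt string hand-follows the ChatML-style vibethinker template):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the LoRA adapter trained above.
base = AutoModelForCausalLM.from_pretrained(
    "WeiboAI/VibeThinker-1.5B", torch_dtype=torch.bfloat16,
    device_map="auto", trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "saves/WeiboAI/vibethinker/lora/sft")
tokenizer = AutoTokenizer.from_pretrained("WeiboAI/VibeThinker-1.5B")

prompt = "<|im_start|>user\nSolve 2x + 3 = 11 for x.<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))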

Results

(training loss screenshot)

RL

Create a new file examples/train_lora/vibethinker_lora_dpo.yaml with the following content:

### model
model_name_or_path: WeiboAI/VibeThinker-1.5B
trust_remote_code: true

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
pref_beta: 0.1
pref_loss: sigmoid  # choices: [sigmoid (dpo), orpo, simpo]

### dataset
dataset: dpo_en_demo
template: vibethinker
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/WeiboAI/vibethinker/lora/dpo
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
# eval_dataset: dpo_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500
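
(Not part of the YAML above.) The dpo_en_demo dataset supplies pairwise preference data. The record below is an illustrative shape only — the field names follow the common sharegpt-style layout and are an assumption here; the authoritative schema is defined in LLaMA-Factory's data/dataset_info.json:

# Illustrative preference record for the DPO stage (assumed field names).
example = {
    "conversations": [
        {"from": "human", "value": "Compute the sum 1 + 2 + ... + 100."}
    ],
    "chosen": {"from": "gpt", "value": "Pairing terms gives 50 * 101 = 5050."},
    "rejected": {"from": "gpt", "value": "The sum is 5000."},
}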

Training

# single GPU
DISABLE_VERSION_CHECK=1 llamafactory-cli train examples/train_lora/vibethinker_lora_dpo.yaml
# single node (multi-GPU via torchrun)
DISABLE_VERSION_CHECK=1 FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/vibethinker_lora_dpo.yaml
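
Optionally, the trained adapter can be merged into the base weights for deployment. A minimal sketch using peft's merge_and_unload (our illustration; LLaMA-Factory's llamafactory-cli export provides a built-in alternative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Fold the LoRA deltas into the base weights so the merged model can be
# saved and served without peft at inference time.
base = AutoModelForCausalLM.from_pretrained(
    "WeiboAI/VibeThinker-1.5B", torch_dtype=torch.bfloat16, trust_remote_code=True
)
merged = PeftModel.from_pretrained(base, "saves/WeiboAI/vibethinker/lora/dpo").merge_and_unload()
merged.save_pretrained("saves/WeiboAI/vibethinker/merged")
AutoTokenizer.from_pretrained("WeiboAI/VibeThinker-1.5B").save_pretrained("saves/WeiboAI/vibethinker/merged")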

@gemini-code-assist (Contributor)

Summary of Changes

Hello @isLinXu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the VibeThinker-1.5B model into LLaMA-Factory, enabling users to leverage its specialized reasoning capabilities for tasks like competitive programming. The changes ensure that the model is properly recognized, configured with its unique chat format, and documented for easy discovery and use within the framework.

Highlights

  • Model Integration: Added comprehensive support for the VibeThinker-1.5B model, a 1.5-billion parameter language model noted for its strong reasoning capabilities in competitive math and coding problems.
  • Documentation Update: Updated both the English and Chinese README.md files to include VibeThinker-1.5B in the list of supported models, providing quick reference for users.
  • Chat Template Registration: Registered a new chat template named vibethinker in src/llamafactory/data/template.py, defining the specific input/output format for interacting with the VibeThinker model.
  • Model Configuration: Configured the VibeThinker-1.5B model in src/llamafactory/extras/constants.py, linking it to its download sources (Hugging Face and ModelScope) and associating it with the newly defined vibethinker chat template.

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds support for the VibeThinker-1.5B model. The changes include adding the model to the documentation, defining a new chat template, and registering the model in the constants file. My review focuses on code maintainability and consistency. I've suggested a refactoring to reduce code duplication in the template definition and pointed out some minor ordering issues in the documentation files for better consistency.

| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | vibethinker |

Severity: medium

To maintain consistency and readability, please ensure the models in this table are listed in alphabetical order. VibeThinker-1.5B should be placed before XVERSE.

| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | vibethinker |

Severity: medium

To maintain consistency and readability, please ensure the models in this table are listed in alphabetical order. VibeThinker-1.5B should be placed before XVERSE.

Comment on lines +2138 to +2150
register_template(
    name="vibethinker",
    format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
    format_assistant=StringFormatter(slots=["{{content}}<|im_end|>\n"]),
    format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}<|im_end|>\n"]),
    format_function=FunctionFormatter(slots=["{{content}}<|im_end|>\n"], tool_format="qwen"),
    format_observation=StringFormatter(
        slots=["<|im_start|>user\n<tool_response>\n{{content}}\n</tool_response><|im_end|>\n<|im_start|>assistant\n"]
    ),
    format_tools=ToolFormatter(tool_format="qwen"),
    stop_words=["<|im_end|>"],
    replace_eos=True,
)
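
For reference, the formatters above render a one-turn exchange into the following strings — a hand-expanded illustration of the template, not framework output:

# ChatML-style rendering implied by the slots above (illustration only).
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is 7 * 8?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
completion = "56<|im_end|>\n"  # <|im_end|> also serves as the stop word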

Severity: medium

The new vibethinker template is almost an exact copy of the qwen template, with the only difference being the absence of a default_system message. This introduces code duplication. To improve maintainability, consider creating a copy of the qwen template and just overriding the default_system attribute. This avoids repeating the entire template definition.

from copy import deepcopy

vibethinker_template = deepcopy(TEMPLATES["qwen"])
vibethinker_template.default_system = ""
TEMPLATES["vibethinker"] = vibethinker_template

@isLinXu closed this by deleting the head repository on Dec 15, 2025