Conversation

@isLinXu (Contributor) commented on Dec 15, 2025

VibeThinker-1.5B

🚨 We recommend using this model for competition-style math and algorithmic coding problems (such as LeetCode, Codeforces, etc.). It works better to ask questions in English. We do not advise using it for other tasks, as this is an experimental release aimed at exploring the reasoning capabilities of small models.

📁 GitHub   |   🤖 ModelScope   |   📄 Technical Report

Introduction

VibeThinker-1.5B is a 1.5-billion parameter dense language model. With a total training cost of only $7,800 USD, it achieves reasoning performance comparable to larger models like GPT OSS-20B Medium.


Key Performance Data

💡 Mathematical Reasoning: On the three major math benchmarks AIME24, AIME25, and HMMT25, its scores (80.3, 74.4, and 50.4, respectively) all surpass those of the original DeepSeek R1 model, which has over 400 times as many parameters (79.8, 70.0, and 41.7, respectively).

🌱 Code Generation: It achieved scores of 55.9 on LiveCodeBench v5 and 51.1 on v6. Its v6 score slightly leads Magistral Medium (50.3), underscoring its strong reasoning performance.


🔁 On the AIME 25 benchmark, VibeThinker-1.5B significantly extends the Pareto frontier of reasoning accuracy versus model scale, demonstrating that exceptional performance can be achieved with extreme parameter efficiency.


Training Pipeline


VibeThinker-1.5B's core innovation lies in the "Spectrum-to-Signal Principle" (SSP) training framework: it first explores solution diversity during the Supervised Fine-Tuning (SFT) stage, and then optimizes its policy to reinforce correct signals in the Reinforcement Learning (RL) stage. By systematically integrating these two phases, our approach establishes diversity as the central technical design principle, enabling VibeThinker-1.5B to achieve robust performance that surpasses conventional training paradigms.
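The diversity explored during the SFT stage is commonly quantified with the pass@k metric: the probability that at least one of k sampled solutions is correct. As a worked illustration of that metric (our addition, not code from the technical report), the standard unbiased pass@k estimator can be computed as follows:

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn from n generations of which c are correct, is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 200 samples with 13 correct gives pass@1 ≈ 0.065,
# while pass@64 is already close to 1 — diversity pays off at large k.
print(pass_at_k(200, 13, 1), pass_at_k(200, 13, 64))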


Usage

Create a new file examples/train_lora/vibethinker_sft.yaml with the following content:

### model
model_name_or_path: WeiboAI/VibeThinker-1.5B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
# dataset: identity,alpaca_en_demo
dataset: identity

template: vibethinker

cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/WeiboAI/vibethinker/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500

Training

SFT

# single GPU
DISABLE_VERSION_CHECK=1 llamafactory-cli train examples/train_lora/vibethinker_sft.yaml
# single node (multi-GPU via torchrun)
DISABLE_VERSION_CHECK=1 FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/vibethinker_sft.yaml
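
After SFT, you can smoke-test the adapter outside LLaMA-Factory. Below is a minimal sketch (our addition, assuming transformers and peft are installed; the adapter path matches output_dir above, and the prompt string hand-follows the ChatML-style vibethinker template):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the LoRA adapter trained above.
base = AutoModelForCausalLM.from_pretrained(
    "WeiboAI/VibeThinker-1.5B", torch_dtype=torch.bfloat16,
    device_map="auto", trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "saves/WeiboAI/vibethinker/lora/sft")
tokenizer = AutoTokenizer.from_pretrained("WeiboAI/VibeThinker-1.5B")

prompt = "<|im_start|>user\nSolve 2x + 3 = 11 for x.<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))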

Results

(training loss screenshot)

RL

Create a new file examples/train_lora/vibethinker_lora_dpo.yaml with the following content:

### model
model_name_or_path: WeiboAI/VibeThinker-1.5B
trust_remote_code: true

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
pref_beta: 0.1
pref_loss: sigmoid  # choices: [sigmoid (dpo), orpo, simpo]

### dataset
dataset: dpo_en_demo
template: vibethinker
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/WeiboAI/vibethinker/lora/dpo
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
# eval_dataset: dpo_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500
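
(Not part of the YAML above.) The dpo_en_demo dataset supplies pairwise preference data. The record below is an illustrative shape only — the field names follow the common sharegpt-style layout and are an assumption here; the authoritative schema is defined in LLaMA-Factory's data/dataset_info.json:

# Illustrative preference record for the DPO stage (assumed field names).
example = {
    "conversations": [
        {"from": "human", "value": "Compute the sum 1 + 2 + ... + 100."}
    ],
    "chosen": {"from": "gpt", "value": "Pairing terms gives 50 * 101 = 5050."},
    "rejected": {"from": "gpt", "value": "The sum is 5000."},
}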

Training

# single GPU
DISABLE_VERSION_CHECK=1 llamafactory-cli train examples/train_lora/vibethinker_lora_dpo.yaml
# single node (multi-GPU via torchrun)
DISABLE_VERSION_CHECK=1 FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/vibethinker_lora_dpo.yaml
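
Optionally, the trained adapter can be merged into the base weights for deployment. A minimal sketch using peft's merge_and_unload (our illustration; LLaMA-Factory's llamafactory-cli export provides a built-in alternative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Fold the LoRA deltas into the base weights so the merged model can be
# saved and served without peft at inference time.
base = AutoModelForCausalLM.from_pretrained(
    "WeiboAI/VibeThinker-1.5B", torch_dtype=torch.bfloat16, trust_remote_code=True
)
merged = PeftModel.from_pretrained(base, "saves/WeiboAI/vibethinker/lora/dpo").merge_and_unload()
merged.save_pretrained("saves/WeiboAI/vibethinker/merged")
AutoTokenizer.from_pretrained("WeiboAI/VibeThinker-1.5B").save_pretrained("saves/WeiboAI/vibethinker/merged")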

@gemini-code-assist (Contributor)

Summary of Changes

Hello @isLinXu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the VibeThinker-1.5B model into LLaMA-Factory, enabling users to leverage its specialized reasoning capabilities for tasks like competitive programming. The changes ensure that the model is properly recognized, configured with its unique chat format, and documented for easy discovery and use within the framework.

Highlights

  • Model Integration: Added comprehensive support for the VibeThinker-1.5B model, a 1.5-billion parameter language model noted for its strong reasoning capabilities in competitive math and coding problems.
  • Documentation Update: Updated both the English and Chinese README.md files to include VibeThinker-1.5B in the list of supported models, providing quick reference for users.
  • Chat Template Registration: Registered a new chat template named vibethinker in src/llamafactory/data/template.py, defining the specific input/output format for interacting with the VibeThinker model.
  • Model Configuration: Configured the VibeThinker-1.5B model in src/llamafactory/extras/constants.py, linking it to its download sources (Hugging Face and ModelScope) and associating it with the newly defined vibethinker chat template.

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds support for the VibeThinker-1.5B model. The changes include adding the model to the documentation, defining a new chat template, and registering the model in the constants file. My review focuses on code maintainability and consistency. I've suggested a refactoring to reduce code duplication in the template definition and pointed out some minor ordering issues in the documentation files for better consistency.

| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | vibethinker |

Severity: medium

To maintain consistency and readability, please ensure the models in this table are listed in alphabetical order. VibeThinker-1.5B should be placed before XVERSE.

| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | vibethinker |

Severity: medium

To maintain consistency and readability, please ensure the models in this table are listed in alphabetical order. VibeThinker-1.5B should be placed before XVERSE.

Comment on lines +2138 to +2150
register_template(
    name="vibethinker",
    format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
    format_assistant=StringFormatter(slots=["{{content}}<|im_end|>\n"]),
    format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}<|im_end|>\n"]),
    format_function=FunctionFormatter(slots=["{{content}}<|im_end|>\n"], tool_format="qwen"),
    format_observation=StringFormatter(
        slots=["<|im_start|>user\n<tool_response>\n{{content}}\n</tool_response><|im_end|>\n<|im_start|>assistant\n"]
    ),
    format_tools=ToolFormatter(tool_format="qwen"),
    stop_words=["<|im_end|>"],
    replace_eos=True,
)
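
For reference, the formatters above render a one-turn exchange into the following strings — a hand-expanded illustration of the template, not framework output:

# ChatML-style rendering implied by the slots above (illustration only).
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is 7 * 8?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
completion = "56<|im_end|>\n"  # <|im_end|> also serves as the stop word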

Severity: medium

The new vibethinker template is almost an exact copy of the qwen template, with the only difference being the absence of a default_system message. This introduces code duplication. To improve maintainability, consider creating a copy of the qwen template and just overriding the default_system attribute. This avoids repeating the entire template definition.

from copy import deepcopy

vibethinker_template = deepcopy(TEMPLATES["qwen"])
vibethinker_template.default_system = ""
TEMPLATES["vibethinker"] = vibethinker_template

@isLinXu closed this by deleting the head repository on Dec 15, 2025