[BREAKING] config: set the default value of actor.entropy_coeff to 0 #1770

eric-haibin-lin · 2025-05-30T05:44:43Z

Checklist Before Starting

Search for similar PR(s).

What does this PR do?

entropy_coeff should be set carefully during RL training. When it is enabled, an inappropriate coefficient may cause training to collapse. For more empirical experiments, see the Skywork Open Reasoner 1 Technical Report (https://arxiv.org/pdf/2505.22312).

In this PR, the default value of entropy_coeff is set to 0. This is a breaking change that may affect your experiments, although the majority of verl example scripts already set it to 0 manually.

We let most example scripts simply pick up the default value of 0 for entropy_coeff. For the few documentation pages that provide reference model performance and commands, we modify the docs so that the reported experiment results are consistent with the config setup.
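Because the default changes, a run that relied on the previous default needs to set the value explicitly. One way to check whether a launch command already does so is sketched below. The helper name and the sample argument lists are hypothetical; the sketch only assumes the `key=value` override style shown in this PR.

```python
KEY = "actor_rollout_ref.actor.entropy_coeff"

def explicit_override(args, key=KEY):
    """Return the last value assigned to `key` in a list of key=value
    overrides, or None if the key is never set explicitly."""
    value = None
    for arg in args:
        name, sep, val = arg.partition("=")
        if sep and name == key:
            value = val  # later overrides win over earlier ones
    return value

# Hypothetical argument lists for illustration only.
pinned = ["some.other.option=1", "actor_rollout_ref.actor.entropy_coeff=0.001"]
unpinned = ["some.other.option=1"]

print(explicit_override(pinned))
print(explicit_override(unpinned))
```

A `None` result means the command relies on the default, which is 0 after this PR.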

Usage Example

To enable the entropy loss coefficient, use

actor_rollout_ref.actor.entropy_coeff=0.001 # or other values
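As a rough illustration of what the coefficient controls, the sketch below combines a policy-gradient loss with an entropy bonus in the common form `pg_loss - entropy_coeff * entropy`. This is a generic formulation, not a copy of verl's implementation; the function names and the numbers are hypothetical.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def actor_loss(pg_loss, entropy, entropy_coeff=0.0):
    """Policy-gradient loss minus an entropy bonus scaled by entropy_coeff.

    With entropy_coeff=0 the entropy term drops out entirely.
    """
    return pg_loss - entropy_coeff * entropy

# Hypothetical values for illustration only.
probs = [0.7, 0.2, 0.1]
entropy = token_entropy(probs)

loss_default = actor_loss(pg_loss=0.5, entropy=entropy)
loss_with_bonus = actor_loss(pg_loss=0.5, entropy=entropy, entropy_coeff=0.001)

print(loss_default, loss_with_bonus)
```

With a coefficient of 0 the loss reduces to the policy-gradient term alone. A positive coefficient lowers the loss for higher-entropy policies, which is why a badly chosen value can push training in an unintended direction.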

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

Additional Info.

Issue Number: Fixes issue # or discussion # if any.
Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks.
Add [BREAKING] to the PR title if it breaks any API.
Update the documentation about your changes in the docs.
Add CI test(s) if necessary.

eric-haibin-lin and others added 2 commits May 29, 2025 22:35

[BREAKING] config: set the default value of actor_rollout_ref.actor.e…

114016f

…ntropy_coeff to 0

Update README.md

db0bef6

vermouth1992 approved these changes May 30, 2025

vermouth1992 merged commit 2aed8d0 into volcengine:main May 30, 2025
4 checks passed
