@bzantium
Contributor

What does this PR do?

This PR corrects a minor typo in the documentation for the DAPO algorithm.

It changes the documented starting point of the `overlong_buffer` penalty from `max_response_length` to `max_response_length - overlong_buffer.len`. The documentation now accurately reflects that the penalty is applied as the response length approaches the maximum limit, not after it has been exceeded.

Fixes #3855

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a typo in the documentation for the DAPO algorithm's overlong_buffer penalty, aligning the description with the implementation. I have added a couple of suggestions to rephrase the updated documentation in both files to further improve clarity for users.

Setting `overlong_buffer.enable` to `True` will penalize the outputs whose lengths are overlong but still within the hard context limit.

- Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` when the length of the output exceeds the `max_response_length` by `0` to `overlong_buffer.len` tokens.
+ Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` when the length of the output exceeds the `max_response_length - overlong_buffer.len` by `0` to `overlong_buffer.len` tokens.

high

While this change correctly adjusts the threshold, the phrasing 'exceeds... by...' can be slightly confusing. For improved clarity, I suggest rephrasing to describe the range of the penalty application more directly.

Suggested change
- Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` when the length of the output exceeds the `max_response_length - overlong_buffer.len` by `0` to `overlong_buffer.len` tokens.
+ Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` as the output length increases from `max_response_length - overlong_buffer.len` to `max_response_length`
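
For illustration, the behavior described in the corrected wording could be sketched as below. This is a minimal sketch and not the project's implementation: the function name `overlong_penalty`, its argument names, and the clamping outside the buffer range are assumptions made for this example. It returns the penalty magnitude only; how that value is applied to the reward is not covered here.

```python
def overlong_penalty(
    response_len: int,
    max_response_length: int,
    buffer_len: int,
    penalty_factor: float,
) -> float:
    """Hypothetical helper: linear penalty magnitude over the overlong buffer.

    The penalty is 0 up to `max_response_length - buffer_len` and grows
    linearly to `penalty_factor` at `max_response_length`.
    """
    if buffer_len <= 0:
        raise ValueError("buffer_len must be positive")

    # The penalty starts here, not at max_response_length.
    threshold = max_response_length - buffer_len

    # Fraction of the buffer consumed by the response.
    # Assumption: clamp to [0, 1] for lengths outside the buffer range.
    fraction = (response_len - threshold) / buffer_len
    fraction = min(max(fraction, 0.0), 1.0)

    return fraction * penalty_factor
```

With `max_response_length = 4096`, `buffer_len = 1024`, and `penalty_factor = 1.0`, the penalty is zero for responses of up to 3072 tokens and reaches half of its maximum at 3584 tokens.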


@wuxibin89 wuxibin89 changed the title from "fix doc that penalty starts when exceeds the max_response_length - overlong_buffer.len" to "[doc] misc: fix doc that penalty starts when exceeds the max_response_length - overlong_buffer.len" on Oct 22, 2025
@wuxibin89 wuxibin89 merged commit 8b7a48d into volcengine:main Oct 22, 2025
3 of 6 checks passed
@bzantium bzantium deleted the feature/#3855 branch October 22, 2025 13:39
sunnweiwei pushed a commit to sunnweiwei/verl that referenced this pull request Oct 23, 2025
…e_length - overlong_buffer.len` (volcengine#3856)

### What does this PR do?

This PR corrects a minor typo in the documentation for the DAPO
algorithm.

It changes the threshold for the `overlong_buffer` penalty from starting
at `max_response_length` to `max_response_length - overlong_buffer.len`.
This ensures the documentation accurately reflects that the penalty is
applied as the response length approaches the maximum limit.

Fixes volcengine#3855

Signed-off-by: bzantium <[email protected]>
wangboxiong320 pushed a commit to wangboxiong320/verl that referenced this pull request Nov 1, 2025
NenoL2001 pushed a commit to NenoL2001/verl that referenced this pull request Nov 3, 2025
AlexJJ009 pushed a commit to AlexJJ009/verl that referenced this pull request Nov 5, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
chenhaiq pushed a commit to The-Hierophant/verl-1 that referenced this pull request Nov 18, 2025
NenoL2001 pushed a commit to NenoL2001/verl that referenced this pull request Nov 26, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
Development

Successfully merging this pull request may close these issues.

[docs]: Correction in DAPO Penalty Calculation Description