[doc] misc: fix doc that penalty starts when exceeds the `max_response_length - overlong_buffer.len` #3856

bzantium · 2025-10-22T04:28:53Z

What does this PR do?

This PR corrects a minor typo in the documentation for the DAPO algorithm.

It changes the documented starting point of the `overlong_buffer` penalty from `max_response_length` to `max_response_length - overlong_buffer.len`. The documentation now accurately reflects that the penalty is applied as the response length approaches the maximum limit, not after the limit is exceeded.

Fixes #3855

Signed-off-by: bzantium <[email protected]>

gemini-code-assist

Code Review

This pull request correctly fixes a typo in the documentation for the DAPO algorithm's overlong_buffer penalty, aligning the description with the implementation. I have added a couple of suggestions to rephrase the updated documentation in both files to further improve clarity for users.

gemini-code-assist · 2025-10-22T04:30:20Z

docs/algo/dapo.md

 Setting `overlong_buffer.enable` to `True` will penalize the outputs whose lengths are overlong but still within the hard context limit.

-Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` when the length of the output exceeds the `max_response_length` by `0` to `overlong_buffer.len` tokens.
+Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` when the length of the output exceeds the `max_response_length - overlong_buffer.len` by `0` to `overlong_buffer.len` tokens.


While this change correctly adjusts the threshold, the phrasing 'exceeds... by...' can be slightly confusing. For improved clarity, I suggest rephrasing to describe the range of the penalty application more directly.

Suggested change

Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` when the length of the output exceeds the `max_response_length - overlong_buffer.len` by `0` to `overlong_buffer.len` tokens.

Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` as the output length increases from `max_response_length - overlong_buffer.len` to `max_response_length`
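To make the corrected range concrete, here is a minimal sketch of the linear ramp as the updated documentation describes it. The function name `overlong_penalty`, its parameter names, and the clamping at both ends are assumptions for illustration; this is not the project's implementation, and how the penalty is combined with the reward (for example, subtracted from it) is left out.

```python
def overlong_penalty(
    response_len: int,
    max_response_length: int,
    buffer_len: int,
    penalty_factor: float,
) -> float:
    """Hypothetical helper: penalty magnitude for an overlong response.

    The penalty is 0 up to `max_response_length - buffer_len`, then grows
    linearly until it reaches `penalty_factor` at `max_response_length`.
    """
    if buffer_len <= 0:
        raise ValueError("buffer_len must be positive")

    # Length at which the penalty starts (the threshold fixed by this PR).
    threshold = max_response_length - buffer_len

    # Tokens past the threshold, clamped to [0, buffer_len] (assumed clamp).
    exceed = min(max(response_len - threshold, 0), buffer_len)

    return exceed / buffer_len * penalty_factor


# With max_response_length=100 and overlong_buffer.len=20, the ramp
# covers lengths 80 through 100.
for length in (50, 80, 90, 100):
    print(length, overlong_penalty(length, 100, 20, 1.0))
```

Under these assumed values, a response of 90 tokens sits halfway through the buffer and receives half of `penalty_factor`, while any response of 80 tokens or fewer is not penalized.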

gemini-code-assist · 2025-10-22T04:30:20Z

recipe/dapo/README.md

 Setting `overlong_buffer.enable` to `True` will penalize the outputs whose lengths are overlong but still within the hard context limit.

-Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` when the length of the output exceeds the `max_response_length` by `0` to `overlong_buffer.len` tokens.
+Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` when the length of the output exceeds the `max_response_length - overlong_buffer.len` by `0` to `overlong_buffer.len` tokens.


While this change correctly adjusts the threshold, the phrasing 'exceeds... by...' can be slightly confusing. For improved clarity, I suggest rephrasing to describe the range of the penalty application more directly.

Suggested change

Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` when the length of the output exceeds the `max_response_length - overlong_buffer.len` by `0` to `overlong_buffer.len` tokens.

Specifically, the penalty increases linearly from `0` to `overlong_buffer.penalty_factor` as the output length increases from `max_response_length - overlong_buffer.len` to `max_response_length`


fix doc that penalty starts when exceeds the

82defef

Signed-off-by: bzantium <[email protected]>

bzantium requested review from FightingZhen, PeterSH6, eric-haibin-lin, ji-huazhong, tongyx361, vermouth1992 and zhaochenyang20 as code owners October 22, 2025 04:28

gemini-code-assist bot reviewed Oct 22, 2025


wuxibin89 approved these changes Oct 22, 2025


wuxibin89 changed the title ~~fix doc that penalty starts when exceeds the max_response_length - overlong_buffer.len~~ [doc] misc: fix doc that penalty starts when exceeds the max_response_length - overlong_buffer.len Oct 22, 2025

wuxibin89 merged commit 8b7a48d into volcengine:main Oct 22, 2025
3 of 6 checks passed

bzantium deleted the feature/#3855 branch October 22, 2025 13:39
