Fix #66: enabling test_during_training=True during RL training produces a flood of "no valid token" warnings #69
Merged
AkaliKong merged 1 commit into AkaliKong:main on Mar 31, 2026
Conversation
Add top_k=None and top_p=None to test_generation_config so it matches the SFT evaluate config and avoids spurious sampling filters that conflict with ConstrainedLogitsProcessor. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner
Thank you for your contribution!
Closes #66
Adds `top_k=None` and `top_p=None` to the `generate()` call inside `ReReTrainer` during test-time inference, eliminating spurious "no valid token" warnings when `test_during_training=True` is used in RL training runs.

Changes

- `minionerec_trainer.py`: In `ReReTrainer`, the `model.generate()` invocation used for evaluation (around line 576) now explicitly passes `top_k=None` and `top_p=None` alongside the existing `do_sample=False`, `num_beams`, and token ID arguments. Without these, the generation config inherited residual sampling filter values that conflicted with beam search, triggering the warnings.
- `tests/test_minimax_provider.py`: Adds `TestTestGenerationConfigHasNoTopKTopP.test_trainer_source_sets_top_k_and_top_p_none`, a regression test that parses `minionerec_trainer.py` via `ast.walk`, locates the `test_generation_config` assignment, and asserts that both `top_k=None` and `top_p=None` are present as keyword arguments with `None` constant values.

Motivation
When `test_during_training=True` is enabled during RL training, the trainer calls `model.generate()` with `do_sample=False` but without explicitly clearing `top_k`/`top_p`. If the model's generation config carries non-`None` values for those parameters, the nucleus/top-k filters can exclude all candidate tokens during beam search steps, producing a flood of "no valid token" warnings. The SFT evaluation path was already unaffected because its generation config was configured separately; this fix brings the RL training evaluation path into parity.

Testing
The new `TestTestGenerationConfigHasNoTopKTopP` test statically verifies that the fix cannot regress without the test failing. Running the suite:

Manual verification: launching RL training with `test_during_training=True` no longer emits "no valid token" warnings in the evaluation loop.
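The kind of static check the regression test performs can be sketched as below. The `SOURCE` snippet and the helper name `config_clears_sampling_filters` are illustrative stand-ins, not the actual trainer code or test; the real test parses `minionerec_trainer.py` itself.

```python
import ast

# Hypothetical snippet standing in for the relevant lines of the trainer.
SOURCE = """
test_generation_config = GenerationConfig(
    do_sample=False,
    num_beams=4,
    top_k=None,
    top_p=None,
)
"""

def config_clears_sampling_filters(source: str) -> bool:
    """Return True if the test_generation_config assignment passes
    both top_k=None and top_p=None as keyword arguments."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "test_generation_config" not in targets:
            continue
        if not isinstance(node.value, ast.Call):
            continue
        # Collect keyword arguments whose value is the constant None.
        cleared = {
            kw.arg
            for kw in node.value.keywords
            if isinstance(kw.value, ast.Constant) and kw.value.value is None
        }
        return {"top_k", "top_p"} <= cleared
    return False

print(config_clears_sampling_filters(SOURCE))  # True for the snippet above
```

Because the check inspects the AST rather than string-matching, reordering arguments or reformatting the call does not break the test, while silently dropping either `top_k=None` or `top_p=None` makes it fail.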
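The failure mode described under Motivation can be illustrated with a stylized sketch. This is not the actual transformers processor pipeline; the toy logits, filter order, and helper names are assumptions chosen only to show how a residual top-k filter plus a vocabulary constraint can leave no token selectable.

```python
NEG_INF = float("-inf")

def top_k_filter(logits, k):
    # Simplified top-k filter: keep the k highest logits, mask the rest.
    if k is None:  # top_k=None disables the filter entirely
        return list(logits)
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return [x if i in keep else NEG_INF for i, x in enumerate(logits)]

def constrain(logits, allowed_ids):
    # Stand-in for a constrained logits processor: only allowed ids survive.
    return [x if i in allowed_ids else NEG_INF for i, x in enumerate(logits)]

logits = [5.0, 4.0, 3.0, 0.5, 0.1]  # toy 5-token vocabulary
allowed = {3, 4}                     # constraint only permits low-scoring ids

# A residual top_k=2 keeps ids {0, 1}; the constraint then masks those too,
# leaving every token at -inf: the "no valid token" situation.
broken = constrain(top_k_filter(logits, 2), allowed)
print(all(x == NEG_INF for x in broken))   # True: no valid token

# With top_k=None the filter is a no-op and the allowed ids survive.
fixed = constrain(top_k_filter(logits, None), allowed)
print(any(x > NEG_INF for x in fixed))     # True: valid tokens remain
```

The same shape of conflict arises with a nucleus (`top_p`) filter, which is why the fix clears both parameters rather than only one.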