
Fix #66: Enabling test_during_training=True during RL training causes a flood of "no valid token" warnings #69

Merged
AkaliKong merged 1 commit into AkaliKong:main from
JiwaniZakir:fix/66-rl-test-during-training-true-no-valid-t
Mar 31, 2026

Conversation

@JiwaniZakir
Contributor

Closes #66

Adds top_k=None and top_p=None to the generate() call inside ReReTrainer during test-time inference, eliminating spurious "no valid token" warnings when test_during_training=True is used in RL training runs.

Changes

minionerec_trainer.py — In ReReTrainer, the model.generate() invocation used for evaluation (around line 576) now explicitly passes top_k=None and top_p=None alongside the existing do_sample=False, num_beams, and token ID arguments. Without these, the generation config inherited residual sampling filter values that conflicted with beam search, triggering the warnings.
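Conceptually, the patch clears both sampling filters explicitly in the evaluation config. The sketch below is illustrative, not the repository's verbatim code: the dict form and the beam width are assumptions, and only do_sample=False, top_k=None, and top_p=None come from this PR; in the real trainer these keywords are forwarded to model.generate().

```python
# Illustrative sketch of the patched evaluation arguments (hypothetical
# shape; only do_sample=False, top_k=None, top_p=None are from the PR).
test_generation_config = dict(
    do_sample=False,  # deterministic beam search for evaluation
    num_beams=4,      # illustrative beam width, not the repository's value
    top_k=None,       # the fix: clear any inherited top-k filter
    top_p=None,       # the fix: clear any inherited nucleus filter
)
# In ReReTrainer these keywords are passed to model.generate(...).
print(test_generation_config["top_k"], test_generation_config["top_p"])
# prints: None None
```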

tests/test_minimax_provider.py — Adds TestTestGenerationConfigHasNoTopKTopP.test_trainer_source_sets_top_k_and_top_p_none, a regression test that parses minionerec_trainer.py via ast.walk, locates the test_generation_config assignment, and asserts both top_k=None and top_p=None are present as keyword arguments with None constant values.
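The regression check can be sketched as follows. This is a self-contained toy that parses an inline snippet rather than minionerec_trainer.py, and the helper name keyword_is_none is hypothetical; the real test walks the actual trainer source.

```python
import ast

# Inline stand-in for the trainer source that the real test reads from disk.
SOURCE = """
test_generation_config = GenerationConfig(
    do_sample=False, num_beams=4, top_k=None, top_p=None,
)
"""

def keyword_is_none(call: ast.Call, name: str) -> bool:
    """True if `call` passes keyword `name` with the constant value None."""
    return any(
        kw.arg == name
        and isinstance(kw.value, ast.Constant)
        and kw.value.value is None
        for kw in call.keywords
    )

tree = ast.parse(SOURCE)
for node in ast.walk(tree):
    # Locate the `test_generation_config = <call>(...)` assignment.
    if (
        isinstance(node, ast.Assign)
        and any(isinstance(t, ast.Name) and t.id == "test_generation_config"
                for t in node.targets)
        and isinstance(node.value, ast.Call)
    ):
        assert keyword_is_none(node.value, "top_k")
        assert keyword_is_none(node.value, "top_p")
```

Checking the source statically (rather than instantiating the trainer) keeps the test free of model and GPU dependencies.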

Motivation

When test_during_training=True is enabled during RL training, the trainer calls model.generate() with do_sample=False but without explicitly clearing top_k/top_p. If the model's generation config carries non-None values for those parameters, the nucleus/top-k filters can exclude all candidate tokens during beam search steps, producing a flood of "no valid token" warnings. The SFT evaluation path was already unaffected because its generation config was configured separately; this fix brings the RL training evaluation path into parity.
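The failure mode can be shown with a toy example. This assumes the top-k warper masks logits before the constraint processor runs; the function names top_k_filter and constrain are hypothetical, not from the repository or from transformers.

```python
NEG_INF = float("-inf")

def top_k_filter(logits, k):
    """Keep the k highest logits, mask the rest to -inf (top-k style filter)."""
    kth = sorted(logits, reverse=True)[k - 1]
    return [x if x >= kth else NEG_INF for x in logits]

def constrain(logits, allowed):
    """Mask every token outside `allowed` to -inf (constrained decoding)."""
    return [x if i in allowed else NEG_INF for i, x in enumerate(logits)]

logits = [5.0, 4.0, 3.0, 0.5, 0.1]  # toy vocabulary of 5 tokens
allowed = {3, 4}                     # the constraint only permits tokens 3 and 4

# A residual top_k=2 keeps only tokens 0 and 1, which are disjoint from the
# allowed set, so after the constraint every logit is -inf: no valid token.
filtered = constrain(top_k_filter(logits, k=2), allowed)
no_valid = all(x == NEG_INF for x in filtered)
print(no_valid)  # True: no candidate survives both filters
```

Clearing top_k/top_p removes the first mask entirely, so the constraint processor always has the full vocabulary to choose from.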

Testing

The new TestTestGenerationConfigHasNoTopKTopP test statically verifies the fix: removing either top_k=None or top_p=None from the trainer source makes the test fail. Running the suite:

python -m pytest tests/test_minimax_provider.py::TestTestGenerationConfigHasNoTopKTopP -v
# PASSED — test_trainer_source_sets_top_k_and_top_p_none

Manual verification: launching RL training with test_during_training=True no longer emits "no valid token" warnings in the evaluation loop.

Add top_k=None and top_p=None to test_generation_config so it
matches the SFT evaluate config and avoids spurious sampling
filters that conflict with ConstrainedLogitsProcessor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AkaliKong
Owner

Thank you for your contribution!

@AkaliKong AkaliKong closed this Mar 31, 2026
@AkaliKong AkaliKong reopened this Mar 31, 2026
@AkaliKong AkaliKong merged commit 7892d6b into AkaliKong:main Mar 31, 2026

Development

Successfully merging this pull request may close these issues.

Enabling test_during_training=True during RL training causes a flood of "no valid token" warnings
