Fix #66: enabling test_during_training=True during RL training produces a flood of "no valid token" warnings #69
Merged
AkaliKong merged 1 commit into AkaliKong:main on Mar 31, 2026
Conversation
Add top_k=None and top_p=None to test_generation_config so it matches the SFT evaluate config and avoids spurious sampling filters that conflict with ConstrainedLogitsProcessor. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner
Thank you for your contribution!
Closes #66
Adds `top_k=None` and `top_p=None` to the `generate()` call inside `ReReTrainer` during test-time inference, eliminating spurious "no valid token" warnings when `test_during_training=True` is used in RL training runs.

Changes

- `minionerec_trainer.py`: In `ReReTrainer`, the `model.generate()` invocation used for evaluation (around line 576) now explicitly passes `top_k=None` and `top_p=None` alongside the existing `do_sample=False`, `num_beams`, and token ID arguments. Without these, the generation config inherited residual sampling filter values that conflicted with beam search, triggering the warnings.
- `tests/test_minimax_provider.py`: Adds `TestTestGenerationConfigHasNoTopKTopP.test_trainer_source_sets_top_k_and_top_p_none`, a regression test that parses `minionerec_trainer.py` via `ast.walk`, locates the `test_generation_config` assignment, and asserts that both `top_k=None` and `top_p=None` are present as keyword arguments with `None` constant values.

Motivation
When `test_during_training=True` is enabled during RL training, the trainer calls `model.generate()` with `do_sample=False` but without explicitly clearing `top_k`/`top_p`. If the model's generation config carries non-`None` values for those parameters, the nucleus/top-k filters can exclude all candidate tokens during beam search steps, producing a flood of "no valid token" warnings. The SFT evaluation path was already unaffected because its generation config was configured separately; this fix brings the RL training evaluation path into parity.

Testing
The new `TestTestGenerationConfigHasNoTopKTopP` test statically verifies that the fix cannot regress without the test failing. Running the suite:

Manual verification: launching RL training with `test_during_training=True` no longer emits "no valid token" warnings in the evaluation loop.
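The kind of static check the regression test performs can be sketched as below. The `SOURCE` snippet and the helper name `config_clears_sampling_filters` are illustrative stand-ins, not the actual trainer code or test; the real test parses `minionerec_trainer.py` itself.

```python
import ast

# Hypothetical snippet standing in for the relevant lines of the trainer.
SOURCE = """
test_generation_config = GenerationConfig(
    do_sample=False,
    num_beams=4,
    top_k=None,
    top_p=None,
)
"""

def config_clears_sampling_filters(source: str) -> bool:
    """Return True if the test_generation_config assignment passes
    both top_k=None and top_p=None as keyword arguments."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "test_generation_config" not in targets:
            continue
        if not isinstance(node.value, ast.Call):
            continue
        # Collect keyword arguments whose value is the constant None.
        cleared = {
            kw.arg
            for kw in node.value.keywords
            if isinstance(kw.value, ast.Constant) and kw.value.value is None
        }
        return {"top_k", "top_p"} <= cleared
    return False

print(config_clears_sampling_filters(SOURCE))  # True for the snippet above
```

Because the check inspects the AST rather than string-matching, reordering arguments or reformatting the call does not break the test, while silently dropping either `top_k=None` or `top_p=None` makes it fail.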
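The failure mode described under Motivation can be illustrated with a stylized sketch. This is not the actual transformers processor pipeline; the toy logits, filter order, and helper names are assumptions chosen only to show how a residual top-k filter plus a vocabulary constraint can leave no token selectable.

```python
NEG_INF = float("-inf")

def top_k_filter(logits, k):
    # Simplified top-k filter: keep the k highest logits, mask the rest.
    if k is None:  # top_k=None disables the filter entirely
        return list(logits)
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return [x if i in keep else NEG_INF for i, x in enumerate(logits)]

def constrain(logits, allowed_ids):
    # Stand-in for a constrained logits processor: only allowed ids survive.
    return [x if i in allowed_ids else NEG_INF for i, x in enumerate(logits)]

logits = [5.0, 4.0, 3.0, 0.5, 0.1]  # toy 5-token vocabulary
allowed = {3, 4}                     # constraint only permits low-scoring ids

# A residual top_k=2 keeps ids {0, 1}; the constraint then masks those too,
# leaving every token at -inf: the "no valid token" situation.
broken = constrain(top_k_filter(logits, 2), allowed)
print(all(x == NEG_INF for x in broken))   # True: no valid token

# With top_k=None the filter is a no-op and the allowed ids survive.
fixed = constrain(top_k_filter(logits, None), allowed)
print(any(x > NEG_INF for x in fixed))     # True: valid tokens remain
```

The same shape of conflict arises with a nucleus (`top_p`) filter, which is why the fix clears both parameters rather than only one.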