Feature/add remax support #234

liziniu · 2025-02-09T12:56:49Z

Description

Added [ReMax](https://arxiv.org/abs/2310.10505) support to verl. ReMax is a simple, efficient, and stable RL algorithm tailored to LLM training, with theoretical guarantees for variance reduction.

The [HybridFlow](https://arxiv.org/pdf/2409.19256v2) paper experimented with ReMax, but verl did not provide an implementation, so this PR adds one.
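
For readers unfamiliar with the algorithm, the variance-reduction idea can be written as follows. This is a sketch in notation chosen here for illustration (prompt $x$, sampled response $a_{1:T}$, greedy response $\bar{a}_{1:T}$, reward $r$), not a quotation from the verl docs: ReMax is REINFORCE with the reward of the greedy response subtracted as a baseline.

```latex
\hat{g}(\theta)
  = \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t \mid x, a_{1:t-1})
    \,\bigl( r(x, a_{1:T}) - r(x, \bar{a}_{1:T}) \bigr),
\qquad
a_{1:T} \sim \pi_\theta(\cdot \mid x),
\quad
\bar{a}_t \in \arg\max_{a} \pi_\theta(a \mid x, \bar{a}_{1:t-1})
```

Because the baseline $r(x, \bar{a}_{1:T})$ does not depend on the sampled tokens, subtracting it leaves the expected gradient unchanged while typically shrinking its variance.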

Changes

Added RayReMaxTrainer implementation
Added example scripts for ReMax training
Added documentation for ReMax algorithm

Testing

Tested ReMax example scripts with Qwen models

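Validation reward when optimizing Qwen2.5-3B-Instruct on the GSM8K dataset:

![Validation reward curve (screenshot, 2025-02-09)](https://github.com/user-attachments/assets/742c2eab-6877-4c3c-b0a2-4159bd109add)

The curve demonstrates the effectiveness of ReMax, though its performance can likely be improved further with hyperparameter tuning.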

Documentation

Added ReMax documentation
Updated example configurations

Checklist

Code follows project's style guidelines (yapf formatted)
Tests added/updated and passing
Documentation updated
Example scripts added

vermouth1992 · 2025-02-09T13:29:58Z

Hi @liziniu, thank you for your contribution! Given our current implementation, I think ReMax can be implemented by adding a few lines to the existing PPO/GRPO/Reinforce implementation instead of writing a new trainer, which would make maintenance easier. Correct me if this doesn't work.

PeterSH6 · 2025-02-09T13:55:45Z

+1. From my understanding, ReMax can be implemented similarly to reinforce++, with a different advantage estimator. See the reinforce++ implementation: #228
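
To illustrate the "different advantage estimator" suggestion, here is a minimal pure-Python sketch of what a ReMax outcome advantage could look like. The function name, argument names, and list-based types are hypothetical and chosen for readability; this is not the code merged in `verl/trainer/ppo/core_algos.py`. It assumes the per-prompt baseline is the total reward of a greedy response.

```python
from typing import List, Tuple


def remax_outcome_advantage(
    token_rewards: List[List[float]],
    baseline_rewards: List[float],
    response_mask: List[List[int]],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical ReMax-style estimator (illustrative sketch only).

    token_rewards[i][t]  : reward assigned to token t of sampled response i
    baseline_rewards[i]  : total reward of the greedy response for prompt i
    response_mask[i][t]  : 1 for real response tokens, 0 for padding
    """
    advantages: List[List[float]] = []
    returns: List[List[float]] = []
    for rewards, baseline, mask in zip(token_rewards, baseline_rewards, response_mask):
        # Reward-to-go: reverse cumulative sum over the valid tokens.
        row_returns = [0.0] * len(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running += rewards[t] * mask[t]
            row_returns[t] = running
        # Subtract the greedy baseline on valid tokens; padding stays at zero.
        row_adv = [(row_returns[t] - baseline) * mask[t] for t in range(len(rewards))]
        returns.append(row_returns)
        advantages.append(row_adv)
    return advantages, returns
```

Unlike GRPO-style estimators, nothing here is normalized across a group of samples: the only extra input is one baseline value per prompt.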

eric-haibin-lin · 2025-02-10T00:11:48Z

README.md

 - huggingface models support
 - Supervised fine-tuning
- Reinforcement learning from human feedback with [PPO](https://github.com/volcengine/verl/tree/main/examples/ppo_trainer) and [GRPO](https://github.com/volcengine/verl/tree/main/examples/grpo_trainer)
+- Reinforcement learning from human feedback with [PPO](https://github.com/volcengine/verl/tree/main/examples/ppo_trainer), [GRPO](https://github.com/volcengine/verl/tree/main/examples/grpo_trainer), and [ReMax](https://github.com/volcengine/verl/tree/main/examples/remax_trainer)


If you already have the training log and wandb run, would you mind adding one more record to docs/experiment/ppo.rst to include ReMax? It would help the community track whether the experiment can be reproduced in future versions.
We can do that in the next PR.

Yes. A preliminary result on Qwen2.5-3B is added and more results will come later.

liziniu · 2025-02-10T00:55:53Z

@vermouth1992 @PeterSH6 I see. Let me rework the code so that it makes minimal changes to the PPO trainer.

liziniu · 2025-02-10T04:27:12Z

Hi, @vermouth1992 @PeterSH6

I have completed the implementation of ReMax support. The changes include:

Removed the new trainer for ReMax
Implemented ReMax on top of the PPO trainer
Updated the preliminary result in docs/experiment/ppo.rst
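
As a rough sketch of what reusing the PPO trainer loop involves, the only ReMax-specific step is one extra greedy rollout per prompt, whose reward serves as the baseline. All names below (`collect_remax_batch`, `sample_fn`, `greedy_fn`, `reward_fn`) are hypothetical stand-ins for the trainer's rollout and reward components, not verl APIs.

```python
from typing import Callable, List, Tuple


def collect_remax_batch(
    prompts: List[str],
    sample_fn: Callable[[str], str],
    greedy_fn: Callable[[str], str],
    reward_fn: Callable[[str, str], float],
) -> List[Tuple[str, str, float]]:
    """Return (prompt, sampled_response, advantage) triples for one step.

    Illustrative only: sample_fn stands in for stochastic decoding,
    greedy_fn for greedy decoding, and reward_fn for the reward function.
    """
    batch: List[Tuple[str, str, float]] = []
    for prompt in prompts:
        sampled = sample_fn(prompt)            # response used for the policy update
        greedy = greedy_fn(prompt)             # extra rollout, used only as a baseline
        reward = reward_fn(prompt, sampled)
        baseline = reward_fn(prompt, greedy)
        batch.append((prompt, sampled, reward - baseline))
    return batch
```

The trade-off is one additional generation per prompt in exchange for not training a separate value model.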

The code follows the project's style guidelines.

Please review when you have a chance. Let me know if any changes or clarifications are needed.

Thank you for your time!

verl/trainer/ppo/core_algos.py

vermouth1992 · 2025-02-10T09:16:14Z

Could you add a CI job that runs ReMax with Qwen 0.5B to protect this functionality? You can follow the example here: https://github.com/volcengine/verl/blob/main/.github/workflows/e2e_gsm8k.yml#L69

liziniu · 2025-02-10T14:19:51Z

Hi @vermouth1992

I’ve updated the end-to-end (e2e) test. Since this is my first time writing such a test within the workflows, I’m not entirely confident about it. Could you please review it when you have a moment?

Thank you!


liziniu added 2 commits February 9, 2025 12:38

feat: add ReMax support

57e3814

update README for remax

cf6402f

eric-haibin-lin reviewed Feb 10, 2025


liziniu added 2 commits February 10, 2025 01:58

reformat remax's code

81b22d7

remove unused file

3b5586c

PeterSH6 requested changes Feb 10, 2025


liziniu added 2 commits February 10, 2025 04:59

fix: restore accidentally removed reinforce++ implementation

560e840

fix: update the info for 'compute_remax_outcome_advantage'

123da4a

PeterSH6 approved these changes Feb 10, 2025


liziniu added 2 commits February 10, 2025 14:02

update ReMax training result for Qwen2.5-7B

804ef97

add CI to run remax with Qwen 0.5b

90b0c3d

vermouth1992 merged commit 769b8d0 into volcengine:main Feb 10, 2025
12 checks passed
