
@hhaAndroid hhaAndroid commented Dec 23, 2025

Support RL resume

Usage 1: regular usage

```python
trainer = RLTrainerConfig(
    hf_interval=5,
    hf_max_keep=1,
    checkpoint_interval=2,
    checkpoint_maxkeep=2,
    auto_resume=True
)
```

This automatically saves the DCP checkpoint, the model and HF weights, and the dataloader-related state. With auto_resume enabled, all of this state is resumed automatically.
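As a rough illustration of what `auto_resume=True` has to do under the hood, the sketch below scans a work directory for the newest saved step and reports what would be restored. The `step_*` directory layout, the path, and the helper name are assumptions made for this example, not the actual XTuner implementation.

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(work_dir: str) -> Optional[Path]:
    """Return the checkpoint directory with the highest step number, if any."""
    root = Path(work_dir)
    steps = [p for p in root.glob("step_*") if p.is_dir()]
    if not steps:
        return None
    return max(steps, key=lambda p: int(p.name.split("_")[-1]))


latest = find_latest_checkpoint("work_dirs/rl_example")  # hypothetical work dir
if latest is None:
    print("no checkpoint found; training starts from scratch")
else:
    print(f"auto_resume would restore model/optimizer/dataloader state from {latest}")
```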

Usage 2: do not save DCP

```python
load_checkpoint_cfg = LoadCheckpointConfig(load_optimizer_states=False, load_optimizer_args=False)
trainer = RLTrainerConfig(
    hf_interval=5,
    hf_max_keep=1,
    checkpoint_interval=2,
    checkpoint_maxkeep=2,
    checkpoint_no_save_optimizer=True,
    load_checkpoint_cfg=load_checkpoint_cfg,
    auto_resume=True
)
```

Everything is restored except the optimizer state.
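A hedged sketch of this selective restore: the dataclass below only mirrors the two fields used in the snippet above (it is a stand-in, not the real `LoadCheckpointConfig`), and the saved-state layout is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadCheckpointConfig:
    load_optimizer_states: bool = True
    load_optimizer_args: bool = True


def resume(saved_state: dict, cfg: LoadCheckpointConfig) -> dict:
    """Pick which parts of a saved state dict to restore."""
    restored = {"model": saved_state["model"], "dataloader": saved_state["dataloader"]}
    if cfg.load_optimizer_states:
        restored["optimizer"] = saved_state["optimizer"]
    return restored


saved = {"model": {"w": 1.0}, "optimizer": {"step": 42}, "dataloader": {"epoch": 3}}
print(resume(saved, LoadCheckpointConfig(load_optimizer_states=False,
                                         load_optimizer_args=False)))
# optimizer state is left out; model and dataloader state are restored
```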

Notes:

  • In resume mode, checkpoint_interval must be set; the default is -1, which means resume is not supported at all (see the sketch after this list).
  • Restoring partial rollout and ReplayBufferStorage state is not handled yet.
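A hypothetical guard illustrating the first note: resuming only makes sense when periodic checkpoints are actually written, and the default of -1 disables checkpointing entirely. The function name and error message are made up for this example.

```python
def validate_resume_config(checkpoint_interval: int = -1, auto_resume: bool = False) -> None:
    """Reject configs that ask for resume but never save a checkpoint."""
    if auto_resume and checkpoint_interval <= 0:
        raise ValueError(
            "auto_resume=True requires checkpoint_interval > 0; "
            "the default of -1 means no checkpoints are saved, so there is nothing to resume from."
        )


validate_resume_config(checkpoint_interval=2, auto_resume=True)   # ok
# validate_resume_config(auto_resume=True)  # would raise ValueError
```
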
@hhaAndroid hhaAndroid changed the title [WIP] Resume RL Dec 23, 2025
@hhaAndroid hhaAndroid requested a review from jayhenry December 24, 2025 06:51
@hhaAndroid hhaAndroid merged commit 0ab6bf8 into InternLM:main Dec 24, 2025
4 checks passed