Context
TRL currently includes 15 trainers, which vary significantly in usage and maintenance requirements.
Key points:
- Maintaining all trainers in their current state imposes a high maintenance cost and reduces our ability to quickly respond to user feature requests.
- As we prepare the library for V1, we need to clearly define which trainers are considered stable versus experimental, with the latter receiving little to no attention from maintainers.
- Some trainers have required bug fixes and refactoring for a while but have received limited attention due to low usage. Currently, these trainers remain in the codebase but are largely ignored in practice. While this works for rapid alpha development, it is not sustainable for a stable V1 release.
Proposal
We suggest a radical approach: remove most of the trainers from the main codebase.
Some may still live in the trl.experimental module, depending on community feedback.
EDIT: We will first move most trainers into trl.experimental, and discuss later which ones should be promoted to the stable codebase, which should stay in experimental, and which should be removed.
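For concreteness, here is a minimal sketch of what imports could look like after the move. The trl.experimental submodule paths below are assumptions for illustration, not a confirmed API:

```python
# Stable trainers would keep their current top-level imports:
from trl import DPOTrainer, GRPOTrainer, SFTTrainer

# Trainers moved to experimental would be imported from the new namespace.
# NOTE: the submodule names below are hypothetical; the exact layout is TBD.
from trl.experimental.kto import KTOTrainer
from trl.experimental.ppo import PPOTrainer
```

If the class names stay unchanged, migrating downstream code would amount to a one-line import change per trainer.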
Goals
- Reduce maintenance overhead
- Allow maintainers to focus on the stable core of the library and new features
- Provide a clearer distinction between stable and experimental trainers
- Simplify the codebase
Questions / Discussion
- Which trainers do you think should remain in the stable codebase?
- Any other suggestions or concerns regarding this plan?
Current plan (edited based on feedback)
| Trainer | Plan for v1 | Ready for v1? | Plan for after v1 (to be discussed later) |
|---|---|---|---|
| BCO | 🧪 Moving to trl.experimental | N/A | May be removed |
| CPO | 🧪 Moving to trl.experimental | N/A | May be removed |
| DPO | ✅ Stay in trl | ❌ Requires refactoring | |
| Online DPO | 🧪 Moving to trl.experimental | N/A | May be removed |
| GKD | 🧪 Moving to trl.experimental | N/A | May be removed |
| GRPO | ✅ Stay in trl | ✅ Yes | |
| KTO | 🧪 Moving to trl.experimental | N/A | May later be promoted to the main codebase after refactoring |
| Nash-MD | 🧪 Moving to trl.experimental | N/A | May be removed |
| ORPO | 🧪 Moving to trl.experimental | N/A | May stay in trl.experimental |
| PPO | 🧪 Moving to trl.experimental | N/A | May stay in trl.experimental; it's an important baseline but requires a heavy refactor |
| PRM | 🧪 Moving to trl.experimental | N/A | May be removed |
| Reward | ✅ Stay in trl | ✅ Yes | |
| RLOO | 🧪 Moving to trl.experimental | N/A | May stay in trl.experimental since its maintenance cost is low |
| SFT | ✅ Stay in trl | ✅ Yes | |
| XPO | 🧪 Moving to trl.experimental | N/A | May be removed |