Context
TRL currently includes 15 trainers, which vary significantly in usage and maintenance requirements.
Key points:
- Maintaining all trainers in their current state imposes a high maintenance cost and reduces our ability to quickly respond to user feature requests.
- As we prepare the library for V1, we need to clearly define which trainers are considered stable versus experimental, with the latter receiving little to no attention from maintainers.
- Some trainers have required bug fixes and refactoring for a while but have received limited attention due to low usage. Currently, these trainers remain in the codebase but are largely ignored in practice. While this works for rapid alpha development, it is not sustainable for a stable V1 release.
Proposal
We suggest a radical approach: remove most of the trainers from the main codebase.
Some may still live in the trl.experimental module, depending on community feedback.
EDIT: We will first move most trainers into trl.experimental, and discuss later which ones should be promoted to the stable codebase, which should stay in experimental, and which should be removed.
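For concreteness, here is a minimal sketch of what imports could look like after the move. The trl.experimental submodule paths below are assumptions for illustration, not a confirmed API:

```python
# Stable trainers would keep their current top-level imports:
from trl import DPOTrainer, GRPOTrainer, SFTTrainer

# Trainers moved to experimental would be imported from the new namespace.
# NOTE: the submodule names below are hypothetical; the exact layout is TBD.
from trl.experimental.kto import KTOTrainer
from trl.experimental.ppo import PPOTrainer
```

If the class names stay unchanged, migrating downstream code would amount to a one-line import change per trainer.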
Goals
- Reduce maintenance overhead
- Allow maintainers to focus on the stable core of the library and new features
- Provide a clearer distinction between stable and experimental trainers
- Simplify the codebase
Questions / Discussion
- Which trainers do you think should remain in the stable codebase?
- Any other suggestions or concerns regarding this plan?
Current plan (edited based on feedback)
| Trainer | Plan for v1 | Ready for v1? | Plan for after v1 (to be discussed later) |
|---|---|---|---|
| BCO | 🧪 Moving to trl.experimental | N/A | May be removed |
| CPO | 🧪 Moving to trl.experimental | N/A | May be removed |
| DPO | ✅ Stay in trl | ❌ Requires refactoring | |
| Online DPO | 🧪 Moving to trl.experimental | N/A | May be removed |
| GKD | 🧪 Moving to trl.experimental | N/A | May be removed |
| GRPO | ✅ Stay in trl | ✅ Yes | |
| KTO | 🧪 Moving to trl.experimental | N/A | May later be promoted to the main codebase after refactoring |
| Nash-MD | 🧪 Moving to trl.experimental | N/A | May be removed |
| ORPO | 🧪 Moving to trl.experimental | N/A | May stay in trl.experimental |
| PPO | 🧪 Moving to trl.experimental | N/A | May stay in trl.experimental; it's an important baseline but requires a heavy refactor |
| PRM | 🧪 Moving to trl.experimental | N/A | May be removed |
| Reward | ✅ Stay in trl | ✅ Yes | |
| RLOO | 🧪 Moving to trl.experimental | N/A | May stay in trl.experimental since its maintenance cost is low |
| SFT | ✅ Stay in trl | ✅ Yes | |
| XPO | 🧪 Moving to trl.experimental | N/A | May be removed |