[RFC] Moving Most TRL Trainers to the experimental Submodule to Streamline the Core #4223

@qgallouedec

Context

TRL currently includes 15 trainers, which vary significantly in usage and maintenance requirements.

Key points:

  1. Maintaining all trainers in their current state imposes a high maintenance cost and reduces our ability to quickly respond to user feature requests.
  2. As we prepare the library for V1, we need to clearly define which trainers are considered stable and which are experimental, with the latter receiving little or no attention from maintainers.
  3. Some trainers have needed bug fixes and refactoring for some time but have received limited attention because of low usage. These trainers remain in the codebase but are largely ignored in practice. That works for rapid alpha development, but it is not sustainable for a stable V1 release.

Proposal

We suggest a radical approach: remove most of the trainers from the main codebase.

Some may still live in the trl.experimental module, depending on feedback from the community.

EDIT: We will first move most trainers into trl.experimental, and discuss later which ones should be promoted, which ones should stay in experimental, and which ones should be removed.

Goal

  • Reduce maintenance overhead
  • Allow maintainers to focus on the stable core of the library and new features
  • Provide a clearer distinction between stable and experimental trainers
  • Simplify the codebase

Questions / Discussion

  • Which trainers do you think should remain in the stable codebase?
  • Any other suggestions or concerns regarding this plan?

Current plan (edited based on feedback)

| Trainer | Plan for v1 | Ready for v1? | Plan for after v1 (to be discussed later) |
|---|---|---|---|
| BCO | 🧪 Moving to trl.experimental | N/A | May be removed |
| CPO | 🧪 Moving to trl.experimental | N/A | May be removed |
| DPO | ✅ Stay in trl | ❌ Requires refactoring | |
| Online DPO | 🧪 Moving to trl.experimental | N/A | May be removed |
| GKD | 🧪 Moving to trl.experimental | N/A | May be removed |
| GRPO | ✅ Stay in trl | ✅ Yes | |
| KTO | 🧪 Moving to trl.experimental | N/A | May be later promoted to main codebase after refactoring |
| Nash-MD | 🧪 Moving to trl.experimental | N/A | May be removed |
| ORPO | 🧪 Moving to trl.experimental | N/A | May stay in trl.experimental |
| PPO | 🧪 Moving to trl.experimental | N/A | May stay in trl.experimental because it's an important baseline but requires heavy refactor |
| PRM | 🧪 Moving to trl.experimental | N/A | May be removed |
| Reward | ✅ Yes | ✅ Yes | |
| RLOO | 🧪 Moving to trl.experimental | N/A | May stay in trl.experimental as maintenance cost is low |
| SFT | ✅ Ready for v1 | ✅ Yes | |
| XPO | 🧪 Moving to trl.experimental | N/A | May be removed |
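
For anyone auditing their own code against this plan, the table can be transcribed into a small lookup. This is only a sketch: `planned_module` and the two sets are hypothetical names for illustration, not part of TRL, and the exact submodule layout inside trl.experimental is not specified in this issue, so the helper returns only the top-level module name.

```python
# Plan for v1, transcribed from the table in this issue.
# These names are illustrative only; they are NOT a TRL API.
STAYS_IN_TRL = {"DPO", "GRPO", "Reward", "SFT"}
MOVES_TO_EXPERIMENTAL = {
    "BCO", "CPO", "Online DPO", "GKD", "KTO", "Nash-MD",
    "ORPO", "PPO", "PRM", "RLOO", "XPO",
}


def planned_module(trainer: str) -> str:
    """Return the top-level module a trainer is planned to live in for v1."""
    if trainer in STAYS_IN_TRL:
        return "trl"
    if trainer in MOVES_TO_EXPERIMENTAL:
        return "trl.experimental"
    raise KeyError(f"unknown trainer: {trainer!r}")


if __name__ == "__main__":
    # List every trainer with its planned location, sorted by name.
    for name in sorted(STAYS_IN_TRL | MOVES_TO_EXPERIMENTAL):
        print(f"{name}: {planned_module(name)}")
```

Since the plan is still open to feedback, the two sets would need updating if a trainer is later promoted or removed.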
