huggingface / trl Public

generated from fastai/nbdev_template

Fork 2.4k
Star 16.8k

Pull requests: huggingface/trl

88 Open 2,397 Closed

Pull requests list

[GOLD] add probability merging fix to implement chain rule

#4765 opened Dec 30, 2025 by kashif

5 tasks

ORPO: Avoid catastrophic cancellation in loss function

#4763 opened Dec 29, 2025 by hartmans

3 tasks

Sudoku GRPO example script using TextArena

#4762 opened Dec 29, 2025 by sergiopaniego

5 tasks

Add a config to limit the number of tool calling iterations.

#4761 opened Dec 29, 2025 by pramodith

4 of 5 tasks

[Docs] Add SRL (Supervised Reinforcement Learning) to Community Tutorials

#4758 opened Dec 29, 2025 by s23deepak

2 tasks done

Extend CLI to orpo trainer

#4757 opened Dec 27, 2025 by murilo-cunha

3 of 5 tasks

fix: handle None eval_dataset in example code

#4756 opened Dec 27, 2025 by ciaoyizhen

1 of 4 tasks

perf: avoid output_hidden_states when only last_hidden_state is used

#4755 opened Dec 27, 2025 by ciaoyizhen

2 of 5 tasks

vllm parameter passthrough for stop sequences

#4754 opened Dec 26, 2025 by kdubovikov

Fix GRPO scale_rewards type specification to fix __post_init__ validation

#4752 opened Dec 26, 2025 by apalmas-saifh

1 of 5 tasks

Clarify Accelerate usage in SFTTrainer documentation

#4744 opened Dec 23, 2025 by Likhita-17

1 task done

fix minillm trainer

#4743 opened Dec 23, 2025 by t1101675

5 tasks

[GRPOTrainer]: Agent Training Supports Async Tool Calls

#4742 opened Dec 23, 2025 by pramodith

5 tasks done

[WIP - Awaiting Feedback] feat: Bidirectional masked importance sampling ratio (MIS) for IcePop

#4732 opened Dec 20, 2025 by casinca • Draft

5 tasks

Fix MiniLLM Training

#4731 opened Dec 20, 2025 by t1101675

Up to 50% less VRAM during forward with forward_masked_logits function

#4729 opened Dec 20, 2025 by qgallouedec

Improve PEFT integration

#4723 opened Dec 19, 2025 by qgallouedec

Refactor vLLM generation [2/N]: Decouple rollout_func and vLLM functionalities

#4712 opened Dec 17, 2025 by albertvillanova • Draft

Refactor vLLM generation [1/N]: Extract vLLM generation

#4700 opened Dec 16, 2025 by albertvillanova

fix: invalidate ZeRO-3 param coordinator trace in add_hooks

#4693 opened Dec 15, 2025 by roycho96

1 of 5 tasks

feat: DeepSeek V3.2 Off-policy sequence masking

#4689 opened Dec 13, 2025 by casinca

4 of 5 tasks

GKDTrainer: Fix return_outputs in Liger kernel path and update tests

#4688 opened Dec 13, 2025 by roycho96

2 of 5 tasks

Update import structure

#4665 opened Dec 11, 2025 by qgallouedec

[WIP] GRPO-inspired Online DPO refactor

#4659 opened Dec 10, 2025 by d-tiapkin • Draft

2 of 7 tasks

feature: Add RTPO Trainer

#4652 opened Dec 9, 2025 by SolarWindRider

6 tasks done
