
Pull requests: huggingface/trl



[GOLD] add probability merging fix to implement chain rule
#4765 opened Dec 30, 2025 by kashif
5 tasks
ORPO: Avoid catastrophic cancellation in loss function
#4763 opened Dec 29, 2025 by hartmans
3 tasks
Sudoku GRPO example script using TextArena
#4762 opened Dec 29, 2025 by sergiopaniego
5 tasks
Add a config to limit the number of tool calling iterations.
#4761 opened Dec 29, 2025 by pramodith
4 of 5 tasks
[Docs] Add SRL (Supervised Reinforcement Learning) to Community Tutorials
#4758 opened Dec 29, 2025 by s23deepak
2 tasks done
Extend CLI to orpo trainer
#4757 opened Dec 27, 2025 by murilo-cunha
3 of 5 tasks
fix: handle None eval_dataset in example code
#4756 opened Dec 27, 2025 by ciaoyizhen
1 of 4 tasks
perf: avoid output_hidden_states when only last_hidden_state is used
#4755 opened Dec 27, 2025 by ciaoyizhen
2 of 5 tasks
vllm parameter passthrough for stop sequences
#4754 opened Dec 26, 2025 by kdubovikov
Clarify Accelerate usage in SFTTrainer documentation
#4744 opened Dec 23, 2025 by Likhita-17
1 task done
fix minillm trainer
#4743 opened Dec 23, 2025 by t1101675
5 tasks
[GRPOTrainer]: Agent Training Supports Async Tool Calls
#4742 opened Dec 23, 2025 by pramodith
5 tasks done
Fix MiniLLM Training
#4731 opened Dec 20, 2025 by t1101675
Improve PEFT integration
#4723 opened Dec 19, 2025 by qgallouedec
fix: invalidate ZeRO-3 param coordinator trace in add_hooks
#4693 opened Dec 15, 2025 by roycho96
1 of 5 tasks
feat: DeepSeek V3.2 Off-policy sequence masking
#4689 opened Dec 13, 2025 by casinca
4 of 5 tasks
GKDTrainer: Fix return_outputs in Liger kernel path and update tests
#4688 opened Dec 13, 2025 by roycho96
2 of 5 tasks
Update import structure
#4665 opened Dec 11, 2025 by qgallouedec
[WIP] GRPO-inspired Online DPO refactor
#4659 opened Dec 10, 2025 by d-tiapkin (Draft)
2 of 7 tasks
feature: Add RTPO Trainer
#4652 opened Dec 9, 2025 by SolarWindRider
6 tasks done