Description
Reminder
- I have read the above rules and searched the existing issues.
System Info
[INFO|2025-12-27 15:11:11] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training.
[2025-12-27 15:11:12,692] [INFO] [config.py:735:init] Config mesh_device None world_size = 4
[2025-12-27 15:11:12,870] [INFO] [config.py:735:init] Config mesh_device None world_size = 4
[2025-12-27 15:11:12,892] [INFO] [config.py:735:init] Config mesh_device None world_size = 4
[2025-12-27 15:11:12,972] [INFO] [config.py:735:init] Config mesh_device None world_size = 4
[2025-12-27 15:11:52,868] [INFO] [partition_parameters.py:348:exit] finished initializing model - num_params = 74391, num_elems = 79.67B
[INFO|2025-12-27 15:23:18] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled.
[INFO|2025-12-27 15:23:18] llamafactory.model.model_utils.attention:143 >> Using FlashAttention-2 for faster training and inference.
[INFO|2025-12-27 15:23:18] llamafactory.model.adapter:143 >> DeepSpeed ZeRO3 detected, remaining trainable params in float32.
[INFO|2025-12-27 15:23:18] llamafactory.model.adapter:143 >> Fine-tuning method: LoRA
[INFO|2025-12-27 15:23:18] llamafactory.model.model_utils.misc:143 >> Found linear modules: gate_proj,o_proj,in_proj_ba,gate,up_proj,k_proj,down_proj,shared_expert_gate,v_proj,q_proj,in_proj_qkvz,out_proj
[INFO|2025-12-27 15:25:35] llamafactory.model.loader:143 >> trainable params: 6,092,957,184 || all params: 85,767,348,480 || trainable%: 7.1041
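As a sanity check on the last log line, the reported trainable percentage is just trainable params divided by all params; a minimal sketch using the exact counts from the log:

```python
# Parameter counts copied from the llamafactory.model.loader log line above.
trainable_params = 6_092_957_184
all_params = 85_767_348_480

# Trainable fraction as a percentage, rounded the way the log prints it.
trainable_pct = round(trainable_params / all_params * 100, 4)
print(trainable_pct)  # matches the logged "trainable%: 7.1041"
```

This confirms the LoRA adapter accounts for roughly 7% of the total parameters, which is consistent with the large set of target linear modules listed in the log.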
Reproduction
The CPU and GPU both appear to be busy, but neither the terminal nor the Web UI shows any training progress. Training seems to be stuck; it has been unresponsive all night.
Others
No response