AI/ML learner focusing on Large Language Models (LLMs).
Passionate about the full LLM lifecycle from pre-training & fine-tuning to alignment (RLHF) and inference optimization.
- LLM Alignment: RLHF, PPO, DPO & Retrieval-Augmented Generation (RAG)
- Learning: Agentic Workflows, DeepSpeed & model quantization (AWQ / GPTQ)
- Education: Qilu University of Technology (QLUT)
- Research Interests: Reward modeling, context window extension, chain-of-thought (CoT)
- Seeking 2026 Summer Internship opportunities (LLM / Multimodal / Agent / RLHF)
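- How LLMs Reduce Hallucination: From Impossible to Eliminate to Controllable Through Engineering
- DreamGym: Scaling LLM Agent Learning with Synthesized Experience Instead of Real Interaction
- What I Found: In LLM Retrieval, the Most Cost-Effective Query Rewrite Is Not "Semantic Rewriting" but "Translation Only"
- Model Self-Evolution: Personal Notes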
- Email: 867762462f@gmail.com
- Zhihu: 二次函数


