👋 About Me

Hi! I am an incoming PhD student at HKUST advised by Prof. Ling Pan. I am also a final-year master’s student at Tsinghua University under the supervision of Prof. Xiu Li. I received my bachelor’s degree with honors from Shandong University in June 2023.

I am currently a research intern at StepFun. Previously, I interned at Kuaishou (working with Jiakang Wang and Dr. Fuzheng Zhang), Shanghai AI Laboratory (working with Dr. Biqing Qi and Dr. Chenjia Bai), and Peking University (working with Prof. Yali Du and Prof. Yaodong Yang).

Research Interests: My research centers on Large Language Models (LLMs) and Reinforcement Learning (RL). Specifically, I am interested in:

  • Reasoning: Enhancing the reasoning capabilities of LLMs and Multi-modal LLMs (MLLMs).
  • Agents: Long-horizon planning agents & self-evolving agents.

If you are interested in collaboration, please feel free to reach out via e-mail!

🌟 News

  • [2025.11]  🎉 One paper accepted by AAAI 2026
  • [2025.09]  🎉 One paper accepted by NeurIPS 2025
  • [2025.09]  🔥 Preprint “A Survey of Reinforcement Learning for Large Reasoning Models” released on arXiv
  • [2025.08]  🎉 Two papers accepted by EMNLP 2025
  • [2025.05]  🔥 Our multi-agent RL framework for LLM reasoning released (GitHub)!
  • [2025.03]  🎉 One paper accepted by Reasoning and Planning for LLMs Workshop @ ICLR 2025
  • [2025.01]  🎉 One paper accepted by ICLR 2025
  • [2024.12]  🎉 One paper accepted by AAAI 2025 and selected for oral presentation (Top 4.6%)
  • [2024.05]  🎉 One paper accepted by ICML 2024
  • [2024.01]  🎉 One paper accepted by ICLR 2024
  • [2022.09]  🎉 One paper accepted by NeurIPS 2022

📝 Papers

(* denotes equal contribution, † denotes project lead)

Preprints

  • A Survey of Reinforcement Learning for Large Reasoning Models
    Kaiyan Zhang*†, Yuxin Zuo*†, Bingxiang He*, Youbang Sun*, Runze Liu* (* denotes core contribution), Che Jiang*, Yuchen Fan*, Kai Tian*, Guoli Jia*, Pengfei Li*, Yu Fu*, Xingtai Lv*, Yuchen Zhang*, Sihang Zeng*, Shang Qu*, Haozhan Li*, Shijie Wang*, Yuru Wang*, Xinwei Long, Fangfu Liu, Xiang Xu, Jiaze Ma, Xuekai Zhu, Ermo Hua, Yihao Liu, Zonglin Li, Huayu Chen, Xiaoye Qu, Yafu Li, Weize Chen, Zhenzhao Yuan, Junqi Gao, Dong Li, Zhiyuan Ma, Ganqu Cui, Zhiyuan Liu, Biqing Qi, Ning Ding, Bowen Zhou
    [GitHub 2.1k+ Stars] [HuggingFace Daily Papers Top 1] [Synced (机器之心)]
    Preprint, 2025

Publications

🎓 Education

🎖 Honors and Awards

  • National Scholarship (Top 0.2%), 2022.12
  • National Scholarship (Top 0.2%), 2021.12
  • First Prize in China Undergraduate Mathematical Contest in Modeling (CUMCM) (Top 0.65%), 2021.11
  • Outstanding Student of Shandong Province (Top 0.6%), 2022.05
  • Outstanding Graduate of Shandong Province (Top 5%), 2023.04
  • Dishang Scholarship, 2022.10

💻 Internships

🎙 Invited Talks

  • Scaling Test-Time Compute of LLMs and PRMs for Mathematical Reasoning. ASAP Seminar. 2025.06.
  • Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling. Huawei Noah’s Ark Lab. 2025.03.
  • Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling. Xiaohongshu. 2025.02.

🛠️ Services

  • Conference Reviewer: NeurIPS (2024 - 2025), ICLR (2025 - 2026), ICML (2025), AAAI (2026), AAMAS (2024), AISTATS (2025), ECAI (2024)
  • Journal Reviewer: IEEE Transactions on Artificial Intelligence (TAI)
  • Workshop Reviewer: NeurIPS OTML (2023)