jianzhnie/README.md

Hi there, I'm Robin 👋

Welcome 👋

Hey, I'm jianzhnie. Thanks for stopping by!

I'm an AI engineer focusing on LLMs, RLHF, Reinforcement Learning, and production-grade code.

What I’m working on 🔭

Large Language Models

Code Repo                     About
LLMReasoning                  Techniques and toolkit for reasoning with LLMs.
LLMEval                       A modular framework to evaluate LLMs across tasks and settings.
LLMToolkit                    A PyTorch toolkit for NLP and LLM development.
LLamaTuner                    Easy and efficient finetuning pipelines for LLMs.
Open-R1                       Open-source DeepSeek-R1-style and RLHF training pipeline.
awesome-instruction-datasets  Curated instruction/prompt datasets for training ChatLLMs.

Reinforcement Learning

Code Repo             About
Deep-RL-Toolkit       Single-agent RL toolkit (DQN, Rainbow, DDPG, PPO, SAC, TD3, …).
Deep-MARL-Toolkit     Multi-agent RL toolkit (VDN, QMIX, MADDPG, MAPPO, …).
RLZero                MCTS for general sequential decision making (AlphaZero, MuZero, …).
ScaleRL               Simple, scalable distributed RL (A3C, Ape-X, IMPALA, …).
CyberAttackSimulator  RL environment for autonomous cyber attack and defense on simulated networks.

Others

  • Diffuser Toolkit for image/audio generation in PyTorch: diffusion-toolkit
  • AutoML for deep learning and tabular tasks: AutoTimm | AutoTabular
  • Trying to reduce the Learning Machine Learning (LML) loss 😂
  • Coding every day to become a better research engineer

I’m currently learning 🌱

  • RL for Reasoning and GRPO
  • LLM systems and AGI
  • Large-scale distributed RL systems

How to reach me 📫

Have an awesome day!

Pinned

  1. LLamaTuner

    Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

    Python · 616 stars · 65 forks

  2. Open-R1

    An open-source reproduction of DeepSeek-R1.

    Python · 272 stars · 52 forks

  3. deep-marl-toolkit

    MARLToolkit: the Multi-Agent Reinforcement Learning Toolkit. Includes implementations of MAPPO, MADDPG, QMIX, VDN, COMA, IPPO, QTRAN, MAT...

    Python · 146 stars · 20 forks

  4. deep-rl-toolkit

    RLToolkit is a flexible and highly efficient reinforcement learning framework. Includes implementations of DQN, AC, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ...

    Python · 9 stars · 2 forks

  5. LLMToolkit

    LLMToolkit is a toolkit for NLP (Natural Language Processing) and LLMs (Large Language Models) built on PyTorch.

    Python · 6 stars · 2 forks

  6. llmtech

    LLMTechSite, focused on the technology ecosystem of artificial general intelligence.

    Python · 10 stars · 4 forks