
## Hi there 😄

## Short Bio

I am Zheng Cai (nickname: zigzagcai), an AI Infra Engineer and Lifelong Learner.

I have a general interest in (M)LLM pre-/post-training and love to share my thoughts via blogs on Zhihu: 由A800平台训练InternLM-7B无法收敛引发的思考 (Reflections on InternLM-7B failing to converge when trained on the A800 platform) and 支持变长序列的Mamba-1训练 (Mamba-1 training with variable-length sequence support).

🥑 For now, my personal interests lie in Agentic RL and inference-time scaling, which I believe will bring a new paradigm shift.

🍓 For AI, I believe that more is different and that intelligence emerges from complexity, and I like the ideas behind The Bitter Lesson.

🍒 For Infra, I love building practical distributed systems that orchestrate computation, communication, and caching to scale up and scale out better, and I believe in the ideas behind The Hardware Lottery.

So, what I try to do is build a bridge between various accelerators and large models, with the hope of achieving efficient system-model co-design in the new AI paradigm (self-evolving agentic AI systems).

## My Thinking

I love the general idea of open source (code, knowledge, and more): I enjoy learning from the open-source community and try my best to contribute back.

Selected thoughts I have shared or developed:

  1. CPU memory optimization when using the PyTorch DataLoader over very large-scale datasets: pytorch/pytorch#13246 (comment)
  2. Analysis of the numerical stability of Ring versus Tree All-Reduce: NVIDIA/nccl#1055
  3. Implementing variable-length training with Mamba state space models: state-spaces/mamba#244
  4. Avoiding deadlock when training with ColossalAI on very large-scale GPU clusters: hpcaitech/ColossalAI#5625
  5. Making DeepSeek V3 671B trainable with FSDP+EP by changing two lines of PyTorch FSDP code: https://github.com/zigzagcai/DeepSeekV3
  6. Supporting the nogil feature in NumPy 1.18.5 within the experimental CPython ecosystem: https://github.com/colesbury/numpy/commit/0d6ef2770268711ee6417792ba0da35fcb264bf5
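The DataLoader issue in item 1 stems from Python object refcounts: in fork-based workers, merely reading an element of a Python list bumps its refcount, dirtying the memory page and triggering copy-on-write, so RAM usage grows with every worker. One workaround discussed in that thread is to pack the data into a small number of numpy arrays, which reads stay truly shared. A minimal sketch of that pattern (the class name `PackedStringList` is illustrative, not from the issue):

```python
import numpy as np

class PackedStringList:
    """Store many strings in two numpy arrays instead of a Python list.

    A plain Python list of objects gets slowly duplicated in every
    fork()-ed DataLoader worker, because each read bumps a refcount and
    triggers copy-on-write. Two flat numpy buffers have only a couple of
    refcounts total, so reads remain shared across workers.
    (Pattern discussed in pytorch/pytorch#13246.)
    """

    def __init__(self, strings):
        encoded = [s.encode("utf-8") for s in strings]
        # offsets[i]..offsets[i+1] delimits the i-th string inside blob
        self.offsets = np.cumsum([0] + [len(b) for b in encoded])
        self.blob = np.frombuffer(b"".join(encoded), dtype=np.uint8)

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        start, end = self.offsets[i], self.offsets[i + 1]
        return self.blob[start:end].tobytes().decode("utf-8")

samples = PackedStringList(["sample-%d" % i for i in range(1000)])
print(samples[42])  # -> sample-42
```

In a real `torch.utils.data.Dataset`, `__getitem__` would decode from such packed buffers instead of indexing a list of Python objects.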
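The stability question in item 2 ultimately comes down to floating-point addition not being associative: a ring all-reduce accumulates contributions in a fixed sequential order, while a tree all-reduce combines partial sums pairwise, so the two algorithms can round differently on identical inputs. A toy sketch of the two reduction orders (plain Python, not NCCL code):

```python
import random

def ring_sum(vals):
    """Sequential left-to-right accumulation, like a ring reduction."""
    total = 0.0
    for v in vals:
        total += v
    return total

def tree_sum(vals):
    """Recursive pairwise accumulation, like a tree reduction."""
    if len(vals) == 1:
        return vals[0]
    mid = len(vals) // 2
    return tree_sum(vals[:mid]) + tree_sum(vals[mid:])

random.seed(0)
vals = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
r, t = ring_sum(vals), tree_sum(vals)
# Same inputs, different reduction order: the results typically differ
# by a tiny rounding-level amount, which is exactly why switching NCCL
# between ring and tree algorithms can change results bitwise.
print(abs(r - t))
```

Pairwise (tree) summation also tends to accumulate less rounding error than a long sequential chain, which is part of the numerical-stability discussion.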

## Pinned Repositories

  1. zigzagcai/DeepSeekV3: a simple and efficient implementation of DeepSeek V3 671B that is trainable with FSDP+EP on as few as 256 A100/H100 GPUs, targeting the Hugging Face ecosystem.
  2. algorithmicsuperintelligence/openevolve: an open-source implementation of AlphaEvolve.
  3. InternLM/InternEvo: a lightweight open-source training framework that aims to support model pre-training without extensive dependencies.
  4. hpcaitech/ColossalAI: making large AI models cheaper, faster, and more accessible.
  5. zigzagcai/varlen_mamba (forked from state-spaces/mamba): the Mamba SSM architecture with support for training on variable-length sequences.
  6. state-spaces/mamba: the Mamba SSM architecture.