Weilun-Hub

Popular repositories

  1. CPM.cu (Public)

    Forked from OpenBMB/CPM.cu

    CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua…

    Cuda

  2. mlx (Public)

    Forked from ml-explore/mlx

    MLX: An array framework for Apple silicon

    C++

  3. mlx-lm (Public)

    Forked from ml-explore/mlx-lm

    Run LLMs with MLX

    Python

  4. cuda_hgemm (Public)

    Forked from Bruce-Lee-LY/cuda_hgemm

    Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.

    Cuda

  5. LeetCUDA (Public)

    Forked from xlite-dev/LeetCUDA

    📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

    Cuda

  6. sgl-flash-attn (Public)

    Forked from sgl-project/sgl-flash-attn

    Fast and memory-efficient exact attention

    Python