Stars
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Algorithm and data structure articles for https://cp-algorithms.com (based on http://e-maxx.ru)
Code samples for C++ graduate course (iLab at MIPT)
This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".
A collection of CUDA programming examples to learn GPU programming
SGLang is a high-performance serving framework for large language models and multimodal models.
Flash Attention tutorial written in Python, Triton, CUDA, and CUTLASS
Additional completion definitions for Zsh.
My learning notes for ML SYS.
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
Rubik's Cube solver written in Python 3 for the console
Use QLoRA to fine-tune an LLM in PyTorch Lightning with Hugging Face + MLflow
Augmentex — a library for augmenting texts with errors


