Learn CUDA with PyTorch

Cuda 191 26 Updated Feb 1, 2026

CUDA Kernel Benchmarking Library

Cuda 806 100 Updated Jan 30, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,537 940 Updated Jan 18, 2026

Learn to use CMake

Python 97 29 Updated Jan 28, 2026

CUTLASS and CuTe Examples

Cuda 117 14 Updated Nov 30, 2025

Algorithm and data structure articles for https://cp-algorithms.com (based on http://e-maxx.ru)

C++ 10,096 1,981 Updated Jan 31, 2026

A Python way to decode SASS

Cuda 5 1 Updated Jan 18, 2026

Code samples for C++ graduate course (iLab at MIPT)

C++ 221 39 Updated Nov 1, 2025

This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".

Python 97 4 Updated Sep 24, 2025

A collection of CUDA programming examples to learn GPU programming

Cuda 54 15 Updated Oct 12, 2025
Python 39 3 Updated Dec 14, 2025
Python 3 Updated Sep 5, 2025

NVIDIA SASS disassembler/inline patcher

C++ 40 3 Updated Feb 1, 2026

Fastest kernels written from scratch

Cuda 530 64 Updated Sep 18, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 23,032 4,275 Updated Feb 1, 2026

flash attention tutorial written in python, triton, cuda, cutlass

Cuda 484 52 Updated Jan 20, 2026

Additional completion definitions for Zsh.

Shell 7,624 730 Updated Jan 27, 2026

My learning notes for ML SYS.

Python 5,234 337 Updated Jan 30, 2026

A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).

Python 34 1 Updated Mar 4, 2025

personal repo for doing stuff with maxtext

Python 1 Updated Apr 30, 2024
Python 558 57 Updated Jul 11, 2024

Material for gpu-mode lectures

Jupyter Notebook 5,659 568 Updated Feb 1, 2026

Rubik's cube solver written in python 3 for the console

Python 34 17 Updated Nov 27, 2022

Use QLoRA to tune LLM in PyTorch-Lightning w/ Huggingface + MLflow

Python 64 8 Updated Nov 15, 2023

Augmentex — a library for augmenting texts with errors

Python 70 Updated Jul 3, 2024

Python bindings for llama.cpp

Python 9,941 1,289 Updated Aug 15, 2025