blackwell

Here are 59 public repositories matching this topic...

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Updated Jul 1, 2026
Python

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

reinforcement-learning cuda inference transformer moe attention llama glm minimax wan diffusion vlm blackwell llm qwen deepseek gpt-oss qwen-image

Updated Jul 1, 2026
Python

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

cuda pytorch moe blackwell llm-serving

Updated Jul 1, 2026
Python

lightseekorg / tokenspeed

TokenSpeed is a speed-of-light LLM inference engine.

glm minimax vlm kimi blackwell llm qwen speed-of-light deepseek tokenspeed gpt-oss lightseek

Updated Jul 1, 2026
Python

GradientHQ / parallax

Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere

python distributed-systems chatbot pytorch transformer llama glm minimax kimi blackwell large-language-models llm llm-serving qwen deepseek oss-gpt decentralized-inference

Updated Jun 26, 2026
Python

NVIDIA / cudnn-frontend

cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

Updated Jun 30, 2026
Python

AEON-7 / Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash

Fully uncensored, capability-enhanced abliteration of Qwen3.6-27B. NVFP4 + z-lab DFlash speculative decoding (n=12) on the unified ghcr.io/aeon-7/aeon-vllm-ultimate:latest container, tuned for long-context draft acceptance on DGX Spark. 6 HF variants (BF16/NVFP4/MTP/MTP-XS), docker-compose, and QuickStart.

quantization uncensored blackwell llm vllm qwen speculative-decoding abliteration qwen3 nvfp4 dgx-spark dflash

Updated Jun 28, 2026
Python

patrick-toulme / pyptx

A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch

pytorch nvidia hopper nvidia-gpu ptx jax blackwell

Updated May 8, 2026
Python

AEON-7 / vllm-dflash

DFlash vLLM for DGX Spark — Plug & Play Block-Diffusion Speculative Decoding

docker inference nvidia blackwell llm vllm qwen speculative-decoding block-diffusion nvfp4 dgx-spark dflash

Updated Jun 28, 2026
Python

Mekopa / whisperx-blackwell

GPU-accelerated WhisperX on NVIDIA Blackwell (SM_121) - DGX Spark compatible

audio docker machine-learning deep-learning gpu cuda pytorch nvidia speech-recognition transcription asr speaker-diarization dgx blackwell pyannote whisperx dgx-spark sm-121

Updated Apr 23, 2026
Python

egaoharu-kensei / flash-attention-triton

Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with custom configuration mode

Updated Jan 12, 2026
Python

MerkyorLynn / lynn-engine

Lynn-native LLM inference engine for NVIDIA Blackwell · W4A8/NVFP4 quantization · hand-written CUDA/Triton kernels · MoE · speculative decoding

Updated Jun 8, 2026
Python

waybarrios / dgx-spark-finetune-llm

LLM fine-tuning with LoRA + NVFP4/MXFP8 on NVIDIA DGX Spark (Blackwell GB10)

deep-learning pytorch nvidia lora quantization fine-tuning blackwell llm nvfp4 dgx-spark transformer-engine mxfp8

Updated Dec 22, 2025
Python

lna-lab / blackwell-geforce-nvfp4-gemm

NVFP4 inference on Blackwell GeForce (RTX 5090/5080/5070 Ti/RTX PRO 6000) — SM120 patches for vLLM + FlashInfer + CUTLASS. 175 tok/s on Qwen3.6-35B MoE.

gpu-computing quantization cutlass gemm geforce blackwell vllm llm-inference flashinfer rtx-5090 sm120 nvfp4

Updated Apr 27, 2026
Python

Sggin1 / DGX-SPARK

DGX Spark research and tests - containers, benchmarks, and investigation notes for running models on GB10 (SM 12.1)

aarch64 blackwell kv-cache vllm nvfp4 dgx-spark mamba-ssm sm121 turboquant

Updated Jun 6, 2026
Python

lna-lab / gemma4-12b-vllm-sm120

Reproducible recipe: serve abliterated Gemma-4-12B (gemma4_unified) at 50-118 tok/s on no-NVLink Blackwell (SM120) via vLLM nightly + ModelOpt FP8/NVFP4 + MTP spec-decode.

quantization gemma blackwell fp8 vllm speculative-decoding sm120 nvfp4 abliterated gemma-4 modelopt tensor-parallel

Updated Jun 7, 2026
Python

CosmicRaisins / glm-5.2-gb10

GLM-5.2 (744B/40B MoE) on a 4× DGX Spark / GB10 (sm_121) cluster: portable Triton sparse-MLA kernels, a data-free expert prune, MTP draft, and a one-script bootstrap.

moe glm blackwell vllm speculative-decoding gb10 dgx-spark

Updated Jun 29, 2026
Python

dataforgex / dgx_spark

Multi-model LLM serving for NVIDIA DGX Spark with vLLM, web UI, and tool calling

ocr chatbot blackwell openai-api private-search-engine vllm private-llm dgx-spark

Updated Jan 24, 2026
Python

albond / DGX_Spark_Unsloth_Lossless_Speedup

7.67× LoRA / 8.35× Full FT speedup for Qwen3.5 (0.8B–27B) on NVIDIA DGX Spark — wall-clock parity with rented H100. Lossless within BF16. Three-command interactive wizard handles model picker, data validator, training, and merge.

cuda transformers pytorch nvidia triton lora fine-tuning peft multimodal blackwell qwen unsloth gb10 dgx-spark qwen3-5 sm121

Updated May 19, 2026
Python

AEON-7 / supergemma4-26b-abliterated-multimodal-nvfp4

NVFP4 AWQ Full quantization of SuperGemma4-26B-Abliterated-Multimodal for Blackwell GPUs — pre-built vLLM container + patches included

moe quantization multimodal blackwell awq llm vllm nvfp4 dgx-spark gemma4 modelopt

Updated May 1, 2026
Python
