sm121

Here are 11 public repositories matching this topic...

albond / DGX_Spark_Qwen3.5-122B-A10B-AR-INT4

Qwen3.5-122B-A10B on DGX Spark: 28.3 → 51 tok/s (+80%)

cuda lossless mtp speedup performance-optimization vllm autoround dgx-spark qwen3-5 sm121 qwen3-5-122b-a10b

Updated Jun 2, 2026
Python

Sggin1 / DGX-SPARK

DGX Spark research and tests - containers, benchmarks, and investigation notes for running models on GB10 (SM 12.1)

aarch64 blackwell kv-cache vllm nvfp4 dgx-spark mamba-ssm sm121 turboquant

Updated Jun 6, 2026
Python

albond / DGX_Spark_Unsloth_Lossless_Speedup

7.67× LoRA / 8.35× Full FT speedup for Qwen3.5 (0.8B–27B) on NVIDIA DGX Spark — wall-clock parity with rented H100. Lossless within BF16. Three-command interactive wizard handles model picker, data validator, training, and merge.

cuda transformers pytorch nvidia triton lora fine-tuning peft multimodal blackwell qwen unsloth gb10 dgx-spark qwen3-5 sm121

Updated May 19, 2026
Python

Logos-Flux / optimized-CUDA-GB10

Optimized CUDA kernels for NVIDIA GB10 Blackwell (sm_121, DGX Spark). RMSNorm + GELU. First sm_121 kernel on HuggingFace Kernel Hub.

gpu cuda pytorch nvidia kernels gelu huggingface blackwell rmsnorm gb10 dgx-spark sm121

Updated Jun 21, 2026
Cuda

drewid74 / optimized-qwen35-hybrid-v2-runbook-public

Production runbook for Qwen3.5-122B hybrid INT4+FP8 on NVIDIA DGX Spark GB10 — optimization stack, PD firmware wedge diagnosis, bench results

Updated Jun 18, 2026

r0b0tlab / diffusiongemma-26b-nvfp4-sm121-vllm

Optimized SM121 vLLM container and benchmark report for nvidia/diffusiongemma-26B-A4B-it-NVFP4

benchmark blackwell vllm gb10 nvfp4 sm121 diffusiongemma

Updated Jun 10, 2026
HTML

idonati / spark-vllm-docker-festr2

Patches + recipe to deploy festr2/MiMo-V2.5-Pro-NVFP4-MXFP8-attn-TP8 on 8-node DGX Spark sm_121 (Ray + vLLM, TP=8). Fixes the fused-qkv loader bug that mis-slotted Q values as K/V on 7 of 8 ranks.

moe ray quantization mimo huggingface vllm gb10 nvfp4 dgx-spark mxfp8 sm121 tensor-parallel

Updated May 19, 2026
Python

parallelArchitect / gb10-kernel-probe

Empirical kernel scheduling characterization for NVIDIA GB10 (SM121a). Sweeps GEMM tile configurations, classifies PTX instruction paths, and captures hardware telemetry.

benchmark gpu cuda nvidia empirical performance-analysis profiling cutlass gemm ptx black-box-testing unified-memory kernel-scheduling nvidia-tools gb10 dgx-spark sm121

Updated May 10, 2026
C++

ogulcanaydogan / dgx-spark-llm-stack

Pre-built PyTorch wheels and build scripts for NVIDIA DGX Spark (GB10, sm_121, Blackwell, CUDA 13.0, ARM64)

machine-learning deep-learning gpu cuda inference pytorch nvidia arm64 aarch64 fine-tuning blackwell llm gb10 dgx-spark grace-blackwell sm121 cuda-13 pre-built-wheels

Updated Jun 25, 2026
Shell

leap21ai / autospark

DGX Spark (GB10/SM121) platform support for Meta's KernelAgent — auto-detect, hardware constraints, safe Triton configs

cuda nvidia triton gpu-optimization gb10 dgx-spark sm121 kernel-agent

Updated Mar 14, 2026
Python

parallelArchitect / OpenMP_VV

OpenMP Offloading Validation & Verification Suite. Official repository, migrated from Bitbucket. For documentation, results, publications, and presentations, see the project website.

openmp compilers aarch64 nvcc blackwell unified-memory gb10 dgx-spark sm121 hardware-coherent-uma nvlink-c2c requires-unified-shared-memory

Updated Jun 15, 2026
C

