NVIDIA Blackwell Sets New STAC-AI Record for LLM Inference

NVIDIA AI Infrastructure

282,326 followers

📣 NVIDIA Blackwell sets a new STAC-AI LANG6 record for LLM inference in quantitative research and algorithmic trading, delivering the highest compute-per-watt and lowest token cost.

We tested Llama 3.1 8B and 70B with NVIDIA TensorRT-LLM across multiple NVIDIA platforms.

Systems tested:
✅ NVIDIA HGX B200 on Lambda
✅ NVIDIA RTX PRO 6000 Blackwell Server Edition system from Supermicro
✅ NVIDIA Grace Hopper-based server from Hewlett Packard Enterprise

See the results 👉 https://nvda.ws/4fFM5ww

2 Comments

Supermicro 2d

Proud to set new records together with NVIDIA! 🏆 NVIDIA AI Infrastructure

Raghavan Kasturirangan 2d

Great technical blog. You can connect the developed systems to external APIs.

More Relevant Posts

Panchaea Ltd

771 followers
2w
The road to a quantum computer depends on enormous amounts of compute, creating a classic bottleneck 💻 Thankfully, the NVIDIA RTX PRO 6000 Blackwell Workstation Edition is designed to support high-intensity simulation and AI workloads. Learn more 🔗: https://lnkd.in/ezZnU9kB
Jack Duhamel
3w
30 new H100 nodes just hit GPU Mart: $183,000 per unit, 8x NVIDIA H100 80GB SXM5, brand new. Who wants these things?

10 Comments
QCT

13,919 followers
5d
Meet the #QuantaGrid D75E-4U, a single platform that delivers endless possibilities. It’s not only an NVIDIA RTX PRO server for building enterprise-grade AI factories, but also allows for flexible PCIe GPU configurations to accelerate AI and HPC workloads of all sizes. #AIFactory
Fahd Mirza
2w
⚡ Luce Megakernel just proved the NVIDIA efficiency gap is a software problem, not a hardware one.

🔬 A 2020 RTX 3090 at 220W now matches Apple M5 Max efficiency and delivers 1.8x the throughput.

🔹 413 tok/s decode vs 267 tok/s on llama.cpp: same GPU, different software
🔹 1.87 tok/J, matching Apple M5 Max at less than a third of the system cost
🔹 All 24 layers of Qwen3.5-0.8B fused into a single CUDA kernel, with zero CPU round trips
🔹 25x faster than PyTorch HuggingFace on the same hardware
🔹 Hybrid DeltaNet and Attention architecture: the first megakernel ever built for this pattern

🔥 Full breakdown and live benchmark below 👇 https://lnkd.in/gq5ChYzH

1 Comment
Spectral Compute

1,743 followers
3w
The demand for compute is outpacing any single vendor's supply. Mixed GPU fleets are becoming the norm. Your toolchain should handle that.

SCALE compiles CUDA codebases for:
• NVIDIA sm_20 to sm_120 (Quadro Plex 7000 up through Blackwell & RTX 5090)
• AMD GCN 5–CDNA 4 (MI25 up through MI355X, including architectures ROCm no longer officially supports)
• AMD RDNA 1–4 (including RX 9070)

If your code works in CUDA, it compiles in SCALE. Old silicon, new silicon, anything in between.
1 Comment
Crypto.com

779,346 followers
1w
🚨 Some passwords can now be cracked instantly. According to a 2025 research table from Hive Systems, advances in GPU power mean weak passwords are easier than ever to brute force — especially short or reused ones. The takeaway is simple: length and complexity matter. A long, unique password with numbers, uppercase letters, and symbols can dramatically increase the time needed to crack it. Swipe through to see how long your password might actually last 👀 Note: All estimated cracking times were calculated using 12× NVIDIA RTX 5090 GPUs and bcrypt hashing (10 rounds).

1 Comment
Dmitry Tweepsmap Test
4w
MSI (re)launches $85,000 Nvidia DGX Station workstation with the Nvidia GB300 Ultra, a pair of 400GbE LAN ports, and 768GB of RAM https://qc.twp.ai/WqP1TG
Shawn Chauhan
1w
NVIDIA's real moat is not the GPU. It is the product cadence. By the time a competitor ships hardware that matches Blackwell, NVIDIA has already moved to Vera Rubin. By the time Vera Rubin is matched, Rubin Ultra ships. One-year cycles do not just win on performance benchmarks. They make the competitive timeline structurally impossible - because catching up and staying caught up are two different problems. Four million CUDA developers and a proprietary interconnect system are the walls. The annual cadence is the moat that keeps refilling itself.
1 Comment
Safi Ullah
2w Edited
Anthropic just made uninterrupted development a lot easier. Anthropic recently signed a deal with xAI to use its Colossus 1 infrastructure, which adds over 220,000 NVIDIA GPUs to their setup. This allowed them to double the 5-hour limits for every paid tier. Combined with their new task routing in Code Kit v5.3, output becomes 4x. They also removed rate limiting during high-traffic periods, making development a lot easier.
Vincent Van Steenbergen
4w Edited
Thanks to NVIDIA #NIM you can now try DeepSeek-v4-Pro directly in #DeepBrain or through their website/API. Davide Schiavon Dimitri Ababii Benoit Gougeon Aâdel B. Alexis Gendronneau

NVIDIA AI

1,837,355 followers
1mo

Happy Friday! We just put DeepSeek-V4-Pro up on build.nvidia.com. It’s the world’s largest open source model at 1.6T parameters, and you can run it for free on NVIDIA Blackwell GPUs. Try the NVIDIA NIM API → https://lnkd.in/ghygMYQ4
