A defining moment for AI compute, in just 60 seconds. On stage at #ArmEverywhere, Rene Haas and Mohamed Awad were joined by Santosh Janardhan and Paul Saab from Meta, and Kevin Weil from OpenAI, to celebrate a major moment for the Arm compute platform—our expansion into production silicon with the Arm AGI CPU. Built for the rise of agentic AI, this is a big step—bringing Arm into data center silicon for the first time. https://okt.to/Rso6AN
Why speculative decoding wants two kinds of silicon: https://lnkd.in/g27YASit This is a packed article that:
- Systematically explores speculative decoding, starting with what Google and DeepMind discovered in 2023
- Shows how ByteDance's SwiftSpec supercharged it by disaggregating the draft and verify stages
- Explains why Blackwell/Rubin GPUs can be awkward draft engines, and why SRAM-heavy accelerators like d-Matrix and Groq can be a better fit
- Considers what the Gimlet Labs × d-Matrix partnership signals about heterogeneous deployments
- Takes a forward look at what I call "Dense Memory Accelerators," like d-Matrix's Raptor architecture with 3D-DRAM, and how this could pressure HBM-based GPU architectures
- Examines how NVLink Fusion will play a role in a heterogeneous ecosystem
- Asks what all of this means for agentic AI workloads
- And finally, shows how to think about SRAM vs 3D-DRAM vs HBM as a continuum of tradeoffs
Hope you like it!
🚗 Self-driving cars are not just a software problem. 🤖 They are a massive compute problem. It’s also a reminder that when building infrastructure, you have to start with the workload you’re trying to run. Loved hearing our Denise Muyco explain what it takes to power autonomous AI.
🏗️ Real-time AI does not fail at the model layer. It fails at the infrastructure layer.
- Latency tolerance is zero.
- Power inefficiency compounds risk.
- GPU density stresses thermal limits.
Autonomous workloads expose misalignment immediately. RAVEL aligns power, compute, and orchestration before production begins. If you're building AI at scale, meet us at #GTC2026 to learn more. Schedule a meeting: https://lnkd.in/gti5WWbU #AIInfrastructure #AutonomousSystems #AIFactory #GTC2026 #NVIDIAGTC #AIInfrastructureStrategy
I recently attended the session "Accelerate AI Inference Using DOCA for Storage" by NVIDIA, and it completely changed how I think about scaling AI systems.

One major insight was that AI inference bottlenecks are shifting from compute to data movement and storage efficiency. Large language models generate massive KV-cache data, and recomputing context repeatedly leads to wasted GPU cycles and higher latency.

NVIDIA introduced the concept of a new storage tier architecture (CMX) along with DOCA Memos, which enables efficient KV-cache sharing across GPUs. This helps improve GPU utilization, reduce latency, and scale real-time AI workloads more effectively.

Another powerful takeaway was how technologies like RDMA, NIXL, and BlueField DPUs can accelerate the inference data path and enable infrastructure-level optimizations for modern AI applications.

This session made me realize that the future of AI is not just about building better models but about designing end-to-end optimized AI systems from edge to cloud. Grateful to be part of NVIDIA GTC and excited to explore these concepts further through hands-on experimentation. Vikram Gaur

#NVIDIAGTC #AIInfrastructure #DOCA #AIInference #GPUComputing #MachineLearning
Enjoyed a great conversation with JP-Jeffery Potvin, CEO of Arc Compute, at Rafay’s Executive Breakfast during NVIDIA GTC 2026. We talked about how AI infrastructure is moving from experimentation to production and why the winners will be those who can train, deploy and operate large models efficiently. It’s exciting to see platforms like Arc Compute and Rafay working together to turn GPU clusters into governed, multi‑tenant AI factories. I’m inspired by the vision of building scalable AI infrastructure that will define the next generation of computing. #NVIDIAGTC #ArcCompute #Rafay #AIInfrastructure #AIFactories
Petaflops don't mean much if you can't use them all. FPGAs can give you ASIC performance with GPU flexibility. But you need an FPGA designed from the ground up for GenAI. Achronix devices have native support for quantized narrow number formats, perfectly balanced compute and memory bandwidth, and an on-chip NoC for moving all that data around. And we have outstanding partners to help make the dream a reality!
Great article by Elastix AI on FPGAs for LLM inference. Our Speedster7t Data Acceleration platform combines flexible data-format matrix math engines and high-bandwidth memories with high-speed data transport, driving down the cost of low-latency LLM inference. https://lnkd.in/eUwDf8Jv
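For readers wondering what "quantized narrow number formats" means in practice, here is a hedged sketch of symmetric int4 weight quantization, the kind of narrow format such hardware targets. The function names and the single-scale scheme are illustrative choices, not Achronix's actual format:

```python
# Toy sketch of narrow-format quantization: map float weights to signed
# 4-bit integers (-8..7) with one shared scale factor, then reconstruct.

def quantize_int4(weights):
    """Symmetric int4 quantization with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0  # avoid divide-by-zero
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate floats; error is bounded by ~scale/2 per weight.
    return [x * scale for x in q]
```

Each weight now needs 4 bits instead of 32, which is exactly why narrow formats ease the memory-bandwidth pressure the post describes.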
Today marks a new chapter for Arm - and for AI infrastructure. At #ArmEverywhere we announced that we are extending the Arm compute platform into production-ready silicon for the first time - starting with the Arm AGI CPU. Designed for AI data centers, it marks a major step forward for infrastructure in the era of agentic AI. As AI systems shift toward continuously running agents, the CPU becomes central to orchestrating workloads at scale. The Arm AGI CPU is built for this new class of infrastructure, delivering more than 2x performance per rack compared with x86 platforms. This work is grounded in close collaboration across the ecosystem, including with Meta as a lead partner, helping shape how this platform is deployed in real-world AI infrastructure. This is the next evolution of the Arm compute platform, defining the foundation for the next generation of AI-native data centers. Very proud that we finally get to share more about it. https://lnkd.in/giK8rYgW
Starburst has announced day-one support for the new NVIDIA Vera CPU, unlocking powerful performance for AI inference and analytics workloads. The move aims to help organizations run data-intensive AI pipelines faster and more efficiently. As AI workloads scale, tighter integration between data platforms and compute architecture is becoming critical. Read the full story: https://lnkd.in/dqqkHh56 #AI #ArtificialIntelligence #NVIDIA #Starburst #AIInfrastructure #DataAnalytics #MachineLearning #TechNews
Arm Holdings launches its first in-house data center CPU for agentic AI, backed by Meta and OpenAI. What the historic pivot means for ARM stock and competitors. Read more. #ArmHoldings #ARMCPU #AgenticAI #AIDataCenter #Semiconductors #ARM #AIInfrastructure #TechStocks #MetaAI #OpenAI Arm
NVIDIA’s ICMS announcement is a clear signal that AI inference is changing. Context and memory now matter as much as compute. This blog shares what that shift means and how WEKA is helping teams scale KV cache beyond GPU HBM and prepare for what’s next. http://spr.ly/6042hNOCe
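"Scaling KV cache beyond GPU HBM" amounts to tiering: keep hot entries in the fast tier and spill cold ones to larger, slower storage rather than recomputing them. A minimal sketch of that idea follows; the class, tier names, and LRU policy are my own assumptions for illustration, not WEKA's or NVIDIA ICMS's actual design:

```python
# Hedged sketch of tiered KV-cache management: a capacity-limited fast tier
# (standing in for GPU HBM) spills least-recently-used entries to a larger
# cold tier (standing in for CPU RAM / NVMe / network storage).

from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_slots=2):
        self.hbm = OrderedDict()   # fast tier, kept in LRU order
        self.cold = {}             # slow tier: unbounded here for simplicity
        self.hbm_slots = hbm_slots

    def put(self, seq_id, kv):
        self.hbm[seq_id] = kv
        self.hbm.move_to_end(seq_id)               # mark as most recently used
        while len(self.hbm) > self.hbm_slots:
            evicted, val = self.hbm.popitem(last=False)  # evict the LRU entry
            self.cold[evicted] = val                     # spill, don't discard

    def get(self, seq_id):
        if seq_id in self.hbm:
            self.hbm.move_to_end(seq_id)
            return self.hbm[seq_id]
        if seq_id in self.cold:
            kv = self.cold.pop(seq_id)
            self.put(seq_id, kv)   # promote back into the fast tier
            return kv
        return None                # true miss: caller must recompute
```

The payoff is the same as in the blog's framing: a cold hit costs a (slower) fetch instead of a full prefill recomputation on the GPU.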
The street definitely agrees. The stock jumped more than 15% on a 5x revenue forecast! Still don't quite know why they're filling up my feed with claims that the future of inference is CPU-based, though. They pretty much have to be right to make $ now.