OpenAI partners with Cerebras for 750MW AI Inference Deployment


1mo

Today, OpenAI and Cerebras announced they have signed a multi-year agreement to deploy 750 megawatts of Cerebras wafer-scale systems to serve OpenAI customers. This deployment will roll out in multiple stages beginning in 2026, making it the largest high-speed AI inference deployment in the world. Cerebras delivers the world’s fastest inference service, about 15X faster than Nvidia GPUs. This speed is essential for reasoning models and agentic AI to be successful. The world has now recognized what we saw as the critical element in enabling the next inflection point in AI adoption: speed. Cerebras #speed #fastinference

Cerebras cerebras.ai

20 Comments

Gregory Diamos 1mo

Way to go

1 Reaction

Dimitrios Ziakas, PhD 1mo

Congrats Dhiraj and the Cerebras team!

1 Reaction

Bala Iyer 1mo

Dhiraj Mallick thanks for your leadership!

Edward Izgorodin 1mo

This is huge for agentic AI. The real game-changer isn't just speed - it's the architectural shift that becomes possible when latency drops below 100ms. I wrote about how this fundamentally changes how we build production agents (from batch processors to real-time verify loops): https://www.linkedin.com/posts/izgorodin_ai-agenticai-aiengineering-activity-7417358369482678273-dTY5 Curious to hear perspectives from the Cerebras team on what architectural patterns you're seeing emerge at <100ms latency.

1 Reaction

Cerebras 1mo

🚀

4 Reactions

Edward Izgorodin 1mo

750MW isn't just about infrastructure scale - it's about fundamentally changing what's possible in AI deployment. When you can run inference at this speed consistently, you unlock entirely new use cases: persistent context windows, multi-agent orchestration, real-time knowledge synthesis. The rollout strategy will be fascinating to watch. https://www.linkedin.com/posts/edward-izgorodin_750mw-isnt-just-about-speed-its-about-activity-7299833513486180352-AEuI

1 Reaction


🧣 Renato Umeton, Ph.D. (Hiring) 1mo

Good. Now please move on and do the same with Claude Code from Anthropic. That's where we want the 2k TKS 🙏

1 Reaction

Pratyush Kamal 1mo

Since I expect Cerebras to win handsomely on inference/watt, why not include this in addition to total wattage in marketing communication? This will help shape us to be even more power aware.

1 Reaction

Mordechai Hartman 1mo

Amazing, congrats Dhiraj Mallick and Andrew Feldman. Been super cool watching the evolution of the wafer.

1 Reaction

Mark Chung 1mo

Dhiraj Mallick 🥳

1 Reaction


More Relevant Posts

StartupTalky

7,241 followers
1mo
AI enterprises, take note: Ziroh Labs’ Kompact AI is paving the way for CPU-native LLM inference, offering up to 80% cost savings and faster deployment, all on existing infrastructure. Is this the future of scalable enterprise AI? Read more via @StartupTalky. What benefits and challenges do you see in shifting from GPU to CPU-native AI? https://lnkd.in/gVWZKbwu #startupnews #growth

Hrishikesh Dewan on Kompact AI & CPU-Native LLM Inference | Ziroh Labs startuptalky.com
Scott Sutherland
1mo
Nvidia and others are in talks to invest as much as $60 billion in an OpenAI funding round of as much as $100 billion, at a valuation of about $750 billion to $830 billion, The Information says. (Generative artificial intelligence)
Maxim Geraskin
1mo
OpenAI just made a multibillion-dollar computing deal with Cerebras Systems. And the most interesting part is not the money. 👇

First: the unit of this agreement is megawatts. OpenAI committed to buy up to 750 MW of compute over three years. This is not “how many GPUs”, this is “how much power does your data center consume”. That alone tells a lot about where AI scaling is today ⚡

Second: Cerebras is weirdly fast, at least so far. 🚀 Out of curiosity, I tried to challenge https://chat.cerebras.ai/ with my daughter’s homework:

Solve the crossword: to bother or annoy someone repeatedly, seven letters, the third is o.

The response came almost instantly. Subjectively, it felt tens or even hundreds of times faster than what I’m used to (yes, yet?), but still impressive.

Why this matters. We are slowly moving from a world where AI progress is limited by models and ideas to a world where it is limited by power, hardware architecture, and supply chains. Nvidia is still dominant, but deals like this show that the industry is actively looking for exits from the single-vendor bottleneck. If Cerebras delivers at OpenAI scale, this may become a real shift, not just an experiment. 👀

Compute is the new oil, and now we measure it in megawatts, not chips 🔌🤖

Cerebras Inference chat.cerebras.ai

2 Comments
Isaac Alegre
1mo
https://lnkd.in/ewrp4p77 "Cerebras' $10 billion deal with OpenAI positions the startup and its wafer-scale engine as a challenger to Nvidia in the AI chip market, while helping OpenAI try to accelerate the performance of its large AI models"

Cerebras Poses an Alternative to Nvidia With $10B OpenAI Deal aibusiness.com
Asif Razzaq
2w
OpenAI Releases a Research Preview of GPT‑5.3-Codex-Spark: A 15x Faster AI Coding Model Delivering Over 1000 Tokens Per Second on Cerebras Hardware

OpenAI has launched GPT-5.3 Codex-Spark, a research preview optimized for near-instant coding by delivering over 1000 tokens per second, a 15x speed increase over the flagship model. This massive performance jump is powered by the Cerebras Wafer-Scale Engine 3 (WSE-3), which eliminates traditional GPU bottlenecks by keeping all compute on a single silicon wafer, paired with a new persistent WebSocket connection that reduces networking overhead by 80%...

Full analysis: https://lnkd.in/gCvBfM47
Technical details: https://lnkd.in/gn_AkgmQ
OpenAI Cerebras
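For a rough sense of what the figures quoted in this post imply, here is a minimal Python sketch. It assumes the "15x" is a ratio of token throughput (the post does not say how it was measured), and the 500-token response length and both function names are hypothetical, chosen only for illustration.

```python
def implied_baseline_tps(fast_tps: float, speedup: float) -> float:
    """Throughput of the slower model implied by a quoted speedup ratio."""
    return fast_tps / speedup


def generation_seconds(num_tokens: int, tps: float) -> float:
    """Time to stream num_tokens at a steady tokens-per-second rate."""
    return num_tokens / tps


if __name__ == "__main__":
    fast_tps = 1000.0   # "over 1000 tokens per second" (from the post)
    speedup = 15.0      # "15x speed increase" (from the post)
    response_tokens = 500  # hypothetical response length

    baseline_tps = implied_baseline_tps(fast_tps, speedup)
    print(f"Implied baseline: {baseline_tps:.1f} tokens/s")
    print(f"Fast model:  {generation_seconds(response_tokens, fast_tps):.2f} s")
    print(f"Baseline:    {generation_seconds(response_tokens, baseline_tps):.2f} s")
```

Under these assumptions the same response takes half a second on the fast path versus several seconds at the implied baseline rate; real latency would also include time to first token and network overhead, which this sketch ignores.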
Brian C.
1mo Edited
As multi-modal models become the norm I think we are going to see a lot more impact from these "old school" adversarial ML attacks. If your whole AI red team approach is natural language prompt injection and you don't understand how these models actually work you are going to be missing a large attack surface.

Joseph Lucas

Data and AI Security
1mo

The old magic still works. It's not all prompts... there are still gradients to follow and decision boundaries to jump. I talk about PGD here, but also succeeded with C&W and HSJ. Here's my latest from the NVIDIA AI Red Team: https://lnkd.in/gfiziis6

Updating Classifier Evasion for Vision Language Models | NVIDIA Technical Blog developer.nvidia.com
Saran Menon
2w
Alibaba just dropped the hammer on Google and Nvidia, proving that in 2026, raw parameter count is a vanity metric while inference efficiency is the only metric that actually pays the bills. Here is your high-velocity technical breakdown of the last 24 hours in AI.

1. Alibaba Decouples Intelligence From Size with "RynnBrain"
The Drop: Alibaba’s DAMO Academy released RynnBrain, an open-source embodied AI model stack optimized for robotics, available now on Hugging Face.
The Specs: Built on the Qwen3-VL vision-language backbone, RynnBrain activates only 3 billion parameters during inference yet reportedly crushes Google’s Gemini Robotics-ER 1.5 on 16 benchmarks.
Architecture: Seven variants released, ranging from a tiny 2B model to a complex Mixture-of-Experts (MoE) architecture.
Technical Impact: This is a localized inference breakthrough. By keeping active parameters at 3B, Alibaba enables sophisticated embodied reasoning directly on robot hardware (edge compute) without the latency crippling cloud-dependent rivals.

2. Mistral Shreds Latency with "Voxtral"
The Drop: French labs released Voxtral Realtime and Voxtral Mini Transcribe V2 under Apache 2.0.
The Specs: A 4 billion parameter model capable of processing speech-to-text with 200 milliseconds of latency. Supports 13 languages.
Target: Edge devices (phones/laptops).
Technical Impact: 200ms is the "magic number" for natural human interruption. Mistral has effectively solved the turn-taking lag that makes most voice bots feel robotic. This is the new baseline for edge-native voice agents.

3. The Code War: OpenAI Strikes Back with GPT-5.3 Codex
The Drop: Following Anthropic's Claude Opus 4.6 release, reports confirm OpenAI has deployed GPT-5.3 Codex.
The Specs: Early telemetry shows a Terminal Bench score of 77.3 (vs. Opus 4.6's 65.4).
Efficiency: Significant reduction in token consumption for complex autonomous loops.
Technical Impact: We are seeing a divergence in model distinctiveness. While Opus focuses on reasoning depth, GPT-5.3 Codex is optimizing for execution. The massive token reduction implies better internal "thought compression," allowing agents to run longer autonomous loops before hitting context windows or cost barriers.

4. The "Speed King" Benchmarks: Liquid Takes the Crown
The Drop: Independent "Agentic AI" benchmarks released Feb 10.
The Leader: Liquid LFM 2.5 is officially the fastest model in its class, hitting ~359 tokens/second.
The Value Play: Ministral 3B clocked ~293 tokens/sec, offering the best price-to-performance ratio.
Technical Impact: For agentic workflows requiring multiple steps (reflection, planning, execution), inference speed is the bottleneck. Liquid LFM 2.5's throughput allows for "thought-heavy" agent architectures that would be too slow on transformer-based architectures.

#EmbodiedAI #EdgeAI #InferenceEfficiency #ArtificialIntelligence #TechNews
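The point about multi-step agent workflows can be made concrete with a small Python sketch. It assumes each agent step generates a fixed number of tokens strictly sequentially and ignores prompt processing and tool-call time; the 10-second budget, the 400 tokens per step, and the function name are hypothetical, while the two throughput figures are the ones quoted in the post.

```python
import math


def steps_within_budget(budget_s: float, tokens_per_step: int, tps: float) -> int:
    """Number of whole sequential agent steps that fit in a latency budget."""
    seconds_per_step = tokens_per_step / tps
    return math.floor(budget_s / seconds_per_step)


if __name__ == "__main__":
    budget_s = 10.0        # hypothetical end-to-end latency budget
    tokens_per_step = 400  # hypothetical tokens generated per step

    # Throughput figures as quoted in the post (tokens per second).
    for name, tps in [("Liquid LFM 2.5", 359.0), ("Ministral 3B", 293.0)]:
        steps = steps_within_budget(budget_s, tokens_per_step, tps)
        print(f"{name}: {steps} steps in {budget_s:.0f} s")
```

Under these assumptions the faster model fits 8 steps into the budget and the slower one 7, which is the sense in which throughput bounds how much reflection or planning an agent can afford.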
Asteris AI

178 followers
3w
🚀 NVIDIA MAKES ITS LARGEST INVESTMENT EVER - IN OPENAI

Jensen Huang just announced in Taipei that Nvidia will participate in OpenAI's latest funding round with what could be "the largest investment we've ever made."

- Huang called OpenAI "one of the most consequential companies of our time"
- The GPU giant is now directly investing in its biggest customer
- This signals a major shift in how AI infrastructure is financed
- The company that sells the picks and shovels is now buying into the gold mine

The AI supply chain is consolidating faster than anyone expected.

Sources:
- https://lnkd.in/gNxEsTix

#Nvidia #OpenAI #AIInvestment #TechNews #ArtificialIntelligence

In 2026, AI will move from hype to pragmatism | TechCrunch https://techcrunch.com
Craig Major
3w
Project Stargate is a $500B push from OpenAI, Nvidia & Oracle to dominate AI with national-scale data centers. But could centralizing AI power risk stifling competition? #TechMonopoly #AIPolicy https://lnkd.in/gysQKvHT
The Edge Singapore

21,582 followers
3w
Nvidia and OpenAI have been lynchpins of the AI boom, but their relationship has come under new scrutiny in recent days amid reports of tensions between the two firms.

Nvidia close to investing US$20 bil in OpenAI in latest round — Bloomberg theedgesingapore.com