Alexey Navolokin’s Post

The biggest AI infrastructure shift may not be happening in the cloud. It may be happening on your desk. Do you agree? For the last decade, AI economics were simple: If you needed more intelligence → you rented more cloud GPUs → you paid per token. That assumption is now breaking. AMD’s latest analysis of “Agent Computers” makes a strong case that we are entering a new phase of computing — where AI is no longer intermittent, but continuous, autonomous, and always-on. And that changes everything about cost. AI is no longer “query-based” — it is becoming “workload-based” The first wave of AI was: Ask a question Get an answer Stop The next wave is agentic: Plan a task Break it into steps Call tools Generate outputs Validate results Iterate continuously These agents don’t run once. They run all day. According to recent analysis, a single agentic workflow can already consume millions of tokens per day depending on workload intensity. That is the turning point. Because cloud pricing is still fundamentally linear: More usage = more cost More agents = more tokens More tokens = recurring bills that scale without limit A key insight from AMD’s model: A modern local “Agent Computer” can shift AI from: variable operational expense → fixed capital expense Instead of paying per token, you effectively: buy the compute once run inference continuously, absorb marginal cost via electricity (~tens of dollars/month in modeled scenarios) An example scenario shows: ~6M tokens/day sustained on a Ryzen AI Max-class system Electricity cost modeled around $16/month scale assumptions Equivalent cloud API usage potentially hundreds of dollars per month depending on model tier In higher throughput configurations (Radeon AI PRO-class systems), token throughput can scale even further, pushing: ~18M tokens/day class workloads significantly faster breakeven windows in heavy usage scenarios AI cost is shifting from “pay-per-use” → “amortized ownership” AMD’s new Ryzen AI platforms highlight why: Up to 200B–300B parameter models running locally on next-gen systems Up to 192GB unified memory architecture in workstation-class configurations Combined CPU + GPU + NPU designed specifically for agent workloads This matters because agent systems are not just “chat models”. The Agent Computer = local AI execution layer for continuous workloads The real architectural shift: cloud is no longer the default The cloud is not going away. The deeper shift: from “models” to “machines that work” This is the real transformation: We are no longer just using AI models. We are deploying systems that work continuously on our behalf. The unit of AI is no longer the prompt. It is the agent runtime. More details here: https://lnkd.in/g9jiZr2A #AI #AgenticAI #AMD #AIInfrastructure #EdgeAI #LLM #CloudComputing #Inference #GenerativeAI #Tech #Innovation #RyzenAI

  • graphical user interface

Appreciate the tag, Alexey Navolokin — I've actually been living this one at home. I ran a small experimental project recently with Claude Code as the agent. Great tool — but the agentic loop chews through tokens fast, and I found myself topping up usage just to keep it going. That's the linear cloud model the post describes, felt firsthand: more agent, more tokens, more bill. So I'm now standing up a Mistral model on a Ryzen AI workstation at home to run that same workload locally. Completely different economics — buy the compute once, absorb the marginal cost as electricity, keep the data on my own machine. The CapEx-vs-OpEx reframe in practice. And that's exactly what lands with enterprise customers: once an agent runs continuously, "pay once, run continuously" stops being a tech preference and becomes basic finance. Cloud for spikes, local for always-on — and AMD is one of the few who can tell that full story end to end.

🧠 This is a insightful analysis, Alexey. The shift towards the "Agent Computer" and the move from an OPEX to a CAPEX model, as illustrated marks the transition of AI from a mere query tool into an autonomous and sovereign infrastructure 🤖

Like
Reply

AI shifting from query tool to autonomous infrastructure is a healthcare inflection point. When intelligence moves from OPEX to CAPEX, we don’t just optimize care—we redesign how systems think, decide, and scale.

Tokens made sense when intelligence was episodic. You ask, it answers and billing follows interaction. But agents don’t behave like that which means the constraint becomes who can afford continuous cognitive load, not just queries.

Agentic AI feels like the next real inflection point beyond generative models. The shift from “responding to prompts” to “taking multi-step actions with context and goals” will fundamentally reshape how software systems operate.

I agree with the direction of the argument. Once AI becomes continuous, pricing models will have to change too. The old pay-per-request structure may not fully match how real workloads are actually executed anymore.

A fascinating shift indeed! It seems we're trading in our cloud rentals for desktop dynamos-kinda like moving from renting a movie to owning the entire franchise. What do you think? Virginia Loh Daniel Chia Ayush Batra WaiChung Ngai Karl Chung Jonathan Yap Alexis Lee Joyce Chan Eileen Goh; Jerome Ng; Francis Mammone; Aaron Primrose; Louie Kotitsas; Howard Tang; Jonno Dimopoulos; Dushy Satkunanandan; William Moffatt; Peter Chambers; Keith Strier

This is a fascinating shift, and if my desk could talk, it would probably start demanding a raise! The idea of moving from “pay-per-prompt” to “pay-per-workload” seems revolutionary. Thank you for sharing Alexey

See more comments

To view or add a comment, sign in

Explore content categories