Leadership gap widens with AI adoption


Last week, a VP of Engineering told us his team had already built most of what we do. We'd spent four meetings with the engineers who actually run reliability for him. We knew where their data lived, what systems they used, how they triaged incidents. So we knew the team was working almost entirely by hand: getting paged, spinning up channels, linking tickets manually. He thought they were further along than they were. We knew more about the state of his org than he did.

This isn't new. Leaders have always been a few steps removed from the work. Enterprises have grown for decades despite that gap. But AI makes the gap expensive in a way it never was before.

Here's why: an agent only creates value when it conforms to how your team actually works. That requires a clear picture of the current state and a real definition of what "good" looks like. If you don't know what good is, an LLM won't tell you; it'll just help you get to the wrong place faster.

The VP was anchored on a demo built on hard-coded workflows and runbooks. It looked impressive. It also wouldn't survive contact with the complexity of his actual stack, a stack he'd partly lost track of. So the risk wasn't that he'd buy nothing. It was that he'd buy something that amplified the disorganization already there.

The fix isn't more diligence from the top. It's trusting the people doing the work to make the call. They're the ones living with the pain. They know what's worth automating and what good looks like, because they're standing in the ground truth every day.

The further you sit from that ground truth, the more likely you are to believe your team is light-years ahead of, or behind, where it actually is. That's the part AI doesn't fix for you: https://lnkd.in/gEXnfNqH

If you don’t know what good is, AI won’t tell you frontierai.substack.com

1 Comment

Ómar Thor Ómarsson 2d

It's wild how often the gap between a demo and daily ground truth gets overlooked.


More Relevant Posts

Daniel Semerjyan
3w
I can write a $15k/day feature in 2 days. Then wait 3 weeks for the meeting that schedules another 2 weeks to find someone who understands the risk. This is enterprise GenAI in 2026.

I shipped a production-grade GenAI compliance system for a US enterprise client. Here's what actually slowed us down 👇

𝟭. 𝗗𝗲𝗰𝗶𝘀𝗶𝗼𝗻 𝗹𝗼𝗼𝗽𝘀, 𝗻𝗼𝘁 𝗰𝗼𝗱𝗲, 𝗮𝗿𝗲 𝘁𝗵𝗲 𝗯𝗼𝘁𝘁𝗹𝗲𝗻𝗲𝗰𝗸. A small team with Claude Code, Cursor, and modern AI tooling ships faster than product decisions get made. The engineering speed isn't the problem anymore. The decision speed is.

𝟮. 𝗖𝗼𝗺𝗽𝗮𝗻𝘆 𝗽𝗼𝗹𝗶𝗰𝗶𝗲𝘀 𝗵𝗮𝘃𝗲𝗻'𝘁 𝗰𝗮𝘂𝗴𝗵𝘁 𝘂𝗽 𝘁𝗼 𝘁𝗵𝗲 𝗻𝗲𝘄 𝘀𝗽𝗲𝗲𝗱. Procurement, security review, and architecture governance were designed for quarterly releases. We're shipping weekly. The processes have to evolve, not because they're wrong, but because the cadence they assume no longer exists. You can't fix that with another meeting.

𝟯. 𝗧𝗵𝗲 𝗲𝗰𝗼𝘀𝘆𝘀𝘁𝗲𝗺 𝗺𝗼𝘃𝗲𝘀 𝗳𝗮𝘀𝘁𝗲𝗿 𝘁𝗵𝗮𝗻 𝘆𝗼𝘂𝗿 𝗿𝗼𝗮𝗱𝗺𝗮𝗽. Models, frameworks, and agentic patterns ship weekly. Half the things on our backlog got solved by an open-source release before we got to them. Building something that becomes obsolete on Tuesday is the new technical debt. "Wait two weeks before building" is sometimes the most senior decision you can make.

𝗪𝗵𝗮𝘁 𝗜'𝗱 𝘀𝗸𝗶𝗽 𝗻𝗲𝘅𝘁 𝘁𝗶𝗺𝗲:
→ Assuming engineering velocity is the constraint
→ Building before checking what shipped this week
→ Pretending old governance can absorb new delivery speed

The prototype proves the idea is possible. Production proves you understood the organization. Same engineer. Different timeline. Much more patient.

What's the #4 you'd add to this list? Curious which gap hit you hardest.

#GenAI #AgenticAI #EnterpriseAI #ClaudeCode
Leif Jackson
3d
"Every 30 minutes, someone creates something I have to look at." That quote from "Managers Are Struggling to Keep Up with the AI Productivity Boom" (HBR, May 2026) landed hard for me, because I've watched this exact bottleneck form at product companies shipping AI features.

Engineering builds fast. A team working with LLMs or AI models can iterate daily, ship multiple features, deploy new models. But the decisions that matter, the ones about fairness, accuracy, transparency, and risk, still require people. People with scarce time and deep expertise.

At DataGuidance, we hit this building the RAG Copilot. The team could prototype new retrieval approaches constantly. But before shipping, you need answers: What's the hallucination risk here? Can users audit what the system found? How do we know when to trust it and when not to? Those aren't engineering questions. They require judgment. We needed an analyst who could validate the responses consistently, which is a scarce, limited resource.

The trap is adding more managers to the approval queue. That doesn't fix the bottleneck. It just moves the pain point, and it costs more money in terms of manpower.

The companies getting ahead aren't the ones slowing down engineering. They're the ones asking: What decisions can we automate? What can we push into the product design itself? Where can we build in visibility, so approval happens faster because reviewers see exactly what they need to see? That means making explainability part of the feature, not a gate after the fact, and building audit trails into the product so governance can review what happened instead of trying to predict what might happen. That's harder upfront. But it scales.

Where in your product can you scale by redesigning the decision process itself?

Treb Gatte, MBA, MCTS, MVP
3w
The most useful comment on anything I’ve written this year came from Yasir Abbas. His comment reframed my own argument back at me in a way I could not have done myself. This week I published a piece on what we call Governed Velocity, our operating model for AI-assisted engineering at Marquee Insights. Yasir pulled out the line that mattered most and quietly extended it past the boundary I had drawn. The line was this. “You do not trust the output by itself. You trust the process that produced it.” Yasir pointed out that this applies far beyond software engineering. He is right, and I have been thinking about where else it lands all week. I think it lands hardest on enterprise AI adoption. Most of the AI adoption conversations I sit in right now skip the same step. A team sees an impressive output, a generated report, a working prototype, a polished draft, and the conversation immediately becomes about how to scale it. Roll it out, train more people on it, get more seats, ship it to customers. The step that gets skipped is asking how the output was produced. What was the prompt lineage. What context was given. What review happened. What governance shaped it. What would have to be true for this to be reproducible by someone else next quarter, on a different dataset, under audit. When that step is skipped, organizations end up trusting outputs they cannot defend. The first time something goes wrong in production, in front of a regulator, or in front of a client, the absence of process becomes very expensive very quickly. The teams that are pulling ahead are not the ones generating the most. They are the ones who can tell you exactly how their AI work was produced and why they trust it. Generation speed is becoming a commodity. Governed process is not. Thank you, Yasir.

Gremlin

2w
Agentic AI has drastically increased deployment rates. More releases = more bugs, more dependencies, and ultimately, more risk. That means testing needs to adapt, and our 2026 Buyer's Guide covers exactly what to look for in a chaos engineering tool built for the AI era. Get the guide: https://hubs.la/Q04drFtN0

2026 Chaos Engineering Enterprise Buyer's Guide gremlin.com
Zach Salyers
3d
I sat down this week to revise a scope I wrote in March, and I ended up writing a different product. Same client. Same problem. Different shape. Here's what happened. In March, when I scoped the work, the question of whether AI could reliably open an original file, draft a change inside it as a tracked edit, and queue that draft for a human to accept or reject — that was still an open question I wasn't willing to stake a fixed-fee build on. So I went conservative. The system would tell the team when something needed updating. The team would handle the actual update. It was the right call at the time. This week I sat down to start the build, and that capability isn't an open question anymore. It's a plain Tuesday. So the conservative scope was solving maybe 40% of the actual pain when 90% was now on the table. I rewrote the proposal. The scope roughly doubled. The shape of the system is fundamentally different — it doesn't just notice that work needs to happen, it does a first draft of it. Here's the lesson I'm sitting with. The line between "AI does the work" and "AI alerts a human to do the work" is moving every quarter right now. Not annually. Not every release cycle. Every quarter. Scope a project on January's capabilities, build through Q2, ship in July — and you've delivered a competent solution that was already undersized the day you wrote the contract. The expensive AI mistake in small business right now isn't a bad build. It's an on-time, on-budget build of last quarter's design. What I'm changing: every project gets a "capability boundary check" at the start of each phase. One question before any code gets written — what's moved since the scope was authored? Where should the human-vs-AI line actually sit today? Sometimes the answer is "nothing, ship it." Sometimes the whole shape just changed. What's on your roadmap that was scoped on a capability that's already moved? #automation #SMB #SmallBizOps
Salman Khandu
3d
"𝗧𝗵𝗲 𝘀𝘆𝘀𝘁𝗲𝗺 𝘁𝗲𝘀𝘁: Are you building a system the customer runs their work through, or a tool that sits on top of a system they already have? Systems own the workflow end-to-end — the data capture, the governance, the records of what got done — and they’re what the customer points to when describing how the actual work happens. Tools on the other hand just add intelligence to a workflow the customer already runs. The tool case generates real revenue and the labs can take it because the customer isn’t depending on you as the orchestration layer. High ACV is usually a signal of a system, since systems replace real headcount and get paid accordingly, but it isn’t a guarantee. Ask yourself if the customer would still need your tool if a lab shipped something that supposedly directly competes with you. If yes, you’re building a system. If no, you’re a tool — even if your ACV is high."

Avoiding Death on the Yellow Brick Road a16z.news
Perla Gámez
1w
Usage-based pricing punishes engineering teams more than ever. AI is rapidly speeding up development, so monitoring costs are skyrocketing. That normally means:
- Higher ingestion fees
- Skyrocketing storage costs
- Crippling indexing costs

The old pricing model simply doesn’t make sense anymore. We knew this when we built Foam; that’s why we designed a model where teams only pay when we diagnose a genuine error. That means that when your product is growing faster than ever before, you’re not being punished for success.

We’re in an age where software shouldn’t hold back your ability to scale. It should let your team sleep more easily at night and move faster than ever.

P.S. If you want error detection that doesn’t cost an arm and a leg, check out https://foam.ai/
Harish Srigiriraju
2w
One trend I’m increasingly noticing in my team, and have validated with a few others in the industry: PR reviews are becoming the new bottleneck for engineering teams.

AI can now generate code, tests, documentation, migrations, and even refactors in minutes. What used to take days can now be done before lunch. But the review process hasn’t evolved. Engineers are now reviewing:
• Much larger PRs
• Faster submission cycles
• AI-generated code
• More edge cases and hidden complexity

In some teams, the actual coding is no longer the slowest part of shipping software. Waiting for reviews is the bottleneck!

This creates an interesting challenge. As AI accelerates code generation, companies may need to completely rethink:
• PR size and ownership
• Review workflows
• Automated validation
• Trust systems for AI-generated code
• How much human review is actually necessary

Otherwise we risk building organizations where AI writes code at 10x speed… but humans approve it at the same pace as 2021.

Shivanand Sharma
3w
AI agents are entering production faster than our reliability practices are maturing. After reading Datadog’s State of AI Engineering, SoftwareSeni’s piece on AI-SRE failure modes, and the MAST paper on multi-agent system failures, one pattern stood out to me: Agent reliability is not mainly a model-selection problem. It is a systems-engineering problem. The failure modes feel familiar, but the blast radius is different. Rate limits can become production incidents. Retries can quietly multiply cost. Long system prompts can dominate token spend. Multi-agent workflows can fail because one tool call breaks. Agents can hallucinate services, dependencies, or remediation steps. Prompt injection can enter through logs, tickets, alerts, and runbooks. The uncomfortable part is that a workflow with many individually “reliable” steps can still be unreliable end-to-end. A 97% successful tool call sounds strong. Across 30 sequential calls, though, the probability that everything succeeds is only about 40%. That changes how I think about production agents. Before flashy autonomy, we need the boring controls: token and tool-call budgets, retry caps, exponential backoff, circuit breakers, model and provider fallbacks, full tracing, human approval for high-blast-radius actions, regression evals, and explicit verification before declaring success. My Takeaway: Do not evaluate agents only by demo quality. Evaluate the whole execution chain. The next wave of AI engineering will not be won by teams with the most agents. It will be won by teams that can observe, constrain, test, and govern them like production infrastructure. LLM-as-Judge is something I've been working on, because human-in-the-loop can be impractical most of the time. The last mile challenge is injecting it into the loop. Can't reveal more at the moment.
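The 97%-over-30-calls figure in the post above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes every tool call succeeds or fails independently with the same probability, and that a retry is a fresh independent attempt, which real agent workflows may not satisfy. The function names are hypothetical, not taken from any of the cited reports.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that n_steps sequential steps all succeed.

    Assumes each step succeeds independently with probability p_step.
    """
    return p_step ** n_steps


def step_success_with_retries(p_step: float, max_retries: int) -> float:
    """Per-step success probability with a retry cap.

    Assumes each retry is an independent attempt with the same
    success probability; the step fails only if every attempt fails.
    """
    attempts = max_retries + 1
    return 1.0 - (1.0 - p_step) ** attempts


if __name__ == "__main__":
    p, n = 0.97, 30

    # Baseline from the post: 30 sequential calls, no retries.
    print(f"no retries: {chain_success(p, n):.1%}")

    # Effect of a small retry cap on the same chain.
    for retries in (1, 2):
        p_retry = step_success_with_retries(p, retries)
        print(f"retry cap {retries}: {chain_success(p_retry, n):.1%}")
```

With no retries the chain succeeds about 40% of the time, matching the post. Under the same independence assumption, a single capped retry per step lifts the end-to-end figure to roughly 97%, which is one way to see why retry caps sit on the list of "boring controls", and why uncapped retries multiply cost.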
Ifaturoti Adeyemi, MSC
4d
973×. That's not a typo. It's how much more often elite engineering teams deploy than low performers. (DORA, 39,000+ teams.) The gap isn't strategy. It isn't tooling. It isn't even AI. It's execution latency — the time between a decision and learning whether it worked. And in 2026, it's the most expensive cost in modern work, because it never appears on a P&L. I broke down the framework, the guardrail (when fast kills — see Knight Capital, $440M in 45 minutes), and 3 questions to expose where your org is hiding latency under the word 'diligence.' ↓ Swipe through.

