HA failure at an emergency call center is not an IT problem. It is a life-safety crisis. When systems routing fire or police responses go down, the cost is measured in more than dollars. Trey Isaac, Sr. Product Support Engineer at SIOS Technology Corp., breaks down why HA health checks matter and why growing your cluster from two nodes to four multiplies configuration drift risk significantly. "The more siblings you add to a high availability solution, the higher the chance of human error comes in." In this short clip with Swapnil Bhartiya at TFiR, Trey explains HA business value and the risks of distributed cluster growth. Check out the discussion on our YouTube page: https://lnkd.in/g_BMTnUE #HighAvailability #SIOS #ClusterManagement #OperationalResilience #Uptime #BusinessContinuity #SRE #EnterpriseIT
TFiR
Online Audio and Video Media
Arlington, VA · 767 followers
Video-first B2B media brand for enterprise technologies.
About us
Leading video publication for enterprise technologies.
- Website: http://www.tfir.io
- Industry: Online Audio and Video Media
- Company size: 2-10 employees
- Headquarters: Arlington, VA
- Type: Privately Held
- Founded: 2018
- Specialties: Containers, Cloud, Machine Learning, Open Source, IoT, Robotics, AI
Locations
- Primary: Arlington, VA 22206, US
Updates
-
AI agents are not breaking because of the model. They are breaking because of everything around it. Most teams are rebuilding orchestration, memory, and tool routing from scratch every few months as models evolve. That operational burden is unsustainable, and managed agent platforms like Anthropic's Claude Managed Agents are shifting where that burden lands. What you will learn: where production agents actually fail, what enterprises gain and lose with managed platforms, and how to protect institutional knowledge when your runtime lives in someone else's infrastructure. Amit Naik, VP of AI at CData Software, joins us to make these trade-offs clear. "The process of creating that automation is what brings something that is in one person's individual memory into what is institutional memory." In this full conversation with Swapnil Bhartiya at TFiR, we explore the meta harness concept, context rot, enterprise tool integration, governance, and a maturity framework for deciding whether to build or buy your agent infrastructure. Check out the discussion on our YouTube page: https://lnkd.in/g68NfTrr #AIAgents #EnterpriseAI #Anthropic #MCP #AIGovernance #AgentOrchestration #CData #GenerativeAI
AI Agents Keep Failing in Production — Here's the Missing Layer | Amit Naik, CData
-
85% of observed domains are failing foundational DNS security controls — and most security teams don't even know it. DNS is the internet's phone book, but it's become one of the most under-prioritized threats in enterprise security. Missing controls like DNSSEC, Certificate Authority Authorization, and SOA integration leave organizations silently exposed. Viewers will learn why DNS hygiene gaps are systemic, how quantum computing is raising the stakes for certificate management, and how to use Akamai's checklist to assess your own maturity level. Steve Winterfeld, Advisory CISO at Akamai Technologies, shares findings straight from their latest research. "85% of observed domains failed foundational DNS controls — and this is an area that often isn't well managed or put high enough in the risk portfolio to get the attention it deserves." In this short clip with Swapnil Bhartiya at TFiR, Steve explains the hidden DNS threat, what controls are being missed, and what to do about it. Check out the discussion on our YouTube page: https://lnkd.in/dtFetXP3 #DNS #DNSSecurity #CyberSecurity #CISO #Akamai #DNSSEC #QuantumComputing #EnterpriseSecurity #InfoSec #CyberRisk
Why DNS Is Your Biggest Unmanaged Risk Right Now | Steve Winterfeld, Akamai
-
AI agents writing your code and pushing straight to live cloud is not a workflow. It is a liability. Cloud development feedback loops routinely run 30 minutes or more in regulated industries, blocking iteration and compressing quality gates. LocalStack solves this by running a full cloud environment locally, cutting cycles from hours to seconds. Waldemar Hummer, Co-founder and CTO at LocalStack, breaks down why the inner loop problem is getting worse as AI agents enter the development workflow. "You cannot always trust the code that's being generated by these agents. You need a way to test this in an efficient and also sandbox manner." In this short clip with Swapnil Bhartiya at TFiR, Waldemar explains the LocalStack AI sandbox use case and why local cloud environments are now essential infrastructure. Check out the discussion on our YouTube page: https://lnkd.in/gpK4WHeG #LocalStack #CloudDevelopment #AIAgents #DevOps #DeveloperExperience #CloudNative #PlatformEngineering #AgenticAI
Why Cloud Dev Feedback Loops Are Broken | Waldemar Hummer, LocalStack
-
Kubernetes 1.36 ships native GPU scheduling, and it changes how AI workloads run on clusters. Platform and infrastructure teams have been patching around Kubernetes scheduling limits for AI and ML workloads for too long. This release closes those gaps with new native primitives. Ryota Sawada, Kubernetes 1.36 Release Lead, walks through what actually changed and why it matters for production. "The workload aware scheduling breaks that template part into workload and the actual runtime object into PodGroup, and that clear separation gives us even further clear connection point for the DRA." In this full conversation with Swapnil Bhartiya at TFiR, we explore Workload Aware Scheduling, DRA admin access, fine-grained kubelet authorization (now stable), the beta graduation process, and the release name Haru. Check out the discussion on our YouTube page: https://lnkd.in/g3Mip3DV #Kubernetes #Kubernetes136 #CloudNative #AIInfrastructure #GPUScheduling #DynamicResourceAllocation #PlatformEngineering #OpenSource
Kubernetes 1.36: AI Workloads, GPU Scheduling & Security | Ryota Sawada
-
The DIY AI infrastructure tax is enormous — and most enterprises are still underestimating it. Assembling a production-grade AI stack means making dozens of interdependent decisions: GPU operators, network fabric, storage tiers, multi-tenancy, day-two operations. Each one is effectively a six-month project. The result? Your best engineers are buried in plumbing instead of building differentiation. Richard Borenstein, SVP of Growth & Business Development at Mirantis, puts it plainly: "By the time you finish building, the technology has moved on and you're maintaining a deprecated architecture." In this short clip with Swapnil Bhartiya at TFiR, Richard explains why pre-integrated AI platforms like k0rdent AI aren't just convenient — they de-risk the entire process, enforce AI sovereignty, and keep enterprises current without rebuilding from scratch every 18 months. Check out the discussion on our YouTube page: https://lnkd.in/g2xfnjtz #AIInfrastructure #EnterpriseAI #Mirantis #PrivateCloud #DigitalSovereignty #OpenSource #CloudNative #MLOps
Why DIY AI Infrastructure Is a Tax Most Enterprises Can't Afford | Richard Borenstein, Mirantis
-
Criminals are now targeting AI decision cycles — not just your data. Agentic AI systems approve loans, authorize payments, and make medical decisions. That makes them high-value targets for attackers who want to manipulate outcomes before anyone notices. Steve Winterfeld, Advisory CISO at Akamai Technologies, breaks down what security leaders need to know — including OWASP's new Agentic AI Top 10 covering goal hijacking, tool misuse, and memory poisoning. "As a CSO, I have $20 worth of problems in a $10 budget — so how am I going to optimize this?" In this short clip with Swapnil Bhartiya at TFiR, Steve explains how API protection, DDoS mitigation, and micro-segmentation must layer around every agentic AI deployment — and why knowing about a vulnerability is very different from it becoming operational in the wild. Check out the discussion on our YouTube page: https://lnkd.in/gbkAT7HH #AgenticAI #CyberSecurity #CISO #OWASP #Akamai #AIFirewall #LLMSecurity #ThreatModeling #APIProtection
-
Self-hosted AI isn't just for hyperscalers — even individual developers are finding immediate ROI from local GPU offloading. In this short clip, Rob Hirschfeld, CEO at RackN, gets completely practical: what are the actual conditions where running your own model on your own infrastructure is the right move? "The appetite for improving your AI throughput is bottomless at this point. Move as fast as you can on learning and building the skills to have an AI-enabled workforce — and at the same time, plan that you will be running your own AI workloads." Rob shares a striking example: RackN's largest AI inference customer runs over 40,000 servers and onboards new hardware within hours of physical delivery — before inventory paperwork is even complete. That pace is becoming the standard, not the exception. Check out the discussion on our YouTube page: https://lnkd.in/gVupA_EM #BareMetal #SelfHostedAI #AIInfrastructure #EnterpriseAI #AIInference #OpenSourceAI #HybridAI #PlatformEngineering
When Does Self-Hosted AI Actually Make Sense? Even Small Teams Have an ROI Case
-
AI agents don't throw errors — they just quietly give you the wrong answer, and legacy observability was never built to catch that. This is the core production risk enterprises are sleepwalking into as agentic workloads move from demo to deployment. The scale of telemetry alone will break platforms built for the microservices era. Jeremy Burton, General Manager of the Observability Unit at Snowflake, lays out why the walled-garden model is finished — and what has to replace it. "It's not enough to just know whether something's broken or why something is broken. Explain to me why this response is different to what it was yesterday." In this full conversation with Swapnil Bhartiya at TFiR, we explore LLM response drift, headless observability via MCP, Iceberg-backed open telemetry storage, petabyte-scale query economics, and why the Datadog/Dynatrace UI model is being blown apart by coding agents. Check out the discussion on our YouTube page: https://lnkd.in/gwiZe3vR #Observability #AIAgents #Snowflake #LLMOps #OpenTelemetry #AIOps #SRE #CloudNative #DevOps
AI Agents Are Breaking Observability — Here's What's Next | Jeremy Burton, Snowflake
-
Your high availability backup system may look ready, but it probably isn't. Most HA environments suffer from configuration drift: disk space added to the primary and forgotten on the secondary, admin rights changed on one node but not the other. When the primary goes down, the failover fails. Trey Isaac, Sr. Product Support Engineer at SIOS Technology Corp., explains how HA health checks (both live system walkthroughs and log-based reviews) expose these gaps before an outage does. "A lot of times what we uncover is that the systems are not identical twins. And when the first system goes down, things are not going to work properly on the second system." In this short clip with Swapnil Bhartiya at TFiR, Trey walks through how SIOS LifeKeeper health checks work in practice and why starting that routine now, not next quarter, is the only way to achieve real peace of mind. Check out the discussion on our YouTube page: https://lnkd.in/gss7hXjX #HighAvailability #DisasterRecovery #SIOSTechnology #Lifekeeper #BusinessContinuity #Failover #ITInfrastructure #ConfigurationDrift
Is Your HA Setup Actually Ready to Fail Over? | Trey Isaac, SIOS Technology