Stop scaling GPU workloads on blind CPU/memory metrics. In this guide, Pavan Maduri breaks down how to build a KEDA external scaler using a DaemonSet to pull NVML metrics over gRPC directly—giving you sub-second scaling while bypassing the Prometheus pipeline entirely. 🧠 Read the full walkthrough: https://bit.ly/4nTPVEv #Kubernetes #KEDA #CloudNative #AIInfrastructure
Thanks Cloud Native Computing Foundation (CNCF) for featuring this! Bypassing the Prometheus pipeline and reading NVML metrics directly over gRPC was a massive friction point we had to solve for sub-second scaling. Glad to share the KEDA external scaler architecture with the broader cloud-native community—hope it helps others scaling heavy GPU inference workloads!
Really interesting!
Very insightful architecture Pavan Madduri One of the common challenges with AI and ML platforms is that infrastructure scaling decisions are often disconnected from actual GPU demand. Bypassing the traditional metrics pipeline and using direct NVML driven signals is an interesting approach to achieve faster and more accurate scaling behavior. This is the kind of cloud native innovation that helps bridge the gap between platform operations and AI workload requirements.