Build KEDA External Scaler with NVML Metrics


Stop scaling GPU workloads on blind CPU/memory metrics. In this guide, Pavan Madduri breaks down how to build a KEDA external scaler that uses a DaemonSet to pull NVML metrics directly over gRPC, giving you sub-second scaling while bypassing the Prometheus pipeline entirely. 🧠 Read the full walkthrough: https://bit.ly/4nTPVEv #Kubernetes #KEDA #CloudNative #AIInfrastructure
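To make the idea concrete, here is a minimal sketch of the decision logic such a scaler might run. It is not the author's implementation. It mirrors the three unary calls of KEDA's external scaler contract (IsActive, GetMetricSpec, GetMetrics) as plain Python methods, with no gRPC server and no real NVML calls. The names `GpuSample`, `GpuScalerLogic`, `desired_replicas`, the metric name `gpu_utilization_sum`, and the 70% target are all assumptions made up for illustration; each sampler stands in for one DaemonSet pod that would read utilization from NVML on its node.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GpuSample:
    """One GPU reading. Hypothetical shape; a real agent would fill it from NVML."""
    node: str
    gpu_index: int
    utilization_pct: int  # 0-100


# Hypothetical stand-in for "ask one DaemonSet pod for its node's GPU readings".
Sampler = Callable[[], List[GpuSample]]


class GpuScalerLogic:
    """Plain-Python mirror of the unary external scaler calls (no gRPC here)."""

    def __init__(self, samplers: List[Sampler], target_pct: int = 70,
                 metric_name: str = "gpu_utilization_sum") -> None:
        if target_pct <= 0:
            raise ValueError("target_pct must be positive")
        self.samplers = samplers
        self.target_pct = target_pct
        self.metric_name = metric_name

    def _collect(self) -> List[GpuSample]:
        # Pull fresh readings from every node on each call: no metrics store in between.
        return [sample for sampler in self.samplers for sample in sampler()]

    def is_active(self) -> bool:
        # IsActive: the workload may scale from zero once any GPU is doing work.
        return any(s.utilization_pct > 0 for s in self._collect())

    def get_metric_spec(self) -> dict:
        # GetMetricSpec: the per-replica target the autoscaler divides by.
        return {"metricName": self.metric_name, "targetSize": self.target_pct}

    def get_metrics(self) -> dict:
        # GetMetrics: report total utilization across all GPUs.
        total = sum(s.utilization_pct for s in self._collect())
        return {"metricName": self.metric_name, "metricValue": total}


def desired_replicas(metric_value: int, target_size: int) -> int:
    """Average-value rule used by the HPA: ceil(total metric / per-replica target)."""
    return math.ceil(metric_value / target_size)


if __name__ == "__main__":
    # Fake readings from two nodes, in place of real NVML queries.
    logic = GpuScalerLogic([
        lambda: [GpuSample("node-a", 0, 90), GpuSample("node-a", 1, 80)],
        lambda: [GpuSample("node-b", 0, 40)],
    ])
    spec, metrics = logic.get_metric_spec(), logic.get_metrics()
    print(logic.is_active(), metrics["metricValue"],
          desired_replicas(metrics["metricValue"], spec["targetSize"]))
```

Reporting the sum rather than the mean is a design choice in this sketch: with an average-value target, dividing the total by the per-replica target yields the replica count directly.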

Very insightful architecture, Pavan Madduri. One of the common challenges with AI and ML platforms is that infrastructure scaling decisions are often disconnected from actual GPU demand. Bypassing the traditional metrics pipeline and using direct NVML-driven signals is an interesting approach to achieving faster and more accurate scaling behavior. This is the kind of cloud native innovation that helps bridge the gap between platform operations and AI workload requirements.

Thanks, Cloud Native Computing Foundation (CNCF), for featuring this! Getting to sub-second scaling meant solving a massive friction point: bypassing the Prometheus pipeline and reading NVML metrics directly over gRPC. Glad to share the KEDA external scaler architecture with the broader cloud-native community. Hope it helps others scaling heavy GPU inference workloads!
