Kubesimplify’s cover photo
Kubesimplify

Kubesimplify

Software Development

On a mission to simplify Cloud Native and Web Assembly for everyone!

About us

Passionate about Cloud Native and Kubernetes? Join the thriving Kubesimplify community! We're dedicated to simplifying the Cloud Native complexities, fostering collaboration, and sharing insights. Explore the latest trends, engage with industry experts, and elevate your expertise in the Cloud Native ecosystem.

Website
https://kubesimplify.com/
Industry
Software Development
Company size
2-10 employees
Headquarters
Bangalore
Type
Self-Employed
Founded
2022

Locations

Employees at Kubesimplify

Updates

  • View organization page for Kubesimplify

    17,397 followers

    K8s 1.36 ships DRA partitionable resources. (This is bigger than people think.) What changed: → A single H100 can be claimed as 4× 20GB partitions → Each partition is its own ResourceClaim, independently scheduled → Tenant A and Tenant B can share one physical GPU with scheduler-level allocation boundaries Use cases: Inference workloads that don't need a full H100 Dev environments with cheap GPU access Multi-tenant ML labs Limitations: → H100 / B100 only (Hopper+) → Partition layouts fixed at boot → Some kernel patches required Isolation note: True memory isolation still requires vGPU, Confidential Computing, or physical separation. If you're spending H100 dollars on workloads that need an A30, look at this seriously.

    • No alternative text description for this image
  • Tenant isolation in Kubernetes - the 2026 reality most teams haven't faced yet. For years, "multi-tenant Kubernetes" meant namespaces + RBAC + NetworkPolicy. That's still the default in 90% of clusters. Here's what changed. The threat model expanded. In 2018, multi-tenancy meant separating different application teams. In 2026, it means separating: → Application teams (Old) → AI agents executing tools (NEW) → Customer code in your platform (NEW) → Untrusted training jobs from data scientists (NEW) → CI/CD runners pulling arbitrary code (NEW) Namespaces + RBAC + NetworkPolicy was designed for the first one. It's insufficient for the rest. The real options for 2026: 1. vNode (user-namespace runtime isolation) Linux user namespaces + seccomp. Near-zero overhead, real isolation. Best for: AI agents, untrusted code, multi-tenant labs. 2. vCluster / K3k / Kamaji (nested Kubernetes) Each tenant gets a real Kubernetes cluster experience including cluster-admin in their own isolated control plane. Best for: portability, customization, multi-cloud, genuine cluster-admin needs. 3. Kata Containers (VM-level isolation) Full VM isolation, ~10% overhead, slower cold starts. Best for: regulated industries that need air-gapped isolation. 4. gVisor (syscall interception) ~20-40% overhead, syscall compatibility tradeoffs. Rarely the right answer in 2026. The teams getting this right are matching the threat model to the tool. Not over-engineering. Not under-engineering. What's your tenant isolation choice in 2026 and what's the threat model driving it?

    • No alternative text description for this image
  • Choosing a model serving platform on Kubernetes in 2026 here's the decision framework I actually use. (Save this for your next ML infra architecture review.) Every conversation starts with 4 questions: Q1: HOW MANY MODELS WILL YOU SERVE? → <<10 models: Any platform works. Pick the easiest path — Ray Serve if you're Python-heavy, KServe otherwise. → 10–100 models: KServe is the right answer. ModelMesh handles this density well. → 1000+ models: KServe ModelMesh, specifically. It was designed for this scale. Q2: WHAT FRAMEWORK ARE YOUR MODELS IN? → Mostly HuggingFace transformers: KServe (native HF runtime). → Custom PyTorch/TensorFlow code: Ray Serve (more flexibility for non-standard pipelines). → Mixed/legacy stack: Seldon Core v2 (paid) or roll-your-own if you have the platform bandwidth. Q3: WHAT'S YOUR LATENCY REQUIREMENT? → <<50ms p99: vLLM via KServe, or TGI directly. Inference optimization is your bottleneck, not the serving layer. → <<200ms p99: Any platform works. Focus on autoscaling and resource right-sizing instead. → Batch jobs only: Ray Serve with batching enabled. Don't over-engineer real-time infra for offline workloads. Q4: WHAT'S YOUR TEAM'S KUBERNETES MATURITY? → Strong platform team: Any platform. Pick purely by features and ecosystem fit. → Application engineers only: KServe. It's the most opinionated, with the fewest footguns. → Mixed team: Ray Serve if Python-heavy, KServe otherwise. Match the tool to the team's primary skill set. THE BORING ANSWER FOR MOST TEAMS: KServe v0.13. Native vLLM runtime. ModelMesh for scaling. OpenAI-compatible API. Apache 2.0. Active community. Production-grade. The exotic answers Triton, Seldon, custom stacks are for specific situations. If you're choosing one of those, you should be able to articulate exactly why in one sentence. If you can't, you probably don't need them. What was your team's last model serving decision — and how do you feel about it now? Drop a comment. 👇 #MachineLearning #MLOps #Kubernetes #KServe #ModelServing #LLMOps #InfrastructureEngineering #CloudNative

    • No alternative text description for this image
  • KServe v0.13 is out and I think it’s the most important Kubernetes inference platform release of 2026. The headline features are impressive: → Native vLLM runtime → ModelMesh support for serving thousands of models per pod → OpenAI-compatible APIs But the bigger story is what these features enable. For years, “model serving on Kubernetes” usually meant choosing one of three paths: → Roll your own stack (custom controllers, custom predictors, operational overhead) → Use Seldon Core (mature ecosystem, but now moving toward paid-only offerings) → Use a hyperscaler’s managed inference platform (convenient, but with infrastructure lock-in) KServe v0.13 makes a fourth option genuinely viable: Open-source. Kubernetes-native. Production-grade model serving. That changes the equation for a lot of teams. What I’m seeing now: → Teams that avoided Kubernetes for inference are reconsidering it → Teams paying heavily for managed serving finally have a credible self-hosted path → Teams stuck on deprecated Seldon Core v1 deployments now have a clear migration target The migration stories are interesting too. Teams moving from Seldon → KServe report: → ~6 weeks to migrate ~50 services → Often handled by a single engineer → Similar or lower infrastructure cost → Better reliability → Faster iteration velocity The migrations that don’t go smoothly usually involve: → Deeply custom predictor logic tied to Seldon runtimes → Complex canary / A-B testing workflows that need redesign My take: KServe is becoming the boring, correct answer for model serving on Kubernetes in 2026. And honestly, that’s probably the strongest signal of maturity an infrastructure platform can get. What’s your current model serving stack?

    • No alternative text description for this image

Similar pages

Browse jobs