Kubesimplify

Software Development

On a mission to simplify Cloud Native and Web Assembly for everyone!


About us

Passionate about Cloud Native and Kubernetes? Join the thriving Kubesimplify community! We're dedicated to simplifying Cloud Native complexity, fostering collaboration, and sharing insights. Explore the latest trends, engage with industry experts, and build your expertise in the Cloud Native ecosystem.

Website: https://kubesimplify.com/
Industry: Software Development
Company size: 2-10 employees
Headquarters: Bangalore
Type: Self-Employed
Founded: 2022

Locations

Primary

Bangalore, IN


Updates

Kubesimplify

17,397 followers
6h Edited
K8s 1.36 ships DRA partitionable resources. (This is bigger than people think.)

What changed:
→ A single H100 can be claimed as 4× 20GB partitions
→ Each partition is its own ResourceClaim, independently scheduled
→ Tenant A and Tenant B can share one physical GPU with scheduler-level allocation boundaries

Use cases:
→ Inference workloads that don't need a full H100
→ Dev environments with cheap GPU access
→ Multi-tenant ML labs

Limitations:
→ H100 / B100 only (Hopper+)
→ Partition layouts fixed at boot
→ Some kernel patches required

Isolation note: True memory isolation still requires vGPU, Confidential Computing, or physical separation.

If you're spending H100 dollars on workloads that need an A30, look at this seriously.
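To make the sizing argument concrete, here is a minimal back-of-the-envelope sketch in Python. It only models the numbers quoted in the post (4 partitions of 20GB per GPU) and is not a Kubernetes or DRA API. The function name `gpus_needed` and the rule that anything above 20GB takes a whole GPU are assumptions made for illustration.

```python
import math

PARTITIONS_PER_GPU = 4   # from the post: one H100 claimed as 4 partitions
PARTITION_GB = 20        # from the post: 20GB per partition


def gpus_needed(workloads_gb: list[float]) -> int:
    """Estimate physical GPUs needed for a list of per-workload memory needs.

    Assumption (not from the post): a workload that fits in one partition
    takes exactly one partition; anything larger takes a whole GPU.
    """
    small = sum(1 for gb in workloads_gb if gb <= PARTITION_GB)
    large = len(workloads_gb) - small
    # Small workloads pack four to a GPU; large ones get a GPU each.
    return large + math.ceil(small / PARTITIONS_PER_GPU)


if __name__ == "__main__":
    # Four small inference services share a single GPU under this model.
    print(gpus_needed([8, 12, 16, 20]))
```

Under this simplified model, four services that each fit in 20GB need one GPU instead of four, which is the cost argument the post is making.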

Kubesimplify
2d
Day 10: NVIDIA Cloud Functions (NVCF), serverless for the GPU era.

Kubesimplify
2d
Tenant isolation in Kubernetes: the 2026 reality most teams haven't faced yet.

For years, "multi-tenant Kubernetes" meant namespaces + RBAC + NetworkPolicy. That's still the default in 90% of clusters. Here's what changed.

The threat model expanded. In 2018, multi-tenancy meant separating different application teams. In 2026, it means separating:
→ Application teams (Old)
→ AI agents executing tools (NEW)
→ Customer code in your platform (NEW)
→ Untrusted training jobs from data scientists (NEW)
→ CI/CD runners pulling arbitrary code (NEW)

Namespaces + RBAC + NetworkPolicy was designed for the first one. It's insufficient for the rest.

The real options for 2026:

1. vNode (user-namespace runtime isolation)
Linux user namespaces + seccomp. Near-zero overhead, real isolation.
Best for: AI agents, untrusted code, multi-tenant labs.

2. vCluster / K3k / Kamaji (nested Kubernetes)
Each tenant gets a real Kubernetes cluster experience, including cluster-admin in their own isolated control plane.
Best for: portability, customization, multi-cloud, genuine cluster-admin needs.

3. Kata Containers (VM-level isolation)
Full VM isolation, ~10% overhead, slower cold starts.
Best for: regulated industries that need air-gapped isolation.

4. gVisor (syscall interception)
~20-40% overhead, syscall compatibility tradeoffs. Rarely the right answer in 2026.

The teams getting this right are matching the threat model to the tool. Not over-engineering. Not under-engineering.

What's your tenant isolation choice in 2026 and what's the threat model driving it?
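"Matching the threat model to the tool" can be sketched as a small lookup. The Python below is only an illustration of the post's list, not an official selector. The tenant labels, the function name `recommend_isolation`, and the order in which the checks run are all assumptions. Treating customer code, training jobs, and CI runners as "untrusted code" is also an assumption.

```python
# Hypothetical tenant labels; the post names the categories, not these strings.
UNTRUSTED = {"ai_agent", "customer_code", "training_job", "ci_runner"}


def recommend_isolation(
    tenant: str,
    *,
    regulated: bool = False,
    needs_cluster_admin: bool = False,
) -> str:
    """Map a tenant type to one of the isolation options listed in the post.

    The priority order (regulated first, then cluster-admin needs) is an
    assumption made for this sketch.
    """
    if regulated:
        return "Kata Containers"
    if needs_cluster_admin:
        return "vCluster / K3k / Kamaji"
    if tenant in UNTRUSTED:
        return "vNode"
    if tenant == "application_team":
        return "namespaces + RBAC + NetworkPolicy"
    raise ValueError(f"unknown tenant type: {tenant}")


if __name__ == "__main__":
    print(recommend_isolation("ai_agent"))
    print(recommend_isolation("application_team", needs_cluster_admin=True))
```

gVisor is left out on purpose, since the post calls it rarely the right answer in 2026.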

Kubesimplify
3d
Saiyam will be in Bengaluru tomorrow for the AI Infrastructure Meetup hosted by vCluster! 🚀 Looking forward to connecting with the amazing community and having some great conversations around AI infrastructure, Kubernetes, and platform engineering. Come say hi if you are there.

Kubesimplify
3d
Choosing a model serving platform on Kubernetes in 2026? Here's the decision framework I actually use. (Save this for your next ML infra architecture review.)

Every conversation starts with 4 questions:

Q1: HOW MANY MODELS WILL YOU SERVE?
→ <10 models: Any platform works. Pick the easiest path: Ray Serve if you're Python-heavy, KServe otherwise.
→ 10–100 models: KServe is the right answer. ModelMesh handles this density well.
→ 1000+ models: KServe ModelMesh, specifically. It was designed for this scale.

Q2: WHAT FRAMEWORK ARE YOUR MODELS IN?
→ Mostly HuggingFace transformers: KServe (native HF runtime).
→ Custom PyTorch/TensorFlow code: Ray Serve (more flexibility for non-standard pipelines).
→ Mixed/legacy stack: Seldon Core v2 (paid) or roll-your-own if you have the platform bandwidth.

Q3: WHAT'S YOUR LATENCY REQUIREMENT?
→ <50ms p99: vLLM via KServe, or TGI directly. Inference optimization is your bottleneck, not the serving layer.
→ <200ms p99: Any platform works. Focus on autoscaling and resource right-sizing instead.
→ Batch jobs only: Ray Serve with batching enabled. Don't over-engineer real-time infra for offline workloads.

Q4: WHAT'S YOUR TEAM'S KUBERNETES MATURITY?
→ Strong platform team: Any platform. Pick purely by features and ecosystem fit.
→ Application engineers only: KServe. It's the most opinionated, with the fewest footguns.
→ Mixed team: Ray Serve if Python-heavy, KServe otherwise. Match the tool to the team's primary skill set.

THE BORING ANSWER FOR MOST TEAMS: KServe v0.13. Native vLLM runtime. ModelMesh for scaling. OpenAI-compatible API. Apache 2.0. Active community. Production-grade.

The exotic answers (Triton, Seldon, custom stacks) are for specific situations. If you're choosing one of those, you should be able to articulate exactly why in one sentence. If you can't, you probably don't need them.

What was your team's last model serving decision, and how do you feel about it now? Drop a comment. 👇

#MachineLearning #MLOps #Kubernetes #KServe #ModelServing #LLMOps #InfrastructureEngineering #CloudNative
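The four questions above can be written down as a rough Python sketch. This is an illustration of the post's rules only, not a tool shipped by KServe or Ray. The function names, the string labels for framework and team type, and the choice to return one suggestion per question are assumptions. Where the post gives no guidance (for example, between 100 and 1000 models), the sketch returns `None` rather than guessing.

```python
from typing import Optional


def by_model_count(n_models: int, python_heavy: bool) -> Optional[str]:
    """Q1: how many models will you serve?"""
    if n_models < 10:
        return "Ray Serve" if python_heavy else "KServe"
    if n_models <= 100:
        return "KServe"
    if n_models >= 1000:
        return "KServe ModelMesh"
    return None  # 101-999 models: the post gives no explicit guidance


def by_framework(framework: str) -> str:
    """Q2: model framework. The keys are hypothetical labels."""
    return {
        "huggingface": "KServe",
        "custom": "Ray Serve",
        "mixed": "Seldon Core v2 or roll-your-own",
    }[framework]


def by_latency(p99_ms: Optional[float]) -> Optional[str]:
    """Q3: p99 latency target in ms; None means batch jobs only."""
    if p99_ms is None:
        return "Ray Serve with batching"
    if p99_ms < 50:
        return "vLLM via KServe, or TGI"
    if p99_ms < 200:
        return "any platform"
    return None  # looser targets are not covered by the post


def by_team(team: str, python_heavy: bool) -> str:
    """Q4: Kubernetes maturity. The team labels are hypothetical."""
    if team == "platform":
        return "any platform"
    if team == "application":
        return "KServe"
    if team == "mixed":
        return "Ray Serve" if python_heavy else "KServe"
    raise ValueError(f"unknown team type: {team}")


def shortlist(n_models: int, framework: str, p99_ms: Optional[float],
              team: str, python_heavy: bool = False) -> dict:
    """Collect one suggestion per question; resolving conflicts is left to you."""
    return {
        "Q1 model count": by_model_count(n_models, python_heavy),
        "Q2 framework": by_framework(framework),
        "Q3 latency": by_latency(p99_ms),
        "Q4 team": by_team(team, python_heavy),
    }


if __name__ == "__main__":
    for question, answer in shortlist(50, "huggingface", 150, "application").items():
        print(f"{question}: {answer}")
```

When the four answers disagree, that is the point at which the "articulate exactly why in one sentence" test applies.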

Kubesimplify
3d
Claude Opus 4.8 Ships With Dynamic Workflows

Kubesimplify
3d
Day 9: KubeRay, Distributed ML on Kubernetes

Kubesimplify reposted this
Saiyam Pathak
4d
A lot goes on from prompt to response. I wrote a complete breakdown of this in very simple terms, with animations and diagrams to simplify all the concepts. Comment "LLM" and I will send you the direct link. #AI #LLM #Ollama #chatgpt

Kubesimplify reposted this
Saiyam Pathak
4d Edited
Do you want a KubeCon India warmup? I did one 2 years ago, and I feel we should do it again with even bigger stakes: one FREE Kubernetes certification voucher. Comment "YES" and we will make it happen. #kubecon India

Kubesimplify
3d
KServe v0.13 is out and I think it’s the most important Kubernetes inference platform release of 2026.

The headline features are impressive:
→ Native vLLM runtime
→ ModelMesh support for serving thousands of models per pod
→ OpenAI-compatible APIs

But the bigger story is what these features enable. For years, “model serving on Kubernetes” usually meant choosing one of three paths:
→ Roll your own stack (custom controllers, custom predictors, operational overhead)
→ Use Seldon Core (mature ecosystem, but now moving toward paid-only offerings)
→ Use a hyperscaler’s managed inference platform (convenient, but with infrastructure lock-in)

KServe v0.13 makes a fourth option genuinely viable: Open-source. Kubernetes-native. Production-grade model serving. That changes the equation for a lot of teams.

What I’m seeing now:
→ Teams that avoided Kubernetes for inference are reconsidering it
→ Teams paying heavily for managed serving finally have a credible self-hosted path
→ Teams stuck on deprecated Seldon Core v1 deployments now have a clear migration target

The migration stories are interesting too. Teams moving from Seldon → KServe report:
→ ~6 weeks to migrate ~50 services
→ Often handled by a single engineer
→ Similar or lower infrastructure cost
→ Better reliability
→ Faster iteration velocity

The migrations that don’t go smoothly usually involve:
→ Deeply custom predictor logic tied to Seldon runtimes
→ Complex canary / A-B testing workflows that need redesign

My take: KServe is becoming the boring, correct answer for model serving on Kubernetes in 2026. And honestly, that’s probably the strongest signal of maturity an infrastructure platform can get.

What’s your current model serving stack?

Employees at Kubesimplify

Saiyam Pathak

Saloni Narang

Prianshu Mukherjee
