1,100+ Researchers Clear H100 Barrier with Frictionless Access


11,020 followers

1,100 pull requests. 4,800+ GitHub stars. Top 0.1% of globally trending repositories within 48 hours. That's what happened when compute was accessible enough that researchers could actually try things. Frictionless access to H100s meant more experiments. More experiments meant more techniques tested. The leaderboard kept moving because the barrier to entry was low enough that 1,100 people cleared it. Full recap: https://lnkd.in/giwU6wWy

1 Comment

RuntimeBoost 5d

This is what happens when you remove friction from compute. More scaling means a higher demand for top-tier runtime performance. 🎯

1 Reaction


More Relevant Posts

Abdulrehman Abulaban
2w
Spent the last week and a half building this. Learned something, applied it. Learned something else, applied it. Repeat.
The result is a full MLOps pipeline that predicts CPU performance from hardware specs, and actually looks like something a real team would build. Not because I copied a template. Because every part of it made me stop, go read, break something, fix it, and move on.
Here's what's in it:
- 5 pipeline stages: ingestion, validation, transformation, training, evaluation. Each one independent, each one can break without taking everything else down.
- MLflow + DagsHub for experiment tracking. Every run logged automatically. No more "wait which model was that again".
- Flask web app with a /predict endpoint. You put in the hardware specs, you get a performance score back. Simple.
- Config-driven design. Zero hardcoded values anywhere in the codebase.
Model hits R² = 0.908. But honestly that wasn't the hard part. The hard part was the architecture. Figuring out how config, components, pipeline, and entity layers are supposed to talk to each other. Most tutorials don't even mention this. You only understand it when you build it yourself.
Week and a half. Worth every hour. https://lnkd.in/d7mxkYRb
#MLOps #MachineLearning #Python #DataScience #OpenToWork #devops

GitHub - aboodcs/CPU_performance_predicting github.com

2 Comments
Tushar Jain
1w
Today I stress-tested my self-hosted RAG ingestion pipeline with ~30,000 queued PDF jobs on a local GTX 1650 laptop.
The pipeline currently uses:
- Redis queues
- multiple workers
- HuggingFace MiniLM-L6-v2 embeddings
- CUDA acceleration
- DLQs for corrupted PDFs
One interesting thing I discovered: the biggest bottleneck wasn’t embeddings or Redis, it was uncontrolled queue pressure. Pushing jobs too aggressively caused workers to eventually become unstable under sustained load. Adding a tiny delay between queue pushes completely stabilized the pipeline and allowed workers to continuously process thousands of files without crashing.
Also learned the hard way that “more workers” does not always mean “better scaling” 😄
This experiment unexpectedly turned into a really fun hands-on lesson in:
- backpressure
- worker orchestration
- throughput stabilization
- fault isolation
- distributed pipeline behavior under load
Next step: building the retrieval pipeline with isolated session-aware retrieval and async response delivery.
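The queue-pressure lesson above can be illustrated with a bounded queue. This is a minimal sketch and not the author's pipeline: it assumes an in-process `queue.Queue` standing in for Redis, and the names `run_pipeline`, `num_workers`, and `max_pending` are hypothetical. Instead of a fixed delay between pushes, the producer blocks whenever the queue is full, so workers are never handed more than `max_pending` jobs at once.

```python
import queue
import threading


def run_pipeline(jobs, num_workers=2, max_pending=8):
    """Push jobs through a bounded queue; return (sorted results, peak depth)."""
    # put() blocks once max_pending jobs are waiting: that is the backpressure.
    pending = queue.Queue(maxsize=max_pending)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:  # sentinel: no more work for this worker
                return
            with lock:
                results.append(f"processed:{job}")

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    peak = 0
    for job in jobs:
        pending.put(job)  # producer waits here when workers fall behind
        peak = max(peak, pending.qsize())  # approximate, but never above maxsize

    for _ in threads:
        pending.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(results), peak


if __name__ == "__main__":
    done, peak = run_pipeline([f"doc{i}.pdf" for i in range(100)])
    print(len(done), "jobs processed; peak queue depth", peak)
```

With a remote queue such as Redis the same idea usually means checking the queue length before pushing, or rate-limiting the producer as the post describes.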
Mahima Sanketh
2w Edited
Just shipped mcp-foundry, a hands-on TypeScript monorepo for learning Model Context Protocol (MCP) with Google Gemini.
What it covers:
→ Building MCP servers with custom tool definitions
→ Connecting an AI agent host via stdio transport
→ Agentic tool-calling loops with Gemini
→ Real-world weather API integration with Zod validation
MCP is becoming the standard way LLMs interact with external tools, and this repo breaks down exactly how that works under the hood.
Open source 👇 https://lnkd.in/gKbWwaxE
#MCP #ModelContextProtocol #Gemini #TypeScript #AIAgents #LLM #OpenSource

GitHub - Mahima-Sanketh-Git/mcp-foundry: 🤖 Explore Model Context Protocol (MCP) with Google Gemini. TypeScript monorepo covering MCP servers, clients, tool calling, and agentic AI workflows. github.com
Andreas Knüpfer
3w
Tutorial "Large-Scale Data Version Control for HPC and HTC with Git and DataLad" at ISC High Performance in Hamburg in June
Do you like git for code? How about git for your large research data as well? We will present how to version large binary files that native git cannot handle, and how this works in HPC environments. If it sounds interesting to you, why not spend half a day at the beginning of the ISC-HPC conference to learn about it and try it on one of the #JSC / #FZJ clusters? See https://lnkd.in/dU-rmCsN and https://lnkd.in/dEKXXUrh for more details.
#git #DataLad #HPC #SLURM #RDM #tutorial

SCC: ISC-HPC Tutorial https://www.casus.science
Jobin Joseph
4d
Your logs say something broke. They won’t tell you why.
In a distributed system, a single request touches dozens of services. Logging alone can’t follow that journey. Tracing can.
OpenTelemetry gives every request a unique trace ID that travels across every service, every DB call, every queue. When something breaks, you don’t grep through 10 servers. You search one ID and see the entire chain.
One standard. Works with any language. Exports to Grafana, Datadog, Jaeger, whatever you use. Instrument once. Never rewrite for a new tool again.
If you’re running microservices without distributed tracing, you’re not observing your system. You’re guessing.
📖 Official docs: https://lnkd.in/dmG5n99a
🚀 Getting started: https://lnkd.in/dusZpPAp
🐍 Python specific: https://lnkd.in/dNpHFyyd
#OpenTelemetry #Observability #BackendEngineering #Microservices #SoftwareEngineering #DistributedSystems
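To make the "one trace ID travels across every service" idea concrete, here is a standard-library sketch of the W3C Trace Context `traceparent` header, which OpenTelemetry propagates by default. It is not the OpenTelemetry SDK API: the helper names `new_traceparent`, `child_traceparent`, and `trace_id_of` are hypothetical and only illustrate that each hop keeps the trace ID while minting a new span ID.

```python
import re
import secrets

# traceparent layout: version "00", 32-hex trace ID, 16-hex span ID, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent():
    """Start a trace at the edge of the system (flag 01 means sampled)."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming):
    """Build the header a service sends downstream for an incoming request."""
    match = TRACEPARENT_RE.match(incoming or "")
    if match is None:
        return new_traceparent()  # missing or malformed header: start a new trace
    trace_id, _parent_span_id, flags = match.groups()
    # Same trace ID, fresh span ID for this hop.
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(header):
    """Extract the trace ID, i.e. the single value you would search for."""
    return header.split("-")[1]


if __name__ == "__main__":
    gateway = new_traceparent()
    orders = child_traceparent(gateway)
    billing = child_traceparent(orders)
    for name, header in [("gateway", gateway), ("orders", orders), ("billing", billing)]:
        print(name, header)
```

In a real deployment the SDK's instrumentation reads and writes this header for you; the sketch only shows why one ID is enough to reconstruct the chain.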
Yash Pritwani
3w
Stop choosing your vector database based on GitHub stars.
We benchmarked Qdrant, Milvus, and Weaviate across 50M+ vectors with real production query patterns. The results surprised us.
What we tested: Recall@10 at 1M/10M/50M vectors, P99 latency under concurrent load, memory per million vectors, horizontal scaling, and metadata filtering performance.
Qdrant delivered the best recall-to-latency ratio. At 50M vectors with scalar quantization it maintained 0.98 recall@10 with P99 under 12ms. The Rust engine consumes 40% less RAM than Milvus. Built-in payload filtering means no separate metadata store.
Milvus wins on batch throughput. GPU-accelerated indexing builds IVF_PQ indexes 3x faster. Distributed architecture scales horizontally. The trade-off: you need dedicated etcd and MinIO clusters plus a platform team to manage it all.
Weaviate offers the fastest prototype-to-production path. Built-in hybrid search combining BM25 with vector similarity, plus embedded vectorization modules. The cost: 2-3x higher memory consumption than Qdrant.
The decision framework:
- Qdrant: best performance-per-dollar, real-time search with filtering
- Milvus: massive scale (100M+), GPU indexing, dedicated platform team
- Weaviate: developer experience first, built-in hybrid search
Pattern we see repeatedly: teams start with Weaviate for speed, migrate to Qdrant at scale. Build your abstraction layer early to save weeks of migration pain later.
The vector database you choose today sets your AI infrastructure ceiling for two years. Decide based on benchmarks, not hype.
Read the full breakdown: https://lnkd.in/g7Z4ryHK
#VectorDatabase #Qdrant #Milvus #Weaviate #AIInfrastructure #MLEngineering #RAG #SimilaritySearch
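For readers who want to reproduce the two headline metrics on their own data, the sketch below shows one common way to compute recall@k and a P99 latency. It is not the benchmark harness behind the numbers above. The function names are hypothetical, the ground truth is assumed to come from an exact (brute-force) search, and the percentile uses the nearest-rank definition, which may differ from what a given tool reports.

```python
import math


def recall_at_k(retrieved, exact_neighbors, k=10):
    """Fraction of the exact top-k neighbors found in the first k results."""
    truth = set(exact_neighbors[:k])
    if not truth:
        return 0.0
    return len(truth & set(retrieved[:k])) / len(truth)


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no samples")
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank - 1, 0)]


if __name__ == "__main__":
    exact = list(range(1, 11))                 # IDs from an exact search
    approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 99]   # IDs from the ANN index
    print("recall@10:", recall_at_k(approx, exact))

    latencies_ms = list(range(1, 101))         # made-up sample latencies
    print("P99 latency (ms):", percentile(latencies_ms, 99))
```

Averaging `recall_at_k` over a fixed query set, and measuring latency under the same concurrency for each engine, keeps the comparison like-for-like.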
Anurag Sharma
2w
Nobody told me research papers were this useful when I started my SWE journey.
Most engineers read blogs, watch tutorials, and skim docs. But the top 1% of engineers who built the systems we all use every day? They published their thinking. And it's all out there, for free.
Here are 5 research papers every software engineer should read:
📄 1. 𝘋𝘺𝘯𝘢𝘮𝘰: 𝘈𝘮𝘢𝘻𝘰𝘯'𝘴 𝘏𝘪𝘨𝘩𝘭𝘺 𝘈𝘷𝘢𝘪𝘭𝘢𝘣𝘭𝘦 𝘒𝘦𝘺-𝘝𝘢𝘭𝘶𝘦 𝘚𝘵𝘰𝘳𝘦 (2007)
The paper that popularized eventual consistency and consistent hashing in production key-value stores. If you've used Cassandra or DynamoDB, this is the origin story.
📄 2. 𝘛𝘪𝘮𝘦, 𝘊𝘭𝘰𝘤𝘬𝘴, 𝘢𝘯𝘥 𝘵𝘩𝘦 𝘖𝘳𝘥𝘦𝘳𝘪𝘯𝘨 𝘰𝘧 𝘌𝘷𝘦𝘯𝘵𝘴 𝘪𝘯 𝘢 𝘋𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘦𝘥 𝘚𝘺𝘴𝘵𝘦𝘮 (1978)
Leslie Lamport asked: "What does 'happened before' even mean across machines?" The answer changed distributed computing forever. Lamport clocks are still everywhere.
📄 3. 𝘈𝘵𝘵𝘦𝘯𝘵𝘪𝘰𝘯 𝘐𝘴 𝘈𝘭𝘭 𝘠𝘰𝘶 𝘕𝘦𝘦𝘥 (2017)
The Transformer paper. The architecture behind every LLM you've interacted with (GPT, Gemini, Claude) traces back to this one paper from Google.
📄 4. 𝘛𝘩𝘦 𝘎𝘰𝘰𝘨𝘭𝘦 𝘍𝘪𝘭𝘦 𝘚𝘺𝘴𝘵𝘦𝘮 (2003)
GFS rewrote the rules for large-scale storage. It directly inspired HDFS and taught the world to treat hardware failure as the norm, not the exception.
📄 5. 𝘙𝘢𝘧𝘵: 𝘐𝘯 𝘚𝘦𝘢𝘳𝘤𝘩 𝘰𝘧 𝘢𝘯 𝘜𝘯𝘥𝘦𝘳𝘴𝘵𝘢𝘯𝘥𝘢𝘣𝘭𝘦 𝘊𝘰𝘯𝘴𝘦𝘯𝘴𝘶𝘴 𝘈𝘭𝘨𝘰𝘳𝘪𝘵𝘩𝘮 (2014)
The consensus algorithm powering etcd, which powers Kubernetes. If you work with K8s, this paper explains what's keeping your control plane alive.
Remember this: 𝙔𝙤𝙪 𝙙𝙤𝙣'𝙩 𝙣𝙚𝙚𝙙 𝙖 𝙋𝙝𝘿 𝙩𝙤 𝙧𝙚𝙖𝙙 𝙩𝙝𝙚𝙨𝙚. 𝙔𝙤𝙪 𝙣𝙚𝙚𝙙 𝙘𝙪𝙧𝙞𝙤𝙨𝙞𝙩𝙮 𝙖𝙣𝙙 𝙖𝙣 𝙖𝙛𝙩𝙚𝙧𝙣𝙤𝙤𝙣.
The engineers who wrote them were solving real problems under real constraints. Reading their thinking is one of the fastest ways to upgrade yours.
Save this post. Pick one paper this weekend. 🚀
#SoftwareEngineering #DistributedSystems #MachineLearning #CareerGrowth #Tech
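As a taste of how approachable these papers are, the logical clock from the Lamport paper fits in a few lines. This is a minimal sketch with a hypothetical `LamportClock` class: each process increments its counter on every local event and send, and on receive jumps past the timestamp carried by the message, so a send always gets a smaller timestamp than the matching receive.

```python
class LamportClock:
    """Logical clock for one process, following Lamport's two update rules."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Rule 1: increment the clock before each local event."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time):
        """Rule 2: move past both the local clock and the message timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


if __name__ == "__main__":
    a, b = LamportClock(), LamportClock()
    a.tick()                 # local event on A
    stamp = a.send()         # A sends a message
    arrival = b.receive(stamp)
    print("sent at", stamp, "received at", arrival)
```

The timestamps give an ordering consistent with causality, but two events with ordered timestamps are not necessarily causally related, which is the gap vector clocks were later designed to close.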

1 Comment
Muhammad Naveed Qasim
3w
From Spawning Goroutines to Architecting Systems: My Journey Through Go Concurrency
Most developers can write a "go" statement, but few can orchestrate a concurrent system that doesn't crash under pressure. Over the last three weeks, I have intentionally pushed my understanding of Golang from basic implementation to intermediate-level concurrency patterns, focusing on system safety, scalability, and performance.
During this intensive period, I moved beyond theory by building and documenting several simulation projects in my repository:
- Sentinel-Go: A high-frequency health and performance monitor.
- Concurrent Word Counter: A modular, multi-file processor using thread-safe state management.
- Web Crawler & Link Validator: A recursive, high-performance engine featuring a URL Frontier and Semaphore patterns to prevent resource exhaustion.
My progress involved mastering three distinct layers of engineering:
- Foundations: Deep-diving into the G-M-P scheduler and the mechanics of context switching.
- Core Primitives: Implementing robust Worker Pools, Pipelines, and Fan-in/Fan-out patterns using channels and sync.WaitGroup.
- Synchronization: Eliminating data races by using sync.Mutex, RWMutex, and Atomics, all verified through the Go Race Detector.
With these foundations solidified, I am now moving into advanced concurrent concepts, including Context-driven cancellations and graceful shutdowns.
You can view the architectural breakdown and the full source code for these projects here: https://lnkd.in/dPWacxRM
#Golang #Concurrency #BackendEngineering #SoftwareArchitecture #SystemDesign #DistributedSystems #GoProgramming

GitHub - MRNaveed-stack/mastering-go-concurrency: The Ultimate Go Concurrency Lab: An end-to-end exploration of the Go memory model. From basic Goroutines and synchronization primitives to advanced orchestration patterns, CSP (Communicating Sequential Processes) principles, and high-pressure stress testing. github.com
Tushar Deshpande
6d
Most RAG tutorials present a single retrieval strategy, but production systems require the ability to choose the right one and know when to switch.
I developed rag-nexus to address this need. It is a production-grade RAG system that implements six complete strategies side-by-side, allowing you to benchmark and compare them on your own data:
→ Naive RAG: embed, retrieve, generate. The baseline.
→ Advanced RAG: query rewriting, hybrid retrieval, and cross-encoder reranking.
→ Self-RAG: the model reflects on its own outputs and flags low-confidence answers.
→ CRAG (Corrective RAG): falls back to web search when retrieval confidence is low.
→ Adaptive RAG: routes each query to the appropriate strategy based on complexity.
→ Agentic RAG: ReAct-style multi-step reasoning with tool use.
Under the hood, the system features:
• Event-driven ingestion via Redis Streams and Celery workers.
• Dense (Qdrant), Sparse (BM25), and Hybrid RRF retrieval.
• HyDE, step-back prompting, multi-query expansion, and query decomposition.
• RAGAS evaluation suite built in, assessing faithfulness, answer relevancy, and context precision.
• JWT authentication, Redis caching, rate limiting, Prometheus, Grafana, and OpenTelemetry tracing.
• One-command Docker Compose startup.
The goal was not to identify a single winner but to create a system where you can integrate real workloads and let the benchmarks reveal the best approach.
Code and documentation can be found at https://lnkd.in/gje76kGv.
#RAG #LLM #AIEngineering #Python #LangChain #OpenSource #BackendEngineering #GenerativeAI
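The "Hybrid RRF" item refers to reciprocal rank fusion, which merges a dense and a sparse ranking without needing their scores to be comparable. The sketch below is a generic version of the technique, not code taken from rag-nexus: the function name and document IDs are hypothetical, and `k=60` is the constant commonly used in the RRF literature, which the repository may or may not use.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense_hits = ["a", "b", "c"]    # e.g. from a vector search
    sparse_hits = ["b", "d", "a"]   # e.g. from BM25
    print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

Because only ranks are used, a document that appears near the top of both lists outranks one that is first in a single list, which is usually the behavior wanted from hybrid retrieval.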

GitHub - TushGH/rag-nexus: Production-grade RAG system — 6 strategies, event-driven ingestion, Prometheus, Docker github.com

