San Francisco, California, United States
38K followers
500+ connections
Activity
-
Reynold Xin shared this: The future of databases is being built directly on top of object stores. We call this the Lakebase architecture. For a long time, the industry treated data lakes strictly as analytical or offline storage. But the Lakebase architecture is changing that by enabling true operational databases directly on top of the lake. I believe this is the future of data infrastructure. It is how every database, whether it's an OLTP system or a vector database, should be built moving forward. Of course, delivering the stringent performance requirements for operational databases on top of object stores requires some creative engineering. Really excited to see more real-world examples of this architecture emerging. The team at Zilliz just shared a piece on why they rebuilt their vector database using this exact approach, and it perfectly captures where the industry is heading. Check it out here: https://lnkd.in/gKxY3bHX (Why We Built Vector Lakebase: Rethinking Unstructured Data Architecture for AI, Zilliz blog)
-
Reynold Xin shared this: Oracle has spent the last two weeks writing articles comparing Oracle (and PDB) to Lakebase, and it highlights a massive philosophical divide in how we view databases in the agentic era. They are trying to retrofit heavy, traditional architectures for AI. We believe Lakebase is the future because agents need something entirely different: ⚡️ Super simple APIs: so agents don't have to read a giant manual and hallucinate a query. ⚡️ Sub-second provisioning & auto-scaling: so you aren't paying legacy-level prices for idle time. ⚡️ Branching: Git-style branching to create isolated, safe environments for agents on the fly. ⚡️ Automatic backup & restore: so you don't sweat it when an autonomous agent inevitably drops a table. The numbers speak for themselves. Lakebase is our fastest growing product. In the last few months alone, we've seen the database start rate grow 30X, and now we are starting tens of millions of databases EVERY DAY. Some of these databases have branches 500 levels deep and lifetimes of just seconds due to how fast agents move. Go try it yourself in a few seconds on neon.com! The team has been cooking hard to push this gap even further. Come to Data and AI Summit next month to hear about some major new breakthrough capabilities. 🚀 (Links in comments so you can read their take)
-
Reynold Xin reposted this: Exciting news! 🎉 Lovable now integrates with Databricks, providing a natural language interface that allows anyone, regardless of technical skills, to build live data apps that can read and write data stored in Databricks. Bridge the gap between complex data engineering and beautiful, functional front-ends. I tested it with SEC filing data. You can go from search to analyzing detailed financial statements in minutes. Try it yourself! #databricks #lovable
-
Reynold Xin reposted this: Excited to announce GPT-5.5, our smartest and most intuitive model yet! 5.5 now understands what you mean, not just what you type – perfect for handling ambiguity, planning multi-step work, and actually helping complete tasks. Earlier models already did this well, but 5.5 moves further toward understanding what you’re trying to accomplish, especially at work where things aren’t perfectly packaged. From my conversations with hundreds of enterprises, this is the type of model capability they look for as they make AI a core intelligence layer of their infrastructure at companies like NVIDIA, Lowe's Companies, Inc., Cisco, BNY, and more. To celebrate today’s launch, Patrick Wendell at Databricks and I had a chance to sit down and talk about GPT-5.5’s impact on joint customers, and how builders and non-technical teams can harness the power of this latest model directly in Databricks’ platform. Congrats to all the research and deployment teams for this amazing milestone!
-
Reynold Xin shared this: Matei just received the ACM Prize in Computing, one of the most prestigious awards in computer science. It's hard to think of anyone who deserves it more. Few people have had as much impact on how the world works with data and AI, and Matei has done it all with a focus on building open source tools that are accessible to researchers, nonprofits and enterprises across all industries. Watching his body of work compound over the years is a privilege. Congrats, Matei!
-
Reynold Xin reposted this: We're incredibly proud to congratulate our co-founder and CTO, Matei Zaharia, on receiving the ACM Prize in Computing for his development of distributed data systems that have enabled large-scale machine learning, analytics, and AI. Matei's open-source contributions, including Apache Spark™, Delta Lake, and MLflow, have fundamentally changed how organizations work with data and AI. Researchers, nonprofits, startups, and enterprises across every industry have built on the foundation he helped create. Now he's pushing the frontier further, focusing on building and scaling reliable AI agents through open-source research like DSPy and GEPA. Matei, this recognition is so well deserved. We're honored to build alongside you every day. https://lnkd.in/gZTw65kW
-
Reynold Xin reposted this: The Databricks Lakebase engineering team just shipped a real step toward invisible maintenance. For Lakebase Postgres, a new compute node is brought up ahead of a scheduled update and prewarmed using the primary’s cache footprint and WAL stream. When it is ready, it takes over. No cold cache, no throughput drop, no disruption to the workload. This is fundamental systems work. Stateless compute, shared storage, and being precise about what to warm and when. It removes a failure mode most databases still expose during routine patching. Well done Hans Norheim and the entire Lakebase eng team who contributed to this milestone. 🧱 🔥 Check out how we did it: https://lnkd.in/e7KPYyWR
-
Reynold Xin shared this: The dynamics of cybersecurity defense are changing rapidly. Attackers are using AI agents that don't sleep, can connect the dots faster than humans, and can exploit almost any surface area. Now, even someone without a technical background can deploy a team of agents to mount an attack. If defenders want to level the playing field, we have to change our approach: we need to collect and analyze ALL data rather than relying on selective filtering. We need an open ecosystem. And most importantly, we need to fight agents with agents. Today, we are taking a massive step in that direction by announcing Lakewatch, the open agentic SIEM. Unlike traditional SIEMs, Lakewatch empowers organizations to store and analyze ALL of their data at scale and at a low cost, running advanced detections and analysis with AI agents. Building this wouldn't have been possible without our amazing design partners. We’ve been working with National Australia Bank (NAB) since the very beginning. In our first collaboration meeting, I remember the NAB team bringing up the exact threats of the AI era and the necessity of an open lakehouse architecture. In fact, Patrick Wright, Sandro Bucchianeri, and Rob Smith practically pitched our own vision to us before we even told them what we were building! A huge thank you to these visionaries. We are so lucky to have you on the Lakewatch journey. Check out the blog to see how we are building the future of security together: https://lnkd.in/gZiZPb3E (Building the future of security with NAB with Lakewatch)
-
Reynold Xin reposted this: Databricks CEO and co-founder Ali Ghodsi is taking the stage at #RSAC2026 to address the most critical challenge in security: the shift from manual operations to machine-scale AI automation. Hear why today's security stack, built for yesterday's data volumes and human-speed threats, is no longer enough. Ali will break down what's coming next and why an open, agent-first architecture is the path forward for CISOs. https://lnkd.in/gC4E2wm7
-
Reynold Xin liked this: I'm excited to share that I've been elected as a Committer for Apache Spark. Spark has impressed me for as long as I can remember, both in its technical innovation and the community behind it. It's been a privilege to contribute back to a project that has had such a profound impact on the data ecosystem, and I hope I can help it continue to succeed for years to come. I'm especially grateful for the support, feedback, and collaboration from the Spark community and my teammates at Apple and then Databricks. A special thank you to everyone who reviewed my contributions, provided guidance, and helped me learn along the way! #ApacheSpark #OpenSource #ApacheSoftwareFoundation #BigData #DataEngineering
-
Reynold Xin liked this: Some exciting news from the mathematics chapter of my life! Our paper, “Guts in sutured decompositions and the Thurston norm”, has been accepted for publication in Geometry & Topology, one of the most prestigious journals in geometry and topology. I spent two years at UC Berkeley earning a PhD in mathematics, studying 3-manifolds, geometric topology, and hyperbolic geometry. This paper, written with my advisor Ian Agol, is one of the few papers that came out of that period. Research often moves on a very different timescale than startups. It’s been years since this work was done, but seeing it finally reach publication makes the journey especially meaningful. I’m deeply grateful to Ian for his support, mentorship, and countless mathematical conversations over the years. It’s nice to be reminded that even in the age of AI, beautiful mathematics remains timeless.
-
Reynold Xin liked this: It is fantastic to see #uoft alumni Reynold Xin and Mike Murchison being featured during #TorontoTechWeek, via University of Toronto Entrepreneurship's Desjardins Speaker Series, to discuss their journeys building successful tech startups. What an incredible opportunity to hear from these brilliant founders about lessons learned and seizing the AI moment.
-
Reynold Xin liked this: Earlier today Eduardo Gomes, Gaurang Joshi, Karl Pullicino, and I had the privilege of sitting down with Reynold Xin, Co-founder and Chief Architect at Databricks. What stood out wasn't just the depth of technical insight, but how openly Reynold shared his perspective on where the industry is heading, and how thoughtfully he engaged with us on our priorities and use cases. Nearly five years ago, a similar conversation gave us the conviction to embrace the Lakehouse paradigm and begin our Data Modernization journey with Databricks. Today's discussion left us with the same clarity and confidence about what comes next, a clear sense of how the Databricks platform is constantly evolving, and the reassurance that we've chosen the right partner to get us there. A special shoutout also goes to Kasthuri Thambipillai and Mifrah K., not just for making conversations like this possible, but for their ongoing support with our initiatives. Looking forward to the upcoming Data + AI Summit, and the exciting announcements that come with it. 🚀 #AI #BallysIntralot #Data #Databricks #DAIS #Innovation
-
Reynold Xin liked this: Thrilled to share I've started at Stanford University's Department of Pathology in addition to Arc Institute. Looking forward to a shorter commute after 5 years at University of California, Berkeley's Department of Bioengineering and embarking on daring new projects. We're recruiting multiple postdocs and technical staff who share our vision of programming biology and building tangible products that impact everyday lives in the real world!
- Lab website: https://lnkd.in/ggTqJEgh
- Machine learning for biology postdoc: https://lnkd.in/g2vuPUzv
- Biological design postdoc: https://lnkd.in/gW7UEqZ2
- Molecular technology development scientist: https://lnkd.in/gfabFssU
- Biochemistry scientist: https://lnkd.in/gM5znDUi
We are a hybrid lab of experimental and computational scientists working across bioengineering and machine learning. Recent work has included genome foundation models, metagenomic mining, DNA recombinases, AI-designed molecular systems, ML-guided directed evolution, and perturbation prediction. Lately, we have been thinking about the following research frontiers (your innovative ideas could be related, or broadly fit into our thematic interests):
- AI agents and virtual cell models for scientific research and drug discovery
- Discovery and engineering of modulators/probes/drug-like molecules for physiological control and human enhancement (e.g. sleep, appetite, energy, etc.)
- Immune rejuvenation and skin/microbe/barrier tissue rejuvenation for aging-related phenotypes
- New programmable molecular tools for synthetic biology and AI-guided perturbations of cellular behavior
Experience & Education
-
Databricks
Explore more posts
-
Andrew Martinez
Alpha Wave Global, LP • 2K followers
My thoughts on the recent Cerebras & OpenAI partnership Inference speed at scale is a core bottleneck for AI. The partnership between Cerebras and OpenAI signals a shift many haven’t internalized: once models are good enough, latency and throughput under concurrency become the binding constraints, not model quality. At scale, inference is the product. Every user interaction, agent step, and reasoning loop is gated by P99 latency under load, not by how smart the model is in isolation. Milliseconds compound into user experience, developer velocity, and revenue. This isn’t about cheaper inference. It’s about faster inference unlocking new economics: real-time agents, deeper reasoning chains, and high utilization without tail-latency blowups. OpenAI isn’t just securing capacity. They’re prioritizing architectures where time-to-answer is a first-class design variable. When intelligence commoditizes, time becomes the scarce resource. Yuval Rozio David Fabregat Tristan Capes-Davis https://lnkd.in/exfBGZzs
30
-
Anshul Kapoor
Google • 4K followers
For years, the story was: TPUs for unmatched performance, but with a steeper learning curve. The new vLLM TPU backend just rewrote that story. By unifying the PyTorch and JAX ecosystems, we've removed one of the biggest barriers to adoption. Now, the massive vLLM community can tap into the raw power and efficiency of TPUs without the heavy lift. This is more than a performance boost; it's about democratizing access to hyperscale AI infrastructure. A huge step forward for the entire open-source community. #AI #LLM #TPU #GoogleCloud #vLLM #OpenSource #GenAI https://lnkd.in/e6RzQRuV
21
-
Bunty Shah
MSCI Inc. • 4K followers
[AI paper] GRPO has a hidden flaw in Multi-Reward settings. It’s time to decouple your normalization. 📉 As AI Architects, we are moving from single-objective RL (just "get the answer right") to multi-objective RL (accuracy + format + length + safety). We typically sum these rewards and throw them into GRPO. A new paper from NVIDIA, "GDPO: Group reward-Decoupled Normalization Policy Optimization," demonstrates that this naive summation causes Reward Signal Collapse.
The Architectural Failure Mode: If you sum disparate rewards (e.g., a binary Format reward and a scalar Accuracy reward) and then normalize, different raw reward combinations can map to identical advantage values. Example: A rollout with rewards (0, 2) might yield the same normalized advantage as (0, 1) due to group statistics, effectively deleting the gradient signal for the second objective.
The Solution: GDPO. The fix is architectural: Decoupled Normalization. Instead of normalizing the sum, GDPO normalizes each reward objective independently within the group before aggregation.
Impact: This restores the resolution of the training signal. On Tool Calling and Math Reasoning tasks (DeepSeek-R1/Qwen), GDPO converges where GRPO fails or plateaus. If you are training agents that must balance strict formatting (JSON) with reasoning quality, this is the correct objective function to use. 👇 Link to the paper in the comments. #AIArchitecture #RLHF #NVIDIA #GRPO #DeepLearning #LLM #Alignment #Research
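The collapse described in this post can be illustrated with a small sketch. It assumes the simplest form of each scheme: mean/standard-deviation normalization within one group of rollouts, applied either to the summed reward (GRPO-style) or to each objective separately before summing (GDPO-style). The function names and the toy groups are hypothetical, and the paper may include further steps that are not modeled here.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero when a reward is constant in the group


def normalize(x):
    """Standardize values within one group of rollouts."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + EPS)


def summed_then_normalized(rewards):
    """GRPO-style (assumed form): sum the objectives, then normalize the sum."""
    rewards = np.asarray(rewards, dtype=float)  # shape: (rollouts, objectives)
    return normalize(rewards.sum(axis=1))


def decoupled_then_summed(rewards):
    """GDPO-style (assumed form): normalize each objective, then sum."""
    rewards = np.asarray(rewards, dtype=float)
    per_objective = [normalize(rewards[:, k]) for k in range(rewards.shape[1])]
    return np.stack(per_objective, axis=1).sum(axis=1)


# Two toy groups of two rollouts with two binary rewards each.
# In group_a the second rollout earns one reward; in group_b it earns both.
group_a = [[0, 0], [1, 0]]  # summed rewards (0, 1)
group_b = [[0, 0], [1, 1]]  # summed rewards (0, 2)

# Summing first maps both groups to approximately (-1, 1).
print(summed_then_normalized(group_a), summed_then_normalized(group_b))

# Decoupling keeps them apart: approximately (-1, 1) versus (-2, 2).
print(decoupled_then_summed(group_a), decoupled_then_summed(group_b))
```

In this toy case the rollout that satisfies both objectives receives a larger advantage only under the decoupled scheme, which is the extra resolution the post refers to.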
21
1 Comment -
Chuan Qiu
Velda • 2K followers
xAI is sharing spare GPU capacity with Anthropic. Their cluster utilization was reportedly running at 11%. The industry average isn't much better — 40%. This isn't an xAI problem. It's a structural one. Most clusters are reserved by a single tenant. But compute demand from any one team is inherently uneven: quiet during planning cycles, maxed out before a model deadline. The gap between those two states gets eaten by idle GPUs no one is using, while teams without reservations queue for hours to get a single node. Velda is making compute fluid. If you reserve GPUs with us, your unused capacity doesn't go to waste: it converts automatically into spot credits you can draw on later. You will have prioritized access to the spot pool, which comes from unused compute of reservations that would be otherwise wasted. Velda allows you to run GPU jobs with a simple command prefix; Preempted jobs retry automatically from last check-point. Getting cluster access should feel like using your workstation, no manifest, no extra setup. And we're not going to mark up the hardware. You pay what the GPU provider charges, no platform tax on top. If you have an existing provider relationship, we'll work with them directly. The capacity is there. It just needs to move.
23
6 Comments -
Shishir Kumar Prasad
Instacart • 3K followers
Proud to share our latest work on PARSE — Instacart’s self-serve platform for product attribute extraction using multi-modal LLMs (text + image). From identifying “80 sheets” on packaging to reasoning over “3 boxes of 124 tissues,” PARSE brings accuracy, coverage, and speed to our catalog systems at scale. Key highlights: - Multi-modal LLMs for robust attribute extraction - Zero/few-shot setup with confidence scoring - 70% cost savings on simpler attributes - Built-in quality review, versioning, and ongoing automation Huge kudos to the entire catalog team and our cross-functional partners 👏 .
35
-
Nils Matteson
thaw • 1K followers
Presented at the ML+X community meeting at UW-Madison today on deploying a retrieval-augmented generation (RAG) system on AWS Bedrock and evaluating LLM performance at scale. Our team took KohakuRAG, the #1 solution from the 2025 WSDM WattBot Challenge, and deployed it as a fully serverless pipeline on AWS Bedrock. I built the Bedrock integration, evaluation framework, and cost tracking infrastructure, then benchmarked 9 LLMs and 3 ensemble strategies across 282 questions to answer a practical question: which model should you actually deploy in production? A few results that stood out... - Ensemble majority voting (0.840) outperformed every individual model. - Llama 4 Maverick delivered 98% of the top score at a fraction of the cost and latency. - Model behavior matters as much as raw accuracy; our highest-citation model finished last overall due to aggressive refusal behavior. - Text-only embeddings create a ceiling on figure-based questions that no LLM can overcome. The full recording will be available on ML+X Nexus: (https://lnkd.in/gSTe_EdW) soon. I would definitely recommend watching the whole thing if you are interested in deploying or learning more about RAG applications! Update: Now live at https://lnkd.in/gv7_jHkv Thank you to Christopher Endemann for the mentorship and for bringing me onto this project; I learned more about applied ML infrastructure than I have in any course in just a few months. Thank you to Blaise Manga Enuh, Ph.D. for the collaboration on the local deployment pipeline.
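For readers unfamiliar with ensemble majority voting, here is a minimal sketch of the idea. It assumes each model returns an already-normalized answer string and breaks ties in favor of the earliest answer. The post does not describe the team's normalization or tie-breaking rules, so both are assumptions, and `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer; ties go to the earliest answer in the list."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    counts = Counter(answers)
    best = max(counts.values())
    for answer in answers:  # preserve input order for tie-breaking
        if counts[answer] == best:
            return answer


# Hypothetical outputs from three models for one question.
print(majority_vote(["42 kWh", "40 kWh", "42 kWh"]))
```

Ordering the input by model reliability would make the tie-break favor the strongest model, which is one simple design choice among several.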
21
4 Comments -
Henry Peter
Ushur • 2K followers
Agents don’t flow — they live. I keep seeing new “agent” frameworks that simply repackage workflows — connecting steps in a DAG (Directed Acyclic Graph) and calling it autonomy. The latest example: OpenAI’s AgentKit announced at Dev Day → https://lnkd.in/g7FCW6Mt But autonomy isn’t something you add to a workflow — it’s something that defines an agent. True agents don’t flow from one step to the next. They exist in environments — perceiving, deciding, and acting continuously based on what they know, sense, and aim to achieve. When we treat agents as workflow nodes, we lose what makes them agentic in the first place: persistence, specialization, and emergent coordination. Sustainable architectures won’t come from orchestrated flows dressed up as agents. They’ll come from systems where intelligence arises from interaction, not instruction. (Not a critique of any vendor — just a reminder that autonomy isn’t a feature you wire in; it’s the nature of an agent itself.) #AI #Agents #SystemsArchitecture #Autonomy #MultiAgentSystems #LLM #EmergentIntelligence #AgenticAI #Innovation #DesignPhilosophy
59
-
Shabnam Rashtchi, Ph.D.
GSK • 2K followers
My Take on “Agentic” Frameworks and Real AI Engineering: There’s a lot of excitement around “agentic” frameworks, OpenAI’s AgentKit, LangChain, LlamaIndex, and others, all promising to simplify how we build intelligent systems. They’re great for quick prototypes, but when you’re designing production AI products that have to last, the picture changes. To me, an agent isn’t a visual node or a black-box workflow, it’s simply a modular, testable class that performs a defined task using an LLM or transformer model under strict control. In other words, it’s just good software design: clear inputs and outputs, deterministic orchestration, and measurable results. I prefer to build hybrid systems that combine structured engineering with state-of-the-art NLP and transformer architectures, using the LLM as a controlled component, not the entire system. That keeps the logic transparent, debuggable, and adaptable as models and requirements evolve. Frameworks like AgentKit can still help at the edges, for evaluation, UI, or quick orchestration, but the core intelligence and logic should always live in code you own and understand. That’s how you build AI that’s not only powerful today, but maintainable tomorrow. #AI #MachineLearning #LLM #SoftwareEngineering #NLP #ProductDesign
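One way to read "a modular, testable class" is sketched below. This is an illustration under assumptions, not the author's code: `SummaryAgent`, its `llm` callable, and the word limit are all hypothetical. The model call is injected, so the class can be unit-tested with a stub, and the output contract is enforced in ordinary code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SummaryAgent:
    """Performs one defined task: summarize text within a word budget."""

    llm: Callable[[str], str]  # injected model call (hypothetical interface)
    max_words: int = 50

    def run(self, text: str) -> str:
        # Clear input contract: reject empty input before calling the model.
        if not text.strip():
            raise ValueError("input text must be non-empty")
        prompt = f"Summarize in at most {self.max_words} words:\n{text}"
        words = self.llm(prompt).split()
        # Clear output contract: enforce the word budget deterministically.
        return " ".join(words[: self.max_words])


# A stub stands in for the real model, which keeps tests fast and repeatable.
def fake_llm(prompt: str) -> str:
    return "one two three four"


agent = SummaryAgent(llm=fake_llm, max_words=3)
print(agent.run("some long document"))
```

Swapping the stub for a real client changes only the injected callable; the validation and truncation logic stays in code the team owns.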
49
4 Comments -
Ashok Kumar Singh
MCAL Global • 11K followers
The GPU budget isn't what's killing your AI ROI. It's the silent compute burn happening before the model even touches the data. 🛑 Everyone is obsessing over the cost of model training, but here is the unpopular engineering truth: for most enterprise AI projects, data preprocessing consumes significantly more compute than actual model training. Heavy workloads like deduplication, normalization, and label cleaning are massive cost centers. Yet, while engineering teams meticulously track GPU hours, almost nobody instruments their data pipelines. You cannot optimize what you do not measure. If you lack granular visibility into your preprocessing compute, you are flying completely blind on a cost center that can easily exceed your entire training budget. Stop bleeding compute in the dark. Instrument your data pipeline. Take control of your AI unit economics today at https://zurl.co/bieZo. 👇 #DataEngineering #FinOps #EnterpriseAI
6
-
Mike Bell
Synapsa • 5K followers
Wondering when this was coming. We've been using AWS Bedrock for all Anthropic models since the beginning. Lower latency. More secure. Favorable cost. Also just recognizing they don't have the infra (yet) to compete at the scale they need. Also also the fact that they are saying, "yeah sure we don't care...run our models from anywhere and we'll build the tools that know how to use them best" is brilliant.
5
5 Comments -
Chenxi Wang, Ph.D.
Rain Capital • 29K followers
Lovable hit $100 million ARR 8 months from launch. Anysphere's Cursor took 12 months to hit $100 million. Before that, it took Wiz 18 months to reach $100M, Deel 20 months, and Ramp 24 months. All these companies exhibited #escape #velocity. If you look at how they conducted business, these companies focused on #distribution early on in the company's journey. "The game of #startups is if the startup can gain #distribution faster than the incumbent gain #innovation", as Alex Rampell wrote in his 2015 blog post “Distribution vs Innovation”. As Lovable's stats demonstrate, the escape velocity is becoming larger and more aggressive, particularly with #AI. Will we see companies reaching $100M ARR in 6 months? Will we see solopreneur companies achieving the same escape velocity? The question for other #startups, those that are funded but not achieving quite the same escape velocity: what are you doing to steer yourself onto the escape velocity path? As Brian Balfour wrote in this excellent blog, "The Big Squeeze: Why Escape Velocity Is More Important Than Ever": #Distribution is the seed that unlocks harvest (I recommend every startup founder read this blog, link in the comments). - Who can you integrate with? - Which channels can you attain distribution faster than others? - Who should you partner with to give you that market #uplift? #distribution is indeed more important than ever. If you are a startup and you are not asking these questions from day 1, you will be left behind.
62
13 Comments -
Jacob Warren
Rig AI • 7K followers
You don't need a separate reference model for RL. We built an RL pipeline built on a multi-turn-focused modification of Self-Distillation Policy Optimization (SDPO) that uses the same model in two roles: student and teacher. The difference? Context window. → The student generates actions with no knowledge of the outcome. The teacher re-evaluates the same tokens with hindsight: the observation from each action, terminal test output, and a successful rollout from the same prompt. The per-token advantage is just the logit gap between these two views. → This gives you dense, per-token credit assignment instead of a single scalar reward for the whole trajectory. Think tokens get 0.25× weight (loosely guided, not tightly constrained). Action tokens get full gradient. System/user/tool result tokens get nothing. → The "teacher" is not a frozen checkpoint or a separate reward model. It's your current policy, conditioned on richer information. This is well-defined under MoE because same parameters with different routing is normal operating behavior. → We use a tiered reward system with hard gates only for truly unrecoverable failures: sandbox escapes, test corruption, harness crashes, timeouts, infinite loops. Everything else like wrong directory, failed commands, incorrect approaches are learnable signal, not wasted compute. The key insight: most of the information you need for credit assignment is already inside the model. You just need to ask it the right question with the right context. On LiveCodeBench v6 SDPO reached 48.8% vs GRPO's 41.2% beating Claude Sonnet 4 with an 8B model! That's bonkers.
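The per-token weighting described in this post can be sketched as follows. This is an illustration under assumptions: it uses per-token log-probabilities of the sampled tokens as a stand-in for the "logit gap", and the token-type labels and function name are hypothetical. Only the weights (0.25 for think tokens, 1.0 for action tokens, 0 for system/user/tool-result tokens) come from the post.

```python
import numpy as np

# Weights per token type as stated in the post; the label strings are hypothetical.
TOKEN_WEIGHTS = {
    "think": 0.25,
    "action": 1.0,
    "system": 0.0,
    "user": 0.0,
    "tool_result": 0.0,
}


def per_token_advantages(student_logprobs, teacher_logprobs, token_types):
    """Weighted gap between the hindsight (teacher) and blind (student) views.

    Both inputs score the same sampled tokens with the same model. Only the
    context differs: the teacher view also sees outcomes. Log-probabilities
    are an assumed stand-in for the "logit gap" mentioned in the post.
    """
    gap = np.asarray(teacher_logprobs, dtype=float) - np.asarray(student_logprobs, dtype=float)
    weights = np.array([TOKEN_WEIGHTS[t] for t in token_types])
    return weights * gap


# Toy trajectory of four tokens.
student = [-1.0, -2.0, -0.5, -3.0]
teacher = [-0.5, -1.0, -0.5, -1.0]
types = ["user", "think", "action", "action"]
print(per_token_advantages(student, teacher, types))
```

A positive value means hindsight made the token more likely, so the update would reinforce it; tokens the policy did not generate contribute nothing.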
5
-
Andrei Lopatenko
Govini • 26K followers
Quite an interesting (though relatively short) article on building vector search systems in real-world production settings. It describes an infrastructure operating at scale, around 100B vectors (200 TiB of vector data), with a p99 latency target of ~200ms and a relatively modest ~1K QPS, which still makes the system highly non-trivial. What stands out is the range of engineering techniques required to make this work in practice, including hierarchical clustering, binary quantization, multi-stage ANN pipelines, and other aggressive compression and pruning strategies. (We, people who build search engines, know very well how much engineering effort these systems require, and at the same time how substantial the advantages can be once they are working at scale.) The system design is clearly driven by the constraints of scale, where every optimization matters. More broadly, it reflects a wider 2026 trend: large-scale vector search is becoming a core infrastructure primitive. While earlier systems focused mostly on RAG over relatively small or curated corpora, today’s deployments operate at tens or even hundreds of billions of vectors, with real-world latency requirements (often ~200ms p99) and increasingly high throughput demands (from 1K QPS in some systems up to 100K–1M QPS in others). The key takeaway is that ANN systems are now treated as first-class infrastructure, where even single-digit percentage improvements in index size, latency, or throughput translate into significant cost and scalability advantages. https://lnkd.in/eSFcpdPZ
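For a sense of scale, 200 TiB spread over 100B vectors averages roughly 2.2 KB per vector, which is why compression matters so much here. Below is a minimal sketch of how binary quantization and a multi-stage pipeline can fit together, assuming sign-based 1-bit codes, a Hamming-distance shortlist, and an exact inner-product rerank. It does not reproduce the linked article's design; `binarize`, `hamming`, `two_stage_search`, and the `shortlist` parameter are hypothetical.

```python
import numpy as np


def binarize(vectors):
    """1 bit per dimension (the sign), packed 8 dimensions per byte."""
    return np.packbits(np.asarray(vectors) > 0, axis=1)


def hamming(codes, query_code):
    """Number of differing bits between each stored code and the query code."""
    return np.unpackbits(codes ^ query_code, axis=1).sum(axis=1)


def two_stage_search(vectors, codes, query, k=5, shortlist=50):
    """Stage 1: cheap Hamming shortlist. Stage 2: exact inner-product rerank."""
    query_code = binarize(query[None, :])
    candidates = np.argsort(hamming(codes, query_code))[:shortlist]
    exact_scores = vectors[candidates] @ query
    return candidates[np.argsort(-exact_scores)[:k]]


# Toy corpus: 1,000 unit vectors of 64 dimensions, so each code is 8 bytes.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
corpus_codes = binarize(corpus)

top = two_stage_search(corpus, corpus_codes, corpus[42])
print(top)
```

The first stage touches only the compact codes, so the full-precision vectors are read only for the shortlist. A production system would replace the brute-force scan with an index, such as the hierarchical clustering the post mentions.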
62
1 Comment -
Daren Martin
Datadog • 3K followers
AI is moving fast—but the real challenge isn’t adoption, it’s operation. Monitoring, evaluating, and controlling cost and latency are what turn AI from an experiment into something production-ready. The teams that get this right won’t just build AI systems—they’ll run them reliably at scale.
13
-
Jan Beitner, PhD
Inflexion • 3K followers
#AI coding #agents are changing how I evaluate vendors. A strong argument for products is if they can be configured purely through code and #API / #MCP. Not because they have their own AI features - but because they let your AI agent do the work. Lightdash is a great example. It is not the most mature BI tool, but probably one of the most AI-ready - because they made all data and configuration accessible via API and gave AI agents the context to act on it, so Claude Code can end-to-end create dashboards by modelling data in the data warehouse and then build the visualizations. If your product is not API-first, AI cannot bring in outside context or act autonomously within it. That is becoming a real competitive disadvantage. https://lnkd.in/e6Gp3MGj
53
14 Comments -
Niall Murphy
6K followers
YellowDog.ai just set a 10x benchmark uplift in scale computing, delivering 40,000 tasks per second (TPS) and managing 100,000 compute nodes in the cloud. What's even more interesting, that's 2x IBM Symphony and opens an intriguing pathway for these until-now closed/captive systems.
11
-
Antonio Gulli
Google • 75K followers
This is super important because TPUs definitely have cost and performance advantages over other accelerators but were much more difficult to use. No longer the case. "vLLM TPU is now powered by tpu-inference, an expressive and powerful new hardware plugin unifying JAX and PyTorch under a single lowering path. It is not only faster than the previous generation of vLLM TPU, but also offers broader model coverage and feature support." https://lnkd.in/diDKzYbS
35
1 Comment -
Andrew Ng
DeepLearning.AI • 3M followers
Introducing "Building with Llama 4." This short course is created with Meta, and taught by Amit Sangani, Director of Partner Engineering for Meta’s AI team. Meta’s new Llama 4 has added three new models and introduced a Mixture-of-Experts (MoE) architecture to its family of open-weight models, making them more efficient to serve. In this course, you’ll work with two of the three new models introduced in Llama 4. First is Maverick, a 400B parameter model, with 128 experts and 17B active parameters. Second is Scout, a 109B parameter model with 16 experts and 17B active parameters. Maverick and Scout support long context windows of up to a million tokens and 10M tokens, respectively. The latter is enough to support directly inputting even fairly large GitHub repos for analysis! In hands-on lessons, you’ll build apps using Llama 4’s new multimodal capabilities including reasoning across multiple images and image grounding, in which you can identify elements in images. You’ll also use the official Llama API, work with Llama 4’s long-context abilities, and learn about Llama’s newest open-source tools: its prompt optimization tool that automatically improves system prompts and synthetic data kit that generates high-quality datasets for fine-tuning. If you need an open model, Llama is a great option, and the Llama 4 family is an important part of any GenAI developer's toolkit. Through this course, you’ll learn to call Llama 4 via API, use its optimization tools, and build features that span text, images, and large context. Please sign up here: https://lnkd.in/gXKeipht
2,830
111 Comments