Cambridge, England, United Kingdom
2K followers
500+ connections
About
Services
Activity
-
Eyal Lantzman reposted this: Become a part of our product team at Nscale 🚀 I'm looking for a product manager with K8s experience based in Europe/UK 🇪🇺 🇬🇧 Feel free to reach out if you have questions ☺️
-
Eyal Lantzman shared this: Come and join our team to build Nscale cloud AI services in Singapore!
Shared post: 🚀 We’re Hiring: AI Product Engineer (Singapore) Exciting times at Nscale: we’re on the lookout for a talented AI Product Engineer to join our team and help shape the future of AI-powered products. If you’re passionate about: ✨ Building real-world AI applications ✨ Turning cutting-edge models into impactful products ✨ Moving fast in a high-growth, high-energy environment …we want to hear from you. At Nscale, you’ll work at the intersection of AI, product, and engineering, bringing ideas to life and delivering meaningful innovation at scale. 📍 Location: Singapore 👉 Apply here: https://lnkd.in/gHXcbeea Know someone who’d be a great fit? Tag them below 👇 or share this post! #Hiring #AIJobs #ProductEngineering #SingaporeTech #ArtificialIntelligence #StartupLife #Nscale Tom Matthews Eyal Lantzman Daniel Bathurst Juliana Tan Richard Hindle
-
Eyal Lantzman reposted this: Inference is on track to account for more than half of all AI compute by 2030. The infrastructure supporting it hasn't kept pace. Most AI services platforms give engineering teams the same choice: move fast and lose control, or keep control and lose time. Both paths slow teams down. And as inference volumes scale, the cost of that trade-off compounds. The answer isn't a better version of the same choice. It's a different design philosophy entirely. Nscale's approach: modular, sovereignty-aware building blocks that let teams move quickly without locking into predefined workflows. Inference, fine-tuning, and structured evaluation, built on owned infrastructure and optimized at the unit economics level. Governance built in. Performance and cost that compound together. Our team of experts Eyal Lantzman, Abigail MacKenzie-Armes, and Oscar Savolainen, PhD have written up the thinking behind it. Read the full article: https://lnkd.in/eAJAbAE7
-
Eyal Lantzman shared this: Amazing team, grand vision, massive challenge. Join Matt!
Shared post: Millions of metrics. Massive GPU clusters. Zero blind spots. When your infrastructure powers the global AI economy, you can't afford to guess. Nscale is growing rapidly, and I’m hiring a Principal Observability Platform Engineer to lead how we monitor, trace, and optimize at an unprecedented scale. Tired of small-scale telemetry? Come build the systems that watch the supercomputers. #Nscale #Observability #PlatformEngineering #TechJobs #AI
-
Eyal Lantzman shared this: I'm happy to announce that Nscale now supports Kimi K2.5 as one of our flagship Inference models. Go ahead and give it a try! https://lnkd.in/eDtkQA4V
-
Eyal Lantzman shared this: We have a few more specialised AI engineer roles in Singapore. Come join us, we have an incredible roadmap to build for Nscale.
Shared post: 💥 We’re hiring a Senior AI Engineer at Nscale! 💥 If building the engine of superintelligence sounds like your kind of Wednesday… we should talk. We’re looking for someone who loves diving deep into AI systems, tweaking models until they behave (mostly 😅), and shaping the future of full‑stack AI infrastructure. You’ll get to work with an incredibly smart team, tackle problems that actually move the needle, and build AI capabilities at massive scale. If that sounds fun (because honestly, it is), come join us. 👉 Apply now or DM Eyal Lantzman if you want the inside scoop. https://lnkd.in/gkXyRCRK Let’s build the future of intelligence together. 🚀 Daniel Bathurst Juliana Tan Richard Hindle
-
Eyal Lantzman posted this: I'm excited to announce two new roles for Nscale specialist AI engineers. We're not looking for GenAI app builders; we're looking for those who build the services for the app builders! I'm not interested in recruiters or agencies. See link in comments.
-
Eyal Lantzman reposted this: I’m scaling the security team at #Nscale. This is a fast-moving, AI-native environment - lean teams with real ownership. We’re looking for people who prefer building over documenting, are comfortable moving quickly and want to shape security in modern AI infrastructure. Hiring across all domains and levels in Seattle, Bay Area, NYC, London or remote. Check out our postings.
Principal Engineer, Platform Security: https://lnkd.in/eRHcyT6d
Director, Enterprise Security & Identity: https://lnkd.in/eZz-K8T5
Director, Cyber Defense Engineering: https://lnkd.in/eMgYwKeh
Incident Response Lead: https://lnkd.in/eSYmh8DT
Staff Security Engineer (Detection Platform): https://lnkd.in/ecz79tm8
Staff Security Engineer (Red Team): https://lnkd.in/erZMA8gK
Staff Security Engineer (Threat Intel): https://lnkd.in/e3Mxqmcc
Staff Security Engineer (Endpoint): https://lnkd.in/exUnaB9m
Manager, Security Operations: https://lnkd.in/esaVmWTf
Senior Staff Security Engineer (Identity): https://lnkd.in/e8Qm7hAb
Senior Staff Security Engineer (Enterprise): https://lnkd.in/e-GQ3vSN
-
Eyal Lantzman shared this: See below, a unique opportunity to shape the product direction for AI tools and the AI-native platform at Nscale. If you have bold ideas, reach out to Hamish Jackson-Mee. Nscale is the place where your ideas can become reality at an extremely fast pace, so don’t let your ideas marinate or be wasted. Reach out and make a difference!
Shared post: Product people of LinkedIn, if you're interested in building AI tooling and the platform that supports them, please DM me. Super exciting space, great people, lots of opportunities for growth, etc etc!
-
Eyal Lantzman reacted on this: Word is Amazon “accidentally” spent $500,000,000 on Claude in just one month. It might be my new favourite case of poetic justice.
-
Eyal Lantzman liked this: We're working with NVIDIA, VAST Data, and Nokia to explore what it actually takes to build AI-ready network infrastructure. The AI Grid is our report focusing on how telcos can lead the next generation of artificial intelligence networks. AI is moving into production fast. It’s exposing a new set of infrastructure requirements: locality, control, performance, and resilience, and Telcos are uniquely positioned to build, monetize, and lead it. The opportunity now is to turn those assets into something more. The AI Grid sets out how that shift is already happening, and what Telcos need to do to be part of it. Download the full report: https://lnkd.in/eYR3a4NB
-
Eyal Lantzman liked this: A Keycloak implementation of the Transaction Tokens Internet Draft is available for testing: https://lnkd.in/eh4xqj8m This draft introduces the concept of a Transaction Token (Txn-Token), which encapsulates the data related to the requesting user (human), requesting workload (non-human) and the transaction itself, and preserves it throughout the entire call chain. The Keycloak implementation features the standard Txn-Token flow, SPIFFE client authentication, and an experimental Google CEL Policy support (WIP). You can try it out now in local Kubernetes, using the demo environment provided. For those attending #OSW2026 in Leipzig - Pieter Kasselman and I will be presenting Keycloak Transaction Tokens today (Thursday, 29th of May) at 10:00am. Don't miss it!
Linked repository: GitHub - CarrettiPro/keycloak-tts: Keycloak Transaction Token Service
-
Eyal Lantzman liked this: The European Commission is preparing rules that would require sovereign cloud infrastructure for any EU government workload handling healthcare, finance or judicial data. The current presentation date is 3 June 2026, after several earlier delays. In practical terms, the rules would push AWS, Microsoft Azure and Google Cloud out of those workloads. The three of them currently hold about 70% of Europe's cloud market. The mechanism is the Cloud and AI Development Act (CADA), the headline bill in the Commission's Tech Sovereignty Package. CADA is expected to define what counts as a sovereign cloud and to set procurement standards that an American-incorporated provider cannot meet, regardless of which data centre your data lives in. The reason this matters is the CLOUD Act. Since 2018, US law enforcement can ask any American company to hand over data it controls, anywhere in the world. So a Frankfurt region run by a Seattle parent is, in the eyes of a US court, the same as a Virginia region. The address on the invoice doesn't change who the parent answers to. I work in colocation for Kolo, a 100% European-owned operator. We're not a cloud and that isn't what this post is about. What is worth noticing is that for the first time the EU is preparing to formally connect the legal entity behind your infrastructure to whether that infrastructure can be called sovereign at all. Private companies stay free to pick whoever they like. But government procurement rules tend to set the bar that the rest of the market gets compared to a few years later. If you buy or run infrastructure in Europe, the question this month stopped being "do we have an EU region?". It is "if a US warrant landed on our provider's parent on Monday, what would we have to hand over by Friday?". Brussels is about to answer that question for itself. Has the CLOUD Act, or the upcoming package, changed how anyone you know is picking providers?
-
Eyal Lantzman liked this: Today, our amazing People Enablement team organized a Women @ Nscale event, sponsored by our President of AI Nidhi Chappell, focused on shaping the future of women in technology 🚀 The session covered topics around Owning Your Voice in Tech, with insightful discussions around leadership, confidence, influence, and career growth in the industry. It was incredibly inspiring to hear from our C-level female leaders sharing their career journeys, how they scaled their careers over time, and the importance of building a strong and supportive community around ourselves and advocating for each other 💙 Thank you! Phoebee Gahan Lauren R Hurwitz Alice Takhtajan Nidhi Chappell And Gigi W. Amy Sheldrake for organizing and helping to build such an important community!
-
Eyal Lantzman liked this: Two years ago today, Nscale was founded, and we couldn't have picked a better week to bring our legal team together in person for an offsite. Today, for the first time, our entire global legal team was in the same room. People who have worked side by side across time zones for months, finally meeting face-to-face. It's incredible to reflect on the days when I was the only member of the legal function, and just how far we have come in such a short period of time. Two years in, Nscale has gone from a scrappy group of believers to something genuinely extraordinary, and the legal function has scaled every step of the way. We are not "back office" – we are the function that enables the business to scale at pace, manages risk, and helps protect Nscale's license to operate. To those who helped make today's offsite such a success, as well as everyone who travelled and showed up today: thank you! So proud of this team and this company.
-
Eyal Lantzman liked this: Two years ago today, we launched the Nscale platform. Since then: the largest Series B and Series C in European history. Data centers powering compute for our customers, and in the pipeline across the globe, from Norway to Portugal to the US and beyond. An AI campus in West Virginia, the first state-certified AI microgrid in the US with a power runway scalable to over 8GW. And partnerships with some of the most consequential companies in AI. This is the Fourth Industrial Revolution. And it demands infrastructure built from the ground up: energy, data centers, compute, and software, unified under one roof, delivered at massive scale. That's what Nscale does, and that's what makes us different. Big thanks to the Nscale team who have been part of this journey for the past two years (14 dog years). Year three starts now. LFG
-
Eyal Lantzman liked this: I'm thrilled to announce I've joined Nscale in an AI Infrastructure role, and stepping into the deep end of AI Data Centres has already been a humbling reminder of how much there is still to learn. Before I look ahead, I want to pause and recognise the people who shaped my journey that ultimately brought me to Nscale. Thank you to Manfred Felsberg, Mansour Karam, DJ Spry, Matt Free, David Watkins, Malek Akilie, Mohammed Mahmoud, Pavan Kurapati for your guidance, your trust, and your friendship. A special thanks to my former rockstar Juniper team Thorbjörn Zieger, Elisabeth Rodrigues, Valerio Martini, Mohamed Abouzeina and Cody Conklin. I wish you all the best my friends. No doubt we'll work together again soon. And to my new Nscale & NVIDIA colleagues, particularly David Power and Philip Hofstad: I couldn't be more energised about what we're building together. Let's get to work.
Experience & Education
-
Nscale
Publications
-
Neural Networks Using C# Succinctly
Syncfusion
Patents
-
Method and system for context-based access control of network resources
Issued US12495046B2
-
Method and system for seamlessly registering entitlements within different types of access management systems
US20250184332A1
-
System and method for tracking and managing state of provisioned workspaces and workloads
US20250036479A1
-
Systems and methods for declarative composition of infrastructure components of a software product definition for automated end to end deployment
US20240256247A1
-
Systems and methods for rule-based machine learning model promotion
US20230281482A1
-
Systems and methods for runtime input and output content moderation for large language models
US20250005060A1
Languages
-
English
Full professional proficiency
-
Hebrew
Native or bilingual proficiency
-
Russian
Native or bilingual proficiency
Recommendations received
2 people have recommended Eyal
Other similar profiles
-
Fitz 🦀
I'm Fitz, and I am passionate about how we can lift the human condition by improving how we build technology. Co-authored the AWS Well-Architected Framework, and led the global AWS Well-Architected team. I've patented methods for reviewing architectures, and created an AWS Service, from inception to GA, leading to improvements in customers' architectures at scale.
I’ve worked directly with thousands of businesses to help them adopt and improve their architectures, and transform their businesses. A technologist, with first-hand experience of seeing how technologies can help businesses and industries innovate. Experienced in building information rich environments, data driven solutions, informatics, reliability and performance. Ability to mentor others and improve organizational structures, processes and standards. With the business acumen to create complete solutions, and communicate clearly and effectively to all stakeholders.
9K followers • London
Explore more posts
-
MLOps community
46K followers
A lot of agentic pipelines are using frontier LLMs for tasks that don't need them. Entity recognition, intent classification, policy checking - these are extraction tasks, not reasoning tasks, and the cost and latency show it. Ash Lewis, Co-founder and CEO of Fastino Labs, is presenting at Coding Agents on March 3rd on how GLiNER2 - a 205M parameter encoder model - handles four core extraction tasks in a single forward pass, with deterministic outputs and no prompt engineering. The case he's making is straightforward: reserve the LLM for reasoning and generation, and stop paying frontier prices for structured extraction. Computer History Museum, Mountain View. March 3rd. Tickets are nearly gone - grab yours before it closes: https://lnkd.in/giDNa7fG
26
-
LlamaIndex
285K followers
Let's talk parsing tables. Two days ago we launched ParseBench, the first benchmark for measuring document OCR accuracy for AI agents, with 5 new metrics. In this deep dive, we break down TableRecordMatch (GTRM), our metric for evaluating complex tables with merged cells, hierarchical headers, and multi-page spans. Instead of comparing tables cell by cell, TableRecordMatch evaluates them the way downstream systems actually consume the data, as records keyed by column headers. Column reordering doesn't tank your score. Transposed headers and dropped columns do. Read our full blog on ParseBench, access our source code, and get access to the benchmark dataset on Hugging Face all here: https://lnkd.in/epksZ3Nk
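The idea of scoring tables as records keyed by column headers, rather than cell by cell, can be illustrated with a minimal Python sketch. This is not the ParseBench implementation of TableRecordMatch: the function names `to_records` and `record_match_score` and the exact-match scoring rule are assumptions made purely for illustration.

```python
from collections import Counter

def to_records(header, rows):
    """Turn a table into records: each row becomes a set of (column header, value) pairs."""
    return [frozenset(zip(header, row)) for row in rows]

def record_match_score(pred_header, pred_rows, gold_header, gold_rows):
    """Fraction of gold records reproduced exactly (hypothetical scoring rule)."""
    gold = Counter(to_records(gold_header, gold_rows))
    pred = Counter(to_records(pred_header, pred_rows))
    if not gold:
        return 1.0 if not pred else 0.0
    matched = sum((gold & pred).values())  # multiset intersection
    return matched / sum(gold.values())

gold_header = ["name", "qty"]
gold_rows = [["apple", "3"], ["pear", "5"]]

# Same data with the columns reordered: the records are identical.
reordered = record_match_score(
    ["qty", "name"], [["3", "apple"], ["5", "pear"]], gold_header, gold_rows
)

# A dropped column changes every record, so nothing matches.
dropped = record_match_score(["name"], [["apple"], ["pear"]], gold_header, gold_rows)

print(reordered, dropped)
```

Because each record is an unordered set of header/value pairs, column order has no effect on the score, while a missing or mislabeled column alters every record it touches.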
56
3 Comments
-
Redpanda Data
26K followers
🔥 ENGINEERING UPDATE: Redpanda AI SDK for Go is now #opensource As part of our #AgenticDataPlane (ADP), we're building managed and self-hosted AI agents. These agents need to work in production, not just in demos. They also need to be observable, resilient, flexible across providers, and easy to test without crossing your fingers before every deploy. 🤞 We checked existing #Go options and found useful pieces, but not the full package. So we built one. Today, we’re open-sourcing it. Read all about it and how to get started👇 https://lnkd.in/dDv8aXbT
30
-
Socialdevflow
35 followers
POML: Prompt Orchestration Markup Language

Introduction
As Large Language Models (LLMs) become increasingly central to modern applications, developers face growing challenges in prompt engineering. While LLMs can produce powerful results, the lack of structure and consistency in prompt design often leads to fragile, hard-to-maintain workflows. POML (Prompt Orchestration Markup Language) was created to solve these issues. It introduces a structured, modular, and versatile approach to building prompts, empowering developers to design reliable, scalable, and sophisticated LLM applications.

Key Features of POML
1. Structured Prompt Design: Instead of writing long, unorganized text prompts, POML allows developers to break down prompts into modular components. This ensures clarity, reusability, and easier debugging.
2. Data Integration Made Easy: POML supports seamless integration of dynamic data sources, from JSON and databases to external APIs. This eliminates the need for messy string concatenation and improves maintainability.
3. Presentation Flexibility: Different tasks or users may require different prompt formats. POML provides a way to switch presentation variations without rewriting the entire prompt logic.
4. Tooling and Ecosystem Support: With extensions available for VSCode, PyPI, and npm, POML integrates smoothly into existing developer workflows. It also offers testing and validation tools to ensure prompt quality.

Why POML Matters
• Consistency – Reduce unpredictable LLM behavior by enforcing structured prompt patterns.
• Collaboration – Teams can work on modular prompt files with version control, making changes safer and more transparent.
• Scalability – Ideal for enterprise-level applications where prompts must be reused, monitored, and updated frequently.
• Experimentation – Quickly test variations of the same prompt structure without breaking the entire pipeline.

Use Cases
1. Customer Support: Standardize multi-turn conversations with structured templates.
2. Content Generation: Create flexible article outlines, summaries, or scripts that adapt to dynamic input data.
3. Enterprise Applications: Integrate with existing data pipelines (CRM, analytics, finance) while keeping prompts consistent.
4. Research & Experimentation: Quickly test how variations in constraints, roles, or examples affect model output.

Conclusion
POML represents a step forward in prompt engineering, offering the structure and flexibility developers need to scale LLM-powered applications. By combining modular design, data integration, and ecosystem support, POML makes it easier to build prompts that are reliable, reusable, and production-ready.
1
-
National Bureau of Economic Research
82K followers
Examining how managers should deploy AI in sequential team workflows finds that optimal AI deployment is stochastic, replaces early and late positions in the workflow, and results in lower wage inequality, from Xienan Cheng, Mustafa Dogan, and Pinar Yildirim https://lnkd.in/ewSrbPmH
53
-
Anyscale
60K followers
vLLM has reached 66k+ GitHub stars and millions of downloads in just over two years, and is widely adopted as an open source inference engine. 🎥 At Ray Summit, we sat down with Simon Mo, co-lead of the vLLM project, to discuss how vLLM is built, architectural choices, how Ray and vLLM work together, what’s next for the project and much more!
⏱️ Chapters & Timestamps
0:00 Overview of vLLM
01:01 PagedAttention & Early Architectural Decisions
02:11 Why vLLM Adoption Is Accelerating
04:28 How vLLM and Ray Work Together
07:00 The State of vLLM Today
10:09 Simon Mo’s Open Source Journey
12:28 Advice for AI Builders & Contributors
132
1 Comment
-
Dataoorts | GPU Cloud
1K followers
Common GPU Bottlenecks During Fine-Tuning and How to Avoid Them in 2025

Fine-tuning LLMs and diffusion models is expanding fast, but many teams still lose 30-50% of training time because of avoidable GPU bottlenecks. These slowdowns increase cloud spend, delay iterations and reduce model throughput. Knowing these bottlenecks early helps teams train faster and scale more efficiently.

1. Underutilized GPUs
GPUs often operate at only 40-60% utilization because the data pipeline cannot keep up. This forces GPUs to wait between batches, creating wasted compute hours. Many teams misdiagnose this as poor GPU performance when the real issue is upstream in data loading.
Fix: Use parallel data loaders, prefetching, NVMe storage and larger batch sizes to keep the GPU saturated.

2. VRAM Overflow
Large models like LLaMA 3, Qwen and Mistral often exceed VRAM limits, which pushes computation onto the CPU. If this happens, training can slow down by as much as 40× because CPUs cannot handle GPU-level tensor operations. This not only hurts performance but also increases training instability.
Fix: Use gradient checkpointing, QLoRA (4-bit or 8-bit) and higher-VRAM GPUs like A100 80GB, H100 or L40S.

3. CPU Bottlenecks
Even the fastest GPUs perform poorly when paired with underpowered CPUs that cannot keep up with preprocessing, tokenization or batch preparation. These CPU delays create idle time, making GPU performance appear inconsistent. As models grow, poor CPU throughput becomes an even bigger constraint.
Fix: Choose GPU servers with balanced CPU ratios, enable multi-threading and move heavy preprocessing to async workers.

4. Disk I/O Limitations
When datasets sit on slow disks, the GPU repeatedly pauses to wait for data reads. This issue becomes even more noticeable with large datasets and high-resolution image pipelines, where read operations dominate. The result is lower GPU utilisation and unnecessary delays.
Fix: Use NVMe SSDs, keep datasets local and avoid slow network-attached storage during training.

5. Wrong Precision Modes
Many teams train in FP32 even though it offers no extra accuracy for most fine-tuning tasks. This wastes compute, restricts batch sizes and increases VRAM pressure, slowing down training overall. Lower precision modes deliver equal performance with dramatically faster throughput.
Fix: Use BF16/FP16 for most workloads and apply INT8/4-bit quantized fine-tuning when memory is tight.

Want Faster Cost-Efficient Fine-Tuning?😄
If you want high-performance GPUs with predictable pricing, reservation options and architecture designed to remove these bottlenecks, Dataoorts is optimised exactly for fine-tuning. You get stable A100, H100 and L40S compute, fast provisioning and cost-saving reservation plans built for long training runs. Scale your fine-tuning with faster infrastructure, transparent pricing and smarter GPU planning.
#AI #MachineLearning #LLM #GPUCloud #DeepLearning #FineTuning #A100 #H100 #Dataoorts #NVIDIA #GPURental #on-demand_cloud
4
-
MLflow
78K followers
ICYMI 🚀 New LLM-as-judges Integration: TruLens Trace Evaluation in MLflow We are excited to announce the TruLens integration for MLflow! This expands our third-party scorer framework, which already supports DeepEval, RAGAS, and Phoenix, an ecosystem with 32M+ monthly PyPI downloads. In agentic workflows, a correct final answer can mask a flawed plan, redundant tool calls, or broken reasoning. To build reliable and trustworthy agentic systems, you must evaluate the execution traces. Developed by the TruLens team at Snowflake, this integration adds 10 scorers to MLflow that analyze the full span tree:
🔹 Goal-Plan Alignment: Evaluates strategy and tool selection.
🔹 Plan-Action Alignment: Checks for plan adherence and valid tool calling.
🔹 Holistic Alignment: Grades logical consistency and execution efficiency.
You can now mix agent trace scorers, RAG scorers, and content quality scorers in a single mlflow.genai.evaluate() call. 🔗 Full technical details: https://lnkd.in/eACHyxaK #MLflow #TruLens #LLM #GenAI #LLMOps #AIObserveability
65
-
Paul Negedu
Quizlink • 2K followers
Just open-sourced LangGraphGo 🦜🔗 The AI infrastructure powering Quizlink is now available on GitHub. LangGraphGo lets you build resilient AI agents as graphs with: → Real-time streaming → Graph visualization (Mermaid, DOT, ASCII) → State checkpointing → Langfuse observability → Works with OpenAI, Anthropic, Google AI If you're building AI in Golang, hope this helps. 🔗 https://lnkd.in/gkJbhUMN #opensource #golang #AI #langgraph #langgraph
15
-
Business Analytics Institute
1K followers
By caching past keys and values, inference shifts from quadratic to linear complexity, cutting latency by ~70% and delivering 3–4× faster responses for long contexts. But speed comes at a price: massive VRAM usage, making memory the real bottleneck behind context window pricing. The takeaway? LLM inference today is a memory vs cost trade-off, and KV cache design is shaping everything from API economics to next-gen attention architectures. If you’re building or deploying LLMs at scale, this is one diagram worth internalizing. #LLM #GenerativeAI #AIInfrastructure #KVCache #MachineLearning #Inference #AIEngineering
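The speed-versus-memory trade-off described above can be made concrete with a small back-of-the-envelope sketch in Python. It only counts key/value projection work and cache size; it does not reproduce the latency figures quoted in the post. The model dimensions used at the end (32 layers, 8 KV heads, head dimension 128, 2-byte values) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def kv_projections_without_cache(num_tokens):
    """Without a cache, each decoding step recomputes K/V for every token seen so far."""
    total = 0
    for step in range(1, num_tokens + 1):
        total += step  # K/V recomputed for all `step` tokens
    return total  # grows quadratically: n * (n + 1) / 2

def kv_projections_with_cache(num_tokens):
    """With a cache, each step computes K/V once for the new token only."""
    return num_tokens  # grows linearly

def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Memory held by the cache: one key and one value vector per token, layer and KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens

n = 1000
print(kv_projections_without_cache(n), kv_projections_with_cache(n))

# Hypothetical model shape, FP16 values, 8192-token context.
cache_bytes = kv_cache_bytes(8192, num_layers=32, num_kv_heads=8, head_dim=128)
print(cache_bytes / 2**30, "GiB per sequence")
```

The compute saved by caching is paid for in memory that scales with context length and with the number of concurrent sequences, which is why the cache size, rather than raw compute, tends to set the limit on long contexts.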
2
-
Marcin Gierlak
Tech leader with a builder’s… • 4K followers
Alibaba has introduced AgentScope, an open-source Python framework for building multi-agent applications. The framework provides visual tools for creating agents with support for MCP integration, memory, retrieval-augmented generation, and reasoning. It works with any LLM and includes real-time steering capabilities.
2
1 Comment
-
Hayride
27 followers
We just published a post walking through how to build a lightweight CLI AI agent in Go that runs on Hayride — an open-source secure AI runtime for LLMs, sandboxed code execution, and orchestrating agentic workflows. Our example uses: ⚙️ Go + TinyGo to compile the CLI to WebAssembly 🏗️ Hayride’s AI morphs (prebuilt Wasm components for agents, runners, tools, and models) 🏖️ WebAssembly Interface Types (WIT) and the Component Model for strict sandboxing and composability ⚒️ WAC for composing Wasm components into a deployable agent The post covers: ⚙️ Setting up the project and WIT dependencies ⌨️ Implementing a CLI in Go that streams prompts/responses to an LLM 🧩 Composition with Hayride’s prebuilt morphs 🚀 Deploying the resulting morph to Hayride If you’re curious about mixing Go, LLMs, and WebAssembly in a secure, composable AI environment — or just want to see an end-to-end example of building a CLI AI Agent — check it out: 📖 Read the post: https://lnkd.in/g-Wbs2Cb We would love feedback from other developers experimenting with WebAssembly + AI.
2
-
Azure Cosmos DB
10K followers
Running LLMs locally opens the door to new dev workflows — private, secure, and offline-capable. This video shows how to combine Ollama, LangChain, and the Azure Cosmos DB emulator to build a Retrieval-Augmented Generation app on your own machine. 📺 Full walkthrough: https://msft.it/6044s5D46 #AzureCosmosDB #Azure #AI
-
Corteza Project
1K followers
Discover a thoughtful perspective on the Apache 2.0 license and its implications for open-source software at Planet Crust. This nuanced exploration sheds light on the intricacies of open-source licensing, offering valuable insights for developers and organizations alike. Learn more about how these licenses can impact your projects and drive innovation. #OpenSource #SoftwareLicensing
-
WebCraftingCode
151 followers
In the latest episode of the AXRP podcast, Lee Sharkey introduces Attribution-based Parameter Decomposition (APD), a groundbreaking approach to mechanistic interpretability in deep learning. This innovative framework focuses on decomposing model weight parameters rather than activations, offering a new lens through which we can understand AI systems. APD operates on a triad of principles: faithfulness, minimality, and simplicity. By ensuring that the decomposed components accurately reflect the original model weights, only the most influential parameters are retained, and the complexity of these components is minimized. This approach not only enhances interpretability but also maintains high fidelity in performance. The implications of APD are significant. It moves beyond traditional activation-based methods, allowing for architecture-agnostic applications across various models, including transformers and CNNs. As AI continues to evolve, understanding the internal workings of these systems becomes crucial for transparency and trust. As the research progresses, APD holds the potential to transform how we perceive and interact with AI, paving the way for more robust and interpretable models. Explore more about APD and its implications in the full article here: https://lnkd.in/eQKcRJxn
-
Biocollaborator
102 followers
🚀 948x Faster? Google Just Solved the Biggest Headache in Generative Retrieval.
If you’ve ever worked with LLMs for recommendation systems, you know the struggle: Generative Retrieval (GR) is brilliant for accuracy, but it's traditionally a total snail on hardware.
The bottleneck? Tries (Prefix Trees). While we’ve used them for years to mask invalid tokens, they are a nightmare for GPUs and TPUs. They force irregular memory access and mess with static computation graphs (XLA). Basically, you have a Ferrari engine (the LLM) being fueled by a pipette.
Enter STATIC. Google AI just dropped a new framework called Sparse MaTrix Framework for In-accelerator Constrained Decoding. Instead of clunky trees, it uses sparse matrix operations to handle constraints.
The TL;DR on why this matters:
◾ Insane Speed: We’re talking up to 948x faster decoding. 🏎️
◾ Hardware Native: It keeps the entire process on the accelerator (GPU/TPU), cutting out the CPU-to-Accelerator "lag."
◾ Scale: You can now run Generative Retrieval across millions of items without your latency budget exploding.
◾ The Bottom Line: This moves LLM-based search from a "cool research project" to a "high-performance reality" for massive industrial recommendation engines.
Check out the full breakdown here: https://lnkd.in/eN3N_YrU
#GoogleAI #MachineLearning #GenerativeAI #LLM #SearchTech #Innovation
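To make the trie-versus-matrix idea concrete, here is a minimal sketch, not the STATIC implementation. It assumes a tiny vocabulary and a hypothetical set of valid item IDs, and it uses a dense table in plain Python where a real system would use sparse matrices on the accelerator. The function names (`build_transition_table`, `mask_logits`, `constrained_greedy_decode`) are illustrative only. The point is that the prefix tree can be flattened into a state-by-token table, so masking invalid tokens becomes a row lookup instead of pointer chasing.

```python
NEG_INF = float("-inf")


def build_transition_table(valid_sequences, vocab_size):
    """Flatten a trie of valid token-id sequences into a state x vocab table.

    table[state][token] holds the next state, or -1 if the token is invalid
    in that state. State 0 is the root (empty prefix).
    """
    table = [[-1] * vocab_size]
    for seq in valid_sequences:
        state = 0
        for token in seq:
            if table[state][token] == -1:
                table.append([-1] * vocab_size)  # allocate a new state
                table[state][token] = len(table) - 1
            state = table[state][token]
    return table


def mask_logits(logits, table, state):
    """Set the logit of every token that is invalid in `state` to -inf."""
    return [
        logit if nxt != -1 else NEG_INF
        for logit, nxt in zip(logits, table[state])
    ]


def constrained_greedy_decode(step_logits, table):
    """Greedy decoding that can only emit prefixes of the valid sequences."""
    state, output = 0, []
    for logits in step_logits:
        masked = mask_logits(logits, table, state)
        token = max(range(len(masked)), key=masked.__getitem__)
        if masked[token] == NEG_INF:  # no valid continuation from this state
            break
        output.append(token)
        state = table[state][token]
    return output


# Hypothetical catalogue: three items, each identified by two tokens.
table = build_transition_table([[0, 1], [0, 2], [3, 1]], vocab_size=4)
# Made-up model scores for two decoding steps.
decoded = constrained_greedy_decode(
    [[0.1, 0.9, 0.2, 0.3], [0.5, 0.1, 0.4, 0.0]], table
)
print(decoded)  # [3, 1]
```

Without the mask, greedy decoding would pick token 1 at the first step, which is not the start of any valid item. Because every row of the table has the same fixed shape, the lookup can run inside a static computation graph, which is the property the post attributes to the matrix-based approach.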
-
Valiqor
3 followers
As LLMs and agents move into production, a new class of failures is showing up: Not hallucinations - execution drift, unsafe tool calls, and silent workflow breaks. Most teams still debug these with logs and prompt tweaks. We’re experimenting with a different approach: Treat AI systems like distributed systems - trace decisions, classify failure modes, and firewall risky actions at runtime. We are curious: Where do your agents fail most often - tools, data, reasoning, or orchestration? #LLMOps #AIAgents
-
Devout Growth
4K followers
Is TOON Going to Be the Next Feed Format for LLMs, or Even Replace JSON? (It’s for flat JSON data, not nested ones.)
As LLMs evolve, efficiency in data representation becomes crucial. That’s where TOON (Token-Oriented Object Notation) steps in, a compact and LLM-friendly format designed to make data handling faster and cleaner.
Why TOON stands out:
- Uses less data than formatted JSON (especially in large arrays).
- Built-in guardrails with explicit lengths and fields.
- Cleaner syntax, no extra braces or quotes.
- Indentation-based structure like YAML.
- Handles tabular arrays efficiently: keys declared once, data flows as rows.
Note: TOON works best with flat JSON data. If your data is nested, converting it directly into TOON can make it heavier and less efficient. Always flatten your JSON before using TOON.
Could TOON become the next data format standard for AI models?
#TOON #TOONFormat #TechInnovation #JSONAlternatives #LLMOptimization #CodingStandards #MinimalSyntax #AI #DevoutGrowth
-
NeuBird AI
19K followers
The hardest part of building enterprise AI agents isn’t chaining LLMs together. It’s teaching them how to make smart tradeoffs under real-world constraints. In his recent Techstrong Group.ai piece, our co-founder, Vinod Jayaraman, explains why the best agents won’t be judged by how much output they generate, but by how well they balance speed, quality, and cost. Effective agents adapt their reasoning to the urgency of the problem and ground decisions in real-world systems and rules. Vinod also introduces the idea of domain-specific chain-of-thought as the new runbook for enterprise AI, helping agents reason more like seasoned engineers and less like abstract models. Link in comments! #EnterpriseAI #AgenticAI #AIOps