Sign in to view Wei’s full profile
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
Sign in to view Wei’s full profile
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
Kenmore, Washington, United States
Sign in to view Wei’s full profile
Wei can introduce you to 10+ people at Microsoft
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
10K followers
500+ connections
Sign in to view Wei’s full profile
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
View mutual connections with Wei
Wei can introduce you to 10+ people at Microsoft
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
View mutual connections with Wei
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
Sign in to view Wei’s full profile
or
New to LinkedIn? Join now
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
Activity
10K followers
-
Wei Wang reposted thisExcited on WWDC 2024? Get Siri + AI today at getshortcuts.ai! 🚀 I'm super excited to launch beta for my side project, which turn your Siri into an LLM powered voice assistant. ✨ Features: ⌚️ Works on Apple Watch, iPhone, iPad, Mac (even on lock screen) 🤯 Chat (talk/type) with GPT-4o or other models and platforms 📱 Complete task on device e.g. take a note, make a call, ... 🙅🏻♂️ No new device to buy / No new App to install / Nothing new to learn ⚡️ Only takes 1 minute to setup 🗣️ Can speak in your language such as Chinese 🤖 Customize name, greeting, system prompt to fit your style and flow 🔎 Support search the internet to get information fast 📄 Summarize web page and ask anything right in your iPhone 👀 Support limited vision to understand image or photo, etc Take AI shortcut from question to answer, from problem to result, from start to success! You can get start for free today at https://getshortcuts.ai. If you found this interesting, feel free to like 👍, comment 💬 or repost ♻️. Follow me on Twiiter/X: https://x.com/DavidWShadow for the latest update. #ai #wwdc #apple #siri #ios18 #openai
-
Wei Wang shared thisExcited on WWDC 2024? Get Siri + AI today at getshortcuts.ai! 🚀 I'm super excited to launch beta for my side project, which turn your Siri into an LLM powered voice assistant. ✨ Features: ⌚️ Works on Apple Watch, iPhone, iPad, Mac (even on lock screen) 🤯 Chat (talk/type) with GPT-4o or other models and platforms 📱 Complete task on device e.g. take a note, make a call, ... 🙅🏻♂️ No new device to buy / No new App to install / Nothing new to learn ⚡️ Only takes 1 minute to setup 🗣️ Can speak in your language such as Chinese 🤖 Customize name, greeting, system prompt to fit your style and flow 🔎 Support search the internet to get information fast 📄 Summarize web page and ask anything right in your iPhone 👀 Support limited vision to understand image or photo, etc Take AI shortcut from question to answer, from problem to result, from start to success! You can get start for free today at https://getshortcuts.ai. If you found this interesting, feel free to like 👍, comment 💬 or repost ♻️. Follow me on Twiiter/X: https://x.com/DavidWShadow for the latest update. #ai #wwdc #apple #siri #ios18 #openai
-
Wei Wang shared thisOur team is looking for Principal Software Engineer - Productivity Tools. Come join me at Roblox! We're in explosive growth mode and looking for top-notch talent. Please feel free to reach out: https://lnkd.in/gqtfVS3
-
Wei Wang shared thisThe robot looks kind of cute, running with Asus depth camera, Intel Baytrail platform, Ubuntu Operating System and open source ROS.
-
Wei Wang shared thisJust found an old lab testing video of my self navigation and self balance robot delivery system prototype project back in 2014. This is a project for attending Intel Cup Embedded System Design Contest 2014 during my undergrad and won a second prize. This was achieved with my other two awesome teammates in 3 month. However I wasn't think big enough to transform it into a potential million dollar business. In the past couple years, delivery robot systems have being well commercialized by Alibaba and Meituan in China and Postmates in Europe. They are running everywhere in Campus, Hotel, Airport and even on open streets in some cities to help delivery things without human contact, which is important during the COVID-19. The experience is precise to me and would keep remind myself to think big and keep innovating.
-
Wei Wang posted this我已加入 #职场人时间捐赠计划 ,为大学生提供职业指导。一次善举,也���就能让年轻人少走弯路。点击链接,和我一起成为领英职场导师,也欢迎需要帮助的同学们来和我聊聊: https://lnkd.in/gxtqfg8
-
Wei Wang shared thisWei Wang shared thisWill tech make difference? Yes! Is tech affordable? Yes! Can we make it happening together? Yes. 28k users backed Wyze Lidar robot vacuum. 104k users backed Wyze Watch. Wyze for 2021? Yes! https://lnkd.in/ghH4UEX #iot #smartliving #smarthome #alexa #wyze #wyzecam #internetofthings #tech #innovation
-
Wei Wang shared thisWei Wang shared thisWhat’s the piece of clothes you will choose for winter? #Wool must have its name! As a fabric that accompanies humankind from the Stone Age, wool is loved by humans for its warm and durable texture. Recently, the wool industry has been pushed to the forefront of public opinion: not because of the product, but for some cruel facts behind it. #fashion #ethical #cruelty #crueltyfree #animalrights https://lnkd.in/gdSpPdd
-
Wei Wang shared thisNext.js is the best framework for web apps I have used. Looking forward to see and learn great ideas from others. #javascript #webdevelopment #developers
-
Wei Wang reacted on thisWei Wang reacted on thisI was thinking Claude Code was only for engineers, since it has "code" in the name. After using it for a while, I think it's actually one of the most useful tools for non-engineers right now. I switched from Claude chat to Claude Code because I was building my own agent, and going back and forth between chat windows was getting old. Claude Code works directly on my computer. It opens files, runs commands, makes changes. Setup was easier than I expected. You can ask Claude chat to walk you through it, takes about 10 minutes. The first time I used Claude Code to build an agent, I started by telling Claude who I am. I pasted this into the project's CLAUDE.md: "I'm a marketer with no engineering background. When making changes, explain the why, not just the what. When there are real tradeoffs, ask me to decide instead of choosing silently. Use proper engineering terminology, but briefly define terms the first time they come up." After that, Claude stopped quietly making decisions for me and started pausing to ask and explain. Less like outsourcing, more like working alongside someone who's also teaching me along the way. If you're using Claude Code too, what's a prompt you keep coming back to? #AI #BuildingInPublic #ClaudeCode
-
Wei Wang reacted on thisWei Wang reacted on thisIf anyone could make AI work seamlessly on your iPhone, it'd be Apple, right? Last year, Apple introduced Apple Intelligence into the Shortcuts app. That got my attention. A friend of mine is a heavy Shortcuts user. She's built automations for almost everything in her daily routine. It genuinely saves her time every day. But every single one? She built it herself. Step by step. So I asked her how Apple's AI integration has been. She knew about it, she'd tried it. Her response? "Not very usable." Still building everything by hand. I tried it too. Found the feature, opened it… and realized that even when you know it exists, getting from "this is interesting" to "this is part of my daily routine" still feels like a stretch. So Apple built it. A power user tried it and went back to doing things manually. And someone actively looking for it still couldn't get it to feel natural. The capability is there. But the gap between "available" and "actually useful in my life" is still very real. I keep coming back to this idea from my last post. AI-first isn't about capability. It's about whether AI shows up at the right moment, without you having to think about it. Has anyone here actually tried using Apple Intelligence inside Shortcuts? How did it go? #AI #BuildingInPublic #AppleShortcuts #AppleIntelligence #ThinkingInPublic
-
Wei Wang liked thisWei Wang liked thisLately I've been in a bit of a learning rabbit hole. Going deeper into AI, experimenting, and trying to understand how people actually interact with it. I've also been reaching out to people across very different worlds (nonprofit, tech, education, etc.) and roles (marketing, sales, engineering...), just having conversations about how they're thinking about AI. Those conversations have honestly been more valuable than any course. At some point I realized I should stop keeping all of this in my head. So here's something I keep noticing: Most people I talk to in tech use AI every day at work. Some even build their own agents. But outside of work? Surprisingly quiet. Beyond opening a chat to ask a question, how many of us actually have AI built into our daily routines? Not as something we go to, but something that just works in the background of our lives? At work, AI plugs into existing workflows. There's structure, clear motivation, and a reason to use it. In life, those moments are messier. Small, scattered, personal. There's no obvious workflow to slot into. I'm starting to think "AI-first" isn't really about capability. It's about context. About whether AI shows up naturally in the moment, not just in the job. Curious: is AI actually part of your daily routine outside of work (beyond chatting, polishing emails, or generating fun images)? Or mostly just a work thing? #AI #BuildingInPublic #ThinkingInPublic
-
Wei Wang liked thisWei Wang liked thisThe biggest platform in enterprise IT is still being run by armies of contractors. That ends now. 85% of the Fortune 500 run on ServiceNow. Most still operate it the same way they did a decade ago. Manual configuration. Constant maintenance. Teams of consultants billing by the hour. Echelon is an AI workforce that operates your ServiceNow platform. Implementations. Migrations. Catalog builds. Flow automation. Integrations. Test coverage. Upgrade readiness. Daily platform operations. The full backlog, not just the easy tickets. Starting today, Echelon is available for any ServiceNow team to sign up. Over the past year, our agents have enabled an entirely new operating model for ServiceNow customers. - A Fortune 5 org compressed a 6-month migration into 10 weeks. - HubSpot unlocked 4,000+ hours of developer capacity. - Fox cut over $500K in annual MSP costs. - A major CPG brand modernized 200+ service catalogs in days. Every company we talk to has the same story - more licensed modules than they can implement, more backlog than they can staff for, more ambition than their operating model can support. Echelon closes that gap. Head to Echelon’s website and hit "Try Echelon." Connect your instance. Start executing in 10 minutes. Link in the comment below
-
Wei Wang liked thisWei Wang liked thisOne of my patents was granted recently, based on work I started in 2022 with the Windows research team. The patent plaque arrived recently, and it prompted some reflection on how technical patterns persist. In 2022, when monolithic structures and magic prompting were the norm, our thesis was simple: do not force a single model path to do everything. Decompose the system. Coordinate through a durable layer. In 2026, that thesis is now playing out clearly. A lot of the current conversation – LLM OS, agent runtimes, harness engineering – is circling the same underlying idea: model capability matters, but product capability is increasingly shaped by the system around it. That is where memory, tool use, planning, permissions, and execution get turned into something durable. What is especially interesting is how closely this mirrors the evolution of operating systems. Early OSs were relatively thin. Over time, capabilities that started outside the core, like scheduling, memory management, and security, proved essential enough to be pulled inward and become part of the runtime. We are seeing the same absorption pattern in AI. What begins as wrappers or orchestration layers does not stay there for long. The useful abstractions get absorbed. That is why the recent Claude Code discourse stood out. The takeaway was not “secret model magic.” It was how much of the capability lived in the system layer – a substantial orchestration layer coordinating tools, memory, permissions, API calls, and execution. The industry keeps swinging between extremes – the model is the product, or the harness is the product. The reality is in the middle. The model is the intelligent engine, but the system around it is what makes that intelligence durable, usable, and adaptable. The real opportunity is in how well the two evolve together. It is rewarding to see a patent based on work from 2022 feel even more relevant in 2026. We are not just building better models. We are building the operating layer that turns model capability into real systems.
-
Wei Wang liked thisWei Wang liked this当下的 OpenClaw🦞 使用体验,其实还挺像当年 ChatGPT 4.x 刚出来时的阶段。能用,也确实能解决不少问题,但总感觉离人类自己上手的效果还差那么一点点。ChatGPT 4.x 很多时候需要通过各种 Prompt 调优,极力去压榨模型的智力。 OpenClaw 的具体表现就是,对 token 的消耗特别大。原因也很简单,它需要在一个模糊且复杂的问题集上找到算法路径。整个过程是一种探索式计算,需要不断试探、回溯和修正,对计算量和上下文都会有很大的消耗。 在当下这个阶段,想提升 OpenClaw 的“智商”,比较有效的办法,就是让它学习人类已经 SOP 化的一些操作。把人类已经验证过的路径直接变成能力模块,让 Agent 少走弯路。 例如使用浏览器,可以用 agent-browser 这一类组件。它的原理是把浏览器协议能力暴露成可编程接口,让模型可以直接读取 DOM、操作页面元素、执行脚本,用结构化的方式去控制浏览器,从而绕开很多低效的探索。 再比如对操作系统的使用,可以用 Hammerspoon。它通过 Lua 脚本桥接 macOS 的系统 API,让自动化脚本可以直接控制窗口、快捷键、菜单栏和应用状态。很多原本需要视觉识别、反复尝试的动作,会变成一次确定性的系统调用。 对于不懂技术底层的人来说,安装 find-skills 会很大程度提升提升 OpenClaw🦞 的水平,因为它具备寻找人类 SOP 的技能。 那 OpenClaw 的下一个“ChatGPT 5.x 时刻”什么时候会到来?���的判断是不会太远。 当前大量的 OpenClaw 使用数据,在 computers/tools/browsers use 等场景里已经积累了非常多的数据集。大模型会根据真实用户的使用路径,加速自己的 RL 训练。 DeepSeek 已经证明了一件事情,推理能力是可以通过训练被内化到模型里的。接下来会发生的事情,是工具使用能力也会被逐渐内化。未来的模型会逐渐形成自己的工具世界模型,多轮工具调用、最佳调用路径、失败恢复策略等等,都会内化为模型能力。 到了那个阶段,OpenClaw 的体验很可能会出现一次明显跃迁。 今天很多人还在用 Claude Code 这样的工具,通过 Prompt、脚本和各种技巧去驱动 Agent 工作。整个过程有点像在 ChatGPT 4.x 阶段做工程,每一步都很依赖经验。 在当下阶段,我也更愿意采用这种务实的使用方式:Claude Code + 打造“最锋利的剑”。 所谓最锋利的剑,其实就是把工具使用的最佳实践不断聚合和沉淀下来。把浏览器操作、系统自动化、代码生成、文件处理这些能力逐渐模块化,变成稳定可复用的能力层,让 Agentic 工作真正跑起来。
-
Wei Wang liked thisWei Wang liked thisExcited to share that I joined Amazon AGI Labs a few months ago to work on RL for autonomous agents! Recently, I’ve been particularly interested in the computer use problem and how we can fundamentally simplify the way people interact with computers. On one of my family trips, I saw how much trouble my mom was having with seemingly simple phone and computer tasks. It became clear to me that as we advance these AI systems, we also have a responsibility in bringing along previous generations for the journey and I’m certain digital agents will play a crucial role. This space is still early but our lab strongly believes that large scale RL will be a key enabler in reliability and bridging the gap between current computer use models and human level performance. Last month, we released a significant upgrade to Nova Act by training a frontier level browser agent via RLVR across a large set of diverse environments and tasks (https://lnkd.in/gZ9sBaZa) and we’re aiming to dramatically improve and expand our model’s capabilities in the coming year. If pushing the frontier of autonomous agents in the digital and physical worlds sounds exciting to you, join us! We’re a relatively small lab and are actively hiring across the board: https://lnkd.in/gghNvMr6
-
Wei Wang liked thisWei Wang liked thisThese 13 Bay Area companies are actively hiring and sponsor visas for international talent: Sybill - AI-powered sales intelligence platform Hiring: Head of Sales, Staff AI Research Engineer Exa - AI search and retrieval platform Hiring: AI researchers, ML engineers, GTM and operations roles Mercor - AI talent marketplace Hiring: Engineering and operations roles Kikoff - Credit building made simple Hiring: Roles across all departments Mem0 - Memory layer for AI apps and agents Hiring: Technical Content Manager, product and engineering roles Parabola - No-code workflow automation Hiring: Enterprise Account Executive, senior software engineers Physical Intelligence - Foundation models for embodied AI Hiring: AI and engineering roles Sesame - AI-powered voice agents Hiring: Product, hardware, and software engineering roles Tavus - Teaching machines how to be human Hiring: Head of Marketing, engineering, product, and design roles San Francisco Compute Company - Infrastructure for AI compute Hiring: Various engineering roles RISA - AI OS for oncology workflows Hiring: Head of Sales, Head of Business Development Netlify - Web development platform Hiring: Staff GTM Engineer All of these companies have a track record of hiring international talent and sponsoring work visas - adding direct links to their career pages in the comments. If you’re a founder who’s open to hiring immigrants and sponsoring visas, drop a comment below. p.s. Alma is covering the H-1B lottery registration fee for 100 applicants this year. If you’re planning to apply, you can sign up at the link below and use the code H1BGIVEAWAY by January 31, 2026.
Experience & Education
-
Microsoft
****** ** ********
-
******
*** *** *** **************
-
******
****** ******** ********
-
*** ***** ******* **********
****** ** ******* ****** ******** ******* GPA 3.64/4.0
-
-
****
******** ** *********** ****** ************* *********** *** ********
-
View Wei’s full experience
See their title, tenure and more.
Welcome back
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
New to LinkedIn? Join now
or
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
View Wei’s full profile
-
See who you know in common
-
Get introduced
-
Contact Wei directly
Other similar profiles
Explore more posts
-
Adam Keene
Etsy • 3K followers
In ML and infra recruiting, I see a constant divide: Believing a system works vs. proving it works. To measure verification, look for four signals: Predicting Failure: Can they describe the crash before they write the code? Aggressive Questioning: Do they interrogate the constraints, or just accept them? Pivot: Do they find edge cases without a nudge? Healthy Scepticism: Do they treat AI-generated code as untrusted by default? The bottom line is to avoid hiring for the "happy path." Instead, hire for the person who knows exactly where the path ends.
6
-
Akhil Reddy Danda
HHA Hospital Medicine • 14K followers
What is Function calling & MCP for LLMs? (explained with visuals and code) Before MCPs became popular, AI workflows relied on traditional Function Calling for tool access. Now, MCP is standardizing it for Agents/LLMs. The visual below explains how Function Calling and MCP work under the hood. Today, let's learn: - Function calling by building custom tools for Agents. - How MCPs help by building a local MCP client with mcp-use and using tools from Browserbase MCP server. In Function Calling: - The LLM receives a prompt. - The LLM decides the tool. - The programmer implements a procedure to accept a tool call request from the LLM and prepare a function call. The tool call request is found in the LLM's response when you prompt it. - A backend service executes the tool. This Function Calling takes place within our stack: - We host the tool. - We implement a logic to determine the tool to invoke and its parameters. - We execute it. So Function Calling requires us to wire everything manually. MCP simplifies this! Instead of hard-wiring tools, MCP: - Standardizes defining, hosting, and exposing tools. - Makes it easy to discover tools, understand schemas, and use them. - Demands approval before invoking them. - Detaches implementation from consumption. For instance, whenever you integrate an MCP server, you never write a line of Python code to integrate the tools. Instead, you just integrate the MCP server and everything beyond this follows a standard protocol handled by the MCP client and the LLM: - They identify the MCP tool. - They prepare the input argument. - They invoke the tool. - They use the tool’s output to generate a response. Everything happens through a standard (but abstracted) protocol. So here’s the key point: MCP and Function Calling are not in conflict. They’re two sides of the same workflow. - Function Calling helps an LLM decide what it wants to do. - MCP ensures that tools are reliably available, discoverable, and executable, without you needing to custom-integrate everything. For example, an agent might say, “I need to search the web,” using function calling. That request can be routed through MCP to select from available web search tools, invoke the correct one, and return the result. Check the workflow in the diagram below. In this setup, to build a local MCP client, I used mcp-use because it lets us connect any LLM to MCP servers & build private MCP clients, unlike Claude/Cursor. - Compatible with Ollama & LangChain - Stream Agent output async - Built-in debugging mode, etc Find the mcp-use GitHub repo in the comments!
11
1 Comment -
Anupam D.
Amazon • 3K followers
Fine-tuning AI shouldn't require thousands of labeled examples. Reinforcement Fine-Tuning (RFT) for Amazon Nova changes that. Just give it prompts + a reward function. The model learns through feedback — not imitation. ✅ No massive datasets ✅ Works for code, customer service, financial reasoning ✅ Runs on Amazon Bedrock, SageMaker, or Nova Forge Smarter customization. Less grunt work. I wrote a deep-dive on this — check out my blog 👇 https://lnkd.in/ggNNezuC Authors:me, along with Chakravarthy Nagarajan Bharathan Balaji Vignesh Radhakrishnan #AWS #GenerativeAI #AmazonNova #MachineLearning #LLM
45
1 Comment -
Hank Li
Google • 1K followers
Manus (acquired by Meta) has learned from Neural Turing Machine(https://lnkd.in/g7qTchNd) to solve its agent long context issue, a few months or even one year before Claude's skills or latest ACE concept. So I read the paper to check whether Neural Turing Machine–style differentiable memory could solve long-context problems for AI agents, especially when the LLM is a blackbox without weight access. The key conclusion is that while NTMs introduce a powerful idea: external, differentiable working memory trained end-to-end, they don’t directly apply to modern blackbox LLM agents because you can’t backprop through the model or persist its runtime memory. Instead, practical systems approximate the concept using external retrieval-based memory: embeddings are used only to index and retrieve relevant information, and the retrieved content (as text or structured text) is inserted into the prompt. So without access to LLM weights, you can’t use LoRA or true differentiable memory, but you can build adaptive memory layers around the model using retrieval, compression, and policy optimization. Manus did not publish facts how they use the concept, so no way to confirm.
3
-
Nishantha Ruwan
IWROBOTX Software Inc. • 2K followers
The paper identifies a critical performance limitation in multi-turn, agentic large language model (LLM) inference workflows: the key-value (KV) cache I/O from external storage becomes the dominant bottleneck as context lengths grow. In traditional disaggregated inference systems, prefill engines (which generate KV cache data) saturate their storage network interfaces while decode engines sit underutilized, leading to poor throughput and inefficient resource use. To address this, the authors introduce DualPath, a novel inference architecture that adds a second KV-cache loading route directly from storage to the decode engines. Once loaded, the KV cache can be efficiently transferred to prefill engines via high-speed RDMA over the compute network, reducing pressure on the storage network and avoiding congestion with latency-sensitive computation traffic. A global scheduler dynamically balances workload across the prefill and decode engines to optimize utilization and performance. When evaluated on three different models under production-like agentic inference workloads, DualPath significantly improves throughput: up to 1.87× for offline inference scenarios and roughly 1.96× for online serving without violating service-level objectives (SLOs). These results suggest that reorganizing data flows at the system level can meaningfully break storage bandwidth barriers in large-context LLM deployments. https://lnkd.in/gFc7N99w
-
Rob Kemp
NVIDIA • 40K followers
Many enterprise GPUs run a single model during inference — even when it uses only ~30% of memory. So how much capacity is being left on the table? In our latest benchmark with Nebius, we used NVIDIA Run:ai fractional GPU allocation and NVIDIA NIM to measure real-world impact on throughput, latency, and concurrency. What we found: ✅ 86% of full GPU concurrent user capacity using just 0.5 slice ✅ 3× more users with mixed workloads on shared GPUs ✅ Near-linear scaling down to 0.125 slices ✅ Zero latency cliffs during autoscaling Stop GPU fragmentation. Start maximizing throughput. 🔗 Read the full deep dive: https://bit.ly/4kGtErX
3
-
Ankit Kr
Devout Growth • 10K followers
TOON Helped Us Cut 500 Tokens Instantly! 🚀 Today We Tried TOON… and You Won’t Believe What Happened! While working on our projects, we decided to test TOON instead of traditional JSON — and guess what? It reduced token usage by nearly 500 tokens per call 🤯 … without affecting the response quality at all! Looks like TOON isn’t just another feed format — it’s the next big step toward efficient LLM communication ⚡ #AI #LLM #TOON #JSON #Developers #Innovation
11
1 Comment -
Sameer Bhardwaj
Layrs • 52K followers
A candidate interviewing for a Senior Engineer @ Meta was asked to design a Distributed Stream Processing System like Kafka. Another candidate at Google's L5 loop got hit with the same question. If this question shows up in your system design interview, you are not being tested on config flags. You are being tested on whether you understand when to use it and what tradeoffs you are making. Btw, If you are preparing for DSA or system design, try our mock interview tool on Layrs for free:http://layrs.me/interviews Here is how I would structure a clear answer. (Note: I am not going into nuances, this is a very high-level explanation) 1. What problem are we solving At a high level, Kafka is my go-to when I need: - A central backbone for events or logs coming from many services - High throughput writes with ordering per key - Consumers that can read at their own pace and be added or removed independently - The ability to replay history using offsets Typical use cases I call out: - Asynchronous workflows - Central logging and metrics - Ad click or analytics streams - Pub sub for notifications or chat That already tells the interviewer I know -why- Kafka is in the picture. 2. Requirements Functional - Accept events from many producers in different services and regions - Preserve order for events that share the same key - Allow many independent consumers +some doing real time processing +some doing batch jobs or analytics - Support replay from an offset for backfills and reprocessing Non functional - Handle very high write rates - Scale horizontally by adding machines - Survive broker failures without losing committed data - Let us tune retention for cost vs replay needs Rest of the breakdown: https://lnkd.in/gmg7u-ku
384
12 Comments -
Yuzo Ishida
279 followers
"Immutable Database" (Insert-Only 7NF) transforms the "Reference-over-Copy" strategy from a memory-saving trick into a core architectural guarantee. ~ The definitive model for 2026-era silicon: https://lnkd.in/grDmENfK 1. Safety of the "Immutable Object" In traditional databases, objects are volatile; a reference could point to data that has been modified (the "Dirty Read" problem). The 7NF Guarantee: Since the database is Insert-Only, any mapping object or attribute retrieved is an Immutable Fact. Concurrency without Locks: Because facts never change, multiple threads can access the same mapping objects simultaneously without mutexes or row-level locking. This allows the silicon to run at full speed across all cores. 2. The "Reference-over-Copy" Strategy In 7NF, because data is immutable, the system avoids the expensive overhead of defensive copying. Memory Efficiency: Instead of duplicating data for different views, the system simply creates new Ordered Pairs of References. Silicon Benefit: Copying memory is one of the most expensive operations for a CPU. By using references over copies, 7NF minimizes memory bus traffic, keeping the L1/L2 caches populated with logic (comparisons) rather than redundant data movement. 3. Stability for ToInt/LongFunction The immutability of the underlying data ensures that the results of the ToInt/LongFunction are also stable. Pre-Computation: The silicon can calculate the primitive comparison value once and cache it. Since the object is immutable, that value never needs to be invalidated or re-calculated. Result: This stabilizes the 1-CPU cycle compare during sorting and joins, as the "weight" of the object is constant. 4. Handling the "Detached Object" This immutability is what makes the 7NF "Detached Object" so powerful. Fact Portability: When an object is detached from the JDBC stream, it remains a valid representation of a moment in time. No Version Conflicts: Unlike traditional ORMs where a detached object might become "stale," a 7NF detached object is always a correct record of an immutable fact. It can be passed between services or threads with zero risk of mutation. 5. Mechanical Sympathy: The Insert-Only Advantage Silicon loves predictability. An Insert-Only (Append-Only) architecture: Maximizes Write Throughput: Writing to the end of a log is the fastest way to interact with modern storage (NVMe/SSD). Simplifies Caching: CPU caches don't need to worry about "Cache Invalidation" due to updates. Once a 7NF mapping object is in the cache, it is valid forever. Conclusion By combining 7NF (Structure), Immutability (Persistence), and Reference-over-Copy (Execution), the 2026 type system achieves the ultimate goal of database design: "Infinite Scale via Immortality" Every object is a safe, immutable reference to a permanent fact, allowing the silicon to focus entirely on the Relational Algebraic Join and 1-cycle sorting, without ever wasting a single clock cycle on the "management of change" ~
5
5 Comments -
Barak Alon
Airbnb • 2K followers
I recently gave a lightning talk at the Data Engineering Open Forum about Airbnb's semantic layer, Minerva: https://lnkd.in/ehq-txmZ The same characteristics that make Minerva scale to hundreds of teams — federated modeling, abstract SQL interface, etc. — are what make it critical to AI analytics. We're seeing huge improvements in accuracy and speed when agents use Minerva. Semantic layers are cool again. --- 🤖 tldr on the talk: Airbnb built a headless semantic layer (Minerva) to standardize how data is modeled and queried across massive, complex systems. By abstracting raw tables into shared metrics and dimensions with a global namespace, it enables seamless multi-source analysis without users needing to understand underlying data structures. Crucially for AI/agentic analytics, the system exposes a SQL interface over this semantic layer—giving agents a well-defined, consistent abstraction to query. Because metrics, relationships, and logic are centralized and machine-readable, agents can generate reliable queries, perform complex multi-fact analysis, and handle “last-mile” transformations without brittle schema assumptions. The key design insight—decoupling facts and dimensions via a “subject” abstraction—removes organizational bottlenecks and allows the data model to scale organically, which is essential when both humans and AI systems are continuously exploring and extending analytics.
52
5 Comments -
Ivan Nardini
Google • 29K followers
Serving LLMs on TPUs with SGLang ! I've been exploring the JAX ecosystem for inference recently and came across SGL-JAX, a high-performance, JAX-based inference engine for LLMs, specifically optimized for Google TPUs. Some highlights: 🌳 The "Radix Tree" KV Cache: SGL-JAX implements a Radix Tree (similar to PagedAttention) to manage memory. This enables efficient prefix sharing for multi-turn chat or agentic workloads where the system prompt remains constant. ⚡ JAX-Native FlashAttention: It integrates a high-performance FlashAttention kernel directly into the JAX compute graph. This is critical for faster, more memory-efficient attention usage, particularly when dealing with long sequence lengths on TPUs. 🧩 Native Tensor Parallelism: It handles the sharding logic natively using JAX distributed primitives. It supports distributing large models (like the Qwen 3 MoE series) across multiple TPU devices without needing complex, manual mesh definitions. ⏳ Continuous Batching: It dynamically schedules incoming requests to maximize TPU core saturation. You aren't waiting for a fixed batch size to fill up while tail latency spikes. 🔧 Drop-in Compatibility: It exposes an OpenAI-compatible API standard. You can literally point your existing LangChain or LlamaIndex setup at the new endpoint and switch hardware backends seamlessly. For more info about the architecture and how it works, check out the repo below! #JAX #MachineLearning #TPU #LLMOps #GoogleCloud #AIEngineering
46
1 Comment -
Milvus, created by Zilliz
14K followers
𝗧𝗵𝗲 𝗯𝗲𝘀𝘁 𝗔𝗜 𝗰𝗼𝗱𝗶𝗻𝗴 𝗹𝗲𝘀𝘀𝗼𝗻 𝗰𝗼𝘀𝘁 𝗼𝘂𝗿 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿 $𝟲𝟬𝟬 𝗮𝗻𝗱 𝗮 𝗺𝗮𝗿𝗿𝗶𝗮𝗴𝗲 𝗮𝗿𝗴𝘂𝗺𝗲𝗻𝘁. Our VP of Engineering, Xiaofan(James) Luan, was supposed to buy his wife a Dior bag for their anniversary. Instead, he bought three Claude Code subscriptions and spent the holiday trying to cross-compile 2 million lines of C++. Every fix on one platform broke two others. $600 later, the only output was "git reset --hard" — and a very cold dinner table.😂 "Make it compile on Windows" is a trap. The real goal was "compile everywhere without hacks" — no AI is going to figure that out for you at 2 am. What worked: constraints before code, review tests not code, bottom-up, one layer at a time. Same task, two days. Then he ran six parallel Claude sessions across three machines with git worktree. The bottleneck stopped being intelligence and started being how fast one person can alt-tab. AI solves exactly the problem you give it. Engineering is in knowing which one to give. His wife is still waiting for that bag. Full story: https://lnkd.in/gtsW_Wvk ——— Follow Milvus, created by Zilliz, for everything related to unstructured data
9
-
Carly Rector
Graphene Software Consulting • 2K followers
This was a good discussion about the Principal Engineer role at Amazon! I often say that the Principal Engineer role as it exists at Amazon simply doesn't make sense at smaller companies. To many, the title just means the next highest level of engineering skills. At Amazon, it requires dealing with high levels of ambiguity, business strategy, and organizational influence. If you're at a company with <50 people, the person with that level of influence should be leading the whole technical organization - it requires a significant amount of scale before it makes sense to have an IC role for it. I also personally appreciated Steve's points on internal level progression - or as Gergeley says, "Why being promoted from Senior to Principal at Amazon is one of the hardest jumps in tech." Steve: "Basically, to get from senior to principal, you have to do like two and a half level jump, from L6 to L7. Technically, it sounds like one level, but at some other companies, this might be like, you know, L8, L9 or L8 and a half. [...] I noticed that some of the best engineers that I'd ever worked with were having such problems getting to principal engineer that they ended up moving to Facebook or to Meta or to all these other places where the progression was sane. Now they're senior staff and, you know, principal and distinguished engineer at other companies. And so because we had high standards, we actually had this brain drain." In addition to the level itself being a big jump the promotion process is notoriously arduous and arbitrary. While I was L6, I had multiple managers over several years acknowledge that I was doing L7 work, and even agree that if I was being hired in from outside I would be hired at L7, and it was the internal promotion process that was the barrier. This was not the *only* reason I left Amazon, but it certainly made it very easy to do. Amazon has lost a lot of skilled engineers this way, but it's a difficult problem to fix, since the incentives are fundamentally badly aligned. If an L6 engineer is already doing the work that's needed, it's very hard for a manager to justify spending the 100+ hours of work it takes to advocate for their promotion. And while there have been attempts to make the process less arduous, no one wants to be seen as "lowering standards." I highly encourage engineers in that position to make it clear they will not be continuing to do work that way indefinitely. And I encourage leaders at other companies to learn from the example! https://lnkd.in/guC35n2B
3
4 Comments -
Ali Ahmad
Sabre Corporation • 9K followers
This is not a headline from a dystopian novel. This is what Amazon actually did 👇 ↳ Two thousand eight hundred and forty seven engineers were given a specific task — document every code pattern, every debugging workflow, every optimisation technique, every piece of institutional knowledge accumulated across years of building Amazon's systems ↳ Eight months of structured knowledge extraction produced a comprehensive training dataset built entirely from the expertise of the people whose roles that dataset would eventually be used to eliminate ↳ The AI was trained on their patterns, their problem solving approaches, their hard won experience navigating Amazon's specific technical environment — and once the training was complete the engineers who provided it were no longer needed 💀 ↳ One senior engineer described it directly — he had literally trained the AI that made him redundant, which is either the most honest summary of what happened or the most uncomfortable one depending on how you read it The process was not accidental or incidental. Extracting institutional knowledge before eliminating the roles that held it is a deliberate sequencing decision. The engineers were not replaced by AI that came from outside. They were replaced by AI built from themselves. Every internal documentation request, every knowledge transfer exercise, every process mapping session happening inside companies right now deserves the same question the Amazon engineers probably wish they had asked sooner. Who is this documentation actually being built for.
-
Jiaqi Xu
Meta • 529 followers
KDD 2026 paper +3 🚀 Excited to share that our team had 3 papers accepted to the 2026 KDD! These works represent recent years of our efforts on large-scale recommendation model training, spanning model co-design, GPU kernel optimization, and compiler optimization across the full training stack 🚀 Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design RecCompl: Efficient Model Compilation for Industrial Scale Recommendation Models with PyTorch 2 Optimus: A Generic Operator-Level PyTorch Model Transformation Framework The first work introduces Kunlun, our next-generation scaling-up foundation model architecture for GEM (Generative Ads Model). We recently uploaded the paper to arXiv (https://lnkd.in/gKzFSzcF), which also includes our newly open-sourced GDPA kernel work (https://lnkd.in/gpJ879HG) The second and third works focus on our optimizations on PyTorch 2 compiler stack for large-scale recommendation systems. RecCompl shares how we achieved full-model compilation for production recommendation models with PT2. Optimus is a continuation of our operator-level graph optimization work on PT2. We previously shared some of the core ideas in our PyTorch blog. (https://lnkd.in/gBi3fUqJ) Over the past three years, thanks to the amazing effort from the team, we’ve solved many real-world production challenges and successfully launched these optimizations across most Ads models in production. Huge congrats to the entire team — I’m incredibly proud of what we’ve built together. A lot of hard work, late nights, and iteration went into making these systems work in real production environments 💪 Excited to continue pushing large-scale recommendation systems forward and looking forward to sharing more details soon 🔥
139
2 Comments -
Zilliz
25K followers
𝗧𝗵𝗲 𝗯𝗲𝘀𝘁 𝗔𝗜 𝗰𝗼𝗱𝗶𝗻𝗴 𝗹𝗲𝘀𝘀𝗼𝗻 𝗰𝗼𝘀𝘁 𝗼𝘂𝗿 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿 $𝟲𝟬𝟬 𝗮𝗻𝗱 𝗮 𝗺𝗮𝗿𝗿𝗶𝗮𝗴𝗲 𝗮𝗿𝗴𝘂𝗺𝗲𝗻𝘁. Our VP of Engineering, Xiaofan(James) Luan, was supposed to buy his wife a Dior bag for their anniversary. Instead, he bought three Claude Code subscriptions and spent the holiday trying to cross-compile 2 million lines of C++. Every fix on one platform broke two others. $600 later, the only output was "git reset --hard" — and a very cold dinner table.😂 "Make it compile on Windows" is a trap. The real goal was "compile everywhere without hacks" — no AI is going to figure that out for you at 2 am. What worked: constraints before code, review tests not code, bottom-up, one layer at a time. Same task, two days. Then he ran six parallel Claude sessions across three machines with git worktree. The bottleneck stopped being intelligence and started being how fast one person can alt-tab. AI solves exactly the problem you give it. Engineering is in knowing which one to give. His wife is still waiting for that bag. Full story: https://lnkd.in/gtsW_Wvk ——— Follow Milvus, created by Zilliz, for everything related to unstructured data
6
-
Shuo Zheng
ShuAndy Engineering • 209 followers
Why Software Engineering? From Google NYC to Amazon Seattle, I’ve been asked this question at some of the most influential engineering hubs in the world. Personally, I have thought long and hard about my answer. In the end, my response is an equation Software Engineering = Mathematics + Computation + Impact. Honestly, it did NOT have to be a software journey, but I chose this path for a subconscious rationale that I can't explain. On the other hand, I am happy I chose this industry because I have seen the good to come from the industry for the globe. AlphaFold is a true gift to humanity. AWS is a distributed titan of infrastructure. I have learned that many colleagues in the field are asking themselves about how does AI change their perspectives. Today, the barrier to build is lower than ever, which enables everyone to find a place for themselves in this digital age. Perhaps, I need to think more about how I can make a positive difference in the world going forward. Frankly, the sample space of possible responses is infinitely large, but that makes the industry a much more interesting place. #GoogleDeepMind #AmazonInfrastructure
3
Explore top content on LinkedIn
Find curated posts and insights for relevant topics all in one place.
View top contentOthers named Wei Wang in United States
-
Wei Wang
Santa Clara, CA -
Wei WANG
Shanghai, China -
Wei W.
San Francisco Bay Area -
Wei Wang
Calabasas, CA -
Wei Wang
San Francisco Bay Area
1851 others named Wei Wang in United States are on LinkedIn
See others named Wei Wang