Zeph Grunschlag
Chicago, Illinois, United States
2K followers
500+ connections
Services
Articles by Zeph
-
One Command Streamlit AWS Fargate Deploy
Streamlit is an open-source framework for quickly deploying rich, interactive and blazing fast data-based web…
10
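The deploy described in the article presumably ships the Streamlit app as a container that Fargate can run. A minimal Dockerfile of the kind such a setup typically uses is sketched below; the Python version, port, and `app.py` filename are assumptions for illustration, not details taken from the article.

```dockerfile
# Minimal container for a Streamlit app to run on AWS Fargate (hypothetical app.py)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Streamlit's default port; the Fargate task definition would map this.
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```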
Activity
-
Zeph Grunschlag shared this ("A recent post I like to share"): Issue #46: Distinguish Yourself. An Interview with Zeph G., Senior Engineer, Ex-Algorand Technologies (Calyptus)
-
Zeph Grunschlag reposted this:

I see some vocal objections: "Sora is not learning physics, it's just manipulating pixels in 2D". I respectfully disagree with this reductionist view. It's similar to saying "GPT-4 doesn't learn coding, it's just sampling strings". Well, what transformers do is just manipulating a sequence of integers (token IDs). What neural networks do is just manipulating floating-point numbers. That's not the right argument. Sora's soft physics simulation is an *emergent property* as you scale up text2video training massively.

- GPT-4 must learn some form of syntax, semantics, and data structures internally in order to generate executable Python code. GPT-4 does not store Python syntax trees explicitly.
- Very similarly, Sora must learn some *implicit* forms of text-to-3D, 3D transformations, ray-traced rendering, and physical rules in order to model the video pixels as accurately as possible. It has to learn concepts of a game engine to satisfy the objective.
- If we don't consider interactions, UE5 is a (very sophisticated) process that generates video pixels. Sora is also a process that generates video pixels, but based on end-to-end transformers. They are on the same level of abstraction.
- The difference is that UE5 is hand-crafted and precise, but Sora is purely learned through data and "intuitive".

Will Sora replace game engine devs? Absolutely not. Its emergent physics understanding is fragile and far from perfect. It still heavily hallucinates things that are incompatible with our physical common sense. It does not yet have a good grasp of object interactions - see the uncanny mistake in the video below.

Sora is the GPT-3 moment. Back in 2020, GPT-3 was a pretty bad model that required heavy prompt engineering and babysitting. But it was the first compelling demonstration of in-context learning as an emergent property. Don't fixate on the imperfections of GPT-3. Think about extrapolations to GPT-4 in the near future.
-
Zeph Grunschlag reacted to this: Full Encoder-Decoder as a pair of 4-story and 5½-story buildings. Listen to Kirill Eremenko's brilliant exposition. (Super Data Science: ML & AI Podcast with Jon Krohn, episode 759: Full Encoder-Decoder Transformers Fully Explained, with Kirill Eremenko, on Apple Podcasts)
-
Zeph Grunschlag shared this: Full Encoder-Decoder as a pair of 4-story and 5½-story buildings. Listen to Kirill Eremenko's brilliant exposition. https://lnkd.in/gtm4Ecjr (Super Data Science: ML & AI Podcast with Jon Krohn, episode 759: Full Encoder-Decoder Transformers Fully Explained, with Kirill Eremenko, on Apple Podcasts)
-
Zeph Grunschlag reposted this: Sora is essentially a world model with "no-op" as the only allowed action. You can set the initial states of the world, run simulation in latent space, and observe what happens passively. No way to do active intervention now.

Zeph Grunschlag reposted this:

Lots of confusion about what a world model is. Here is my definition:

Given:
- an observation x(t)
- a previous estimate of the state of the world s(t)
- an action proposal a(t)
- a latent variable proposal z(t)

A world model computes:
- representation: h(t) = Enc(x(t))
- prediction: s(t+1) = Pred(h(t), s(t), z(t), a(t))

Where:
- Enc() is an encoder (a trainable deterministic function, e.g. a neural net)
- Pred() is a hidden state predictor (also a trainable deterministic function)
- the latent variable z(t) represents the unknown information that would allow us to predict exactly what happens. It must be sampled from a distribution or varied over a set. It parameterizes the set (or distribution) of plausible predictions.

The trick is to train the entire thing from observation triplets (x(t), a(t), x(t+1)) while preventing the Encoder from collapsing to a trivial solution on which it ignores the input.

Auto-regressive generative models (such as LLMs) are a simplified special case in which:
1. the Encoder is the identity function: h(t) = x(t)
2. the state is a window of past inputs
3. there is no action variable a(t)
4. x(t) is discrete
5. the Predictor computes a distribution over outcomes for x(t+1) and uses the latent z(t) to select one value from that distribution.

The equations reduce to:
s(t) = [x(t), x(t-1), ..., x(t-k)]
x(t+1) = Pred(s(t), z(t))

There is no collapse issue in that case.
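The definition in the post maps directly to code. Below is a minimal NumPy sketch of one world-model step, with Enc() and Pred() replaced by fixed random linear maps as stand-ins for trainable networks; all dimensions are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (all hypothetical): observation, state, action, latent.
D_X, D_S, D_A, D_Z = 8, 4, 2, 3

# Stand-ins for the trainable functions Enc() and Pred().
W_enc = rng.normal(size=(D_S, D_X))
W_pred = rng.normal(size=(D_S, D_S + D_S + D_Z + D_A))

def enc(x):
    """Representation: h(t) = Enc(x(t))."""
    return W_enc @ x

def pred(h, s, z, a):
    """Prediction: s(t+1) = Pred(h(t), s(t), z(t), a(t))."""
    return np.tanh(W_pred @ np.concatenate([h, s, z, a]))

x_t = rng.normal(size=D_X)   # observation
s_t = np.zeros(D_S)          # previous state estimate
a_t = np.zeros(D_A)          # action proposal ("no-op" in the Sora analogy)
z_t = rng.normal(size=D_Z)   # latent: sampled, parameterizes plausible outcomes

h_t = enc(x_t)
s_next = pred(h_t, s_t, z_t, a_t)
```

Sampling different z_t values and re-running `pred` yields the set of plausible next states the post describes; training (and the collapse-prevention trick) is the part this sketch deliberately omits.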
-
Zeph Grunschlag reacted to this: Learn about a great way to version your FastAPI APIs with Stanislav Zmiev's cadwyn. https://lnkd.in/gp3DwGfS https://lnkd.in/gurDjs5T
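The idea behind header-based API versioning can be illustrated without the library. The sketch below is pure Python showing the general pattern (a version string selecting among per-version handlers); it is NOT cadwyn's actual API, and all handler names, dates, and fields are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical v1 and v2 response shapes for the same resource.
def get_user_v1(user_id: int) -> dict:
    return {"id": user_id, "name": "Ada Lovelace"}               # v1: one name field

def get_user_v2(user_id: int) -> dict:
    return {"id": user_id, "first": "Ada", "last": "Lovelace"}   # v2: split name

# Registry mapping an API version (e.g. from an X-API-Version header)
# to the handler that implements that version's contract.
HANDLERS: Dict[str, Callable[[int], dict]] = {
    "2023-01-01": get_user_v1,
    "2024-01-01": get_user_v2,
}

def dispatch(version: str, user_id: int) -> dict:
    # Fall back to the oldest version for unknown or missing headers.
    handler = HANDLERS.get(version, get_user_v1)
    return handler(user_id)
```

Tools like cadwyn automate the painful part this sketch ignores: maintaining one latest implementation and generating the older response shapes via declared migrations, instead of hand-writing every version.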
-
Zeph Grunschlag reposted this: Slow Python installs with pip? The new uv tool from Astral is a drop-in replacement for pip written in Rust. Definitely worth trying out - for my CI pipelines it has reduced build times from 80s to 30s. Great! Now I just need the tests to pass 😜 (GitHub - astral-sh/uv: An extremely fast Python package and project manager, written in Rust.)
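"Drop-in" here means existing pip and venv invocations translate one-for-one, assuming uv is installed (the repo provides an installer one-liner):

```shell
uv venv .venv                       # replaces: python -m venv .venv
uv pip install -r requirements.txt  # replaces: pip install -r requirements.txt
uv pip compile pyproject.toml -o requirements.txt  # pip-tools-style locking
```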
-
Zeph Grunschlag reposted this: Two more days left for Early Bird registration for PG Day Chicago! Do not miss your chance - we want to see you on April 26! https://lnkd.in/gmQHQmVE
-
Zeph Grunschlag liked this: Crypto trading is coming to E*TRADE. Don't miss out - join the list and get access to exclusive crypto content and education. https://mgstn.ly/3PrWX6c
-
Zeph Grunschlag liked this ("Announced today! A good day to thank all the people I have worked with: teachers, students, colleagues, etc.!"):

Congratulations to three Columbia Engineering alumni on their election to the National Academy of Engineering (NAE), one of the highest professional honors for engineers.
⚒️ James Scapa BS’78 — Founder and CEO Emeritus, Altair Engineering; Columbia University Trustee
⚒️ Tom Scarangello PhD’87 — Senior Advisor and former Executive Chair, Thornton Tomasetti, Inc.
⚒️ Moti Yung PhD’88 — Distinguished Research Scientist, Google; Adjunct Senior Research Scientist at Columbia University

Election to the NAE recognizes outstanding contributions to engineering research, practice, and education. Read the announcement: https://lnkd.in/evCjKbSD
Image (left to right): Scapa, Scarangello, and Yung
-
Zeph Grunschlag liked this:

And now for something completely different: we published a children's book. Introducing "Stablecoins for Babies," written by zerohash Founder & CEO Edward Woodford.

It's not the typical product launch you'd expect from us, but we believe it reflects a deeper truth about where the world is heading. Stablecoins are quickly becoming part of everyday financial life. Industry estimates project $46 trillion in stablecoin volume in 2025, reflecting an 87% year-over-year increase, per Andreessen Horowitz. Across our ecosystem, banks, brokerages, and fintechs are adopting stablecoins because they move value faster, more reliably, and more efficiently. As digital money becomes mainstream infrastructure, financial literacy becomes just as important as the technology itself.

This book brings those ideas to life through "Steady Eddy," a friendly stablecoin who helps kids understand how digital dollars work in a simple, approachable way. It also supports a meaningful cause: all net proceeds will be donated to Reading Is Fundamental, the nation's largest children’s literacy organization.

"Stablecoins for Babies" is available to order via the zerohash website, Amazon, and other participating retailers. Purchase the book: https://lnkd.in/eMZavEU4
-
Zeph Grunschlag liked this:

October 10, 2025 was a day where a decentralized algorithmic system built to replace central clearing, called autodeleveraging, appeared to have failed massively, leading to >$650 million of over-liquidated profits. Does this mean we need to replace it with centralized clearing? In this post, which summarizes my new paper, Autodeleveraging: Impossibilities and Optimization (https://lnkd.in/eX5nA4Fy), you can hopefully find some solace in understanding that there is no perfect solution, but there are definitely better solutions than what we are doing now. (Autodeleveraging: $653 million lost to a heuristic?, by Tarun Chitra)
-
Zeph Grunschlag liked this:

🎲 👋 If you enjoy Monopoly Deal, here’s something you might like: I recently put a paper on arXiv — “Monopoly Deal: A Benchmark Environment for Bounded One-Sided Response Games.”

Card games have long served as benchmarks for sequential decision-making under uncertainty — from Poker and Hanabi to Magic: The Gathering. This paper introduces a version of Monopoly Deal tailored to study a less-explored interaction pattern where an action briefly hands control to the opponent for a bounded sequence of responses. We show that standard Counterfactual Regret Minimization (CFR) fits naturally into this setting, and we provide a lightweight research environment that bundles the game engine, a parallel CFR runtime, and a web interface — all runnable on a single workstation.

🧩 Paper, demo, and code:
📜 https://lnkd.in/eDsWMVts
🎉 https://monopolydeal.ai
💻 https://lnkd.in/eEHJ2nYe

Would love to connect with others curious about this direction or about sequential decision-making under uncertainty more broadly. Feel free to reach out! (GitHub - cavaunpeu/monopoly-deal-ai: Systems and algorithms for (machine-)learning Monopoly Deal)
-
Zeph Grunschlag liked this:

🚨 👋 Excited to share a paper I recently put on arXiv: “Monopoly Deal: A Benchmark Environment for Bounded One-Sided Response Games.”

Card games have long served as benchmarks for sequential decision-making under uncertainty — from Poker and Hanabi to Magic: The Gathering. This paper introduces a version of Monopoly Deal designed to study a less-explored interaction pattern, where an action briefly transfers control to the opponent for a bounded sequence of responses. We show that standard Counterfactual Regret Minimization (CFR) adapts cleanly to this structure, and share a lightweight research environment that integrates the game engine, parallel CFR runtime, and web-based interface — all runnable on a single workstation.

🧩 Paper, demo, and code:
📜 https://lnkd.in/eDsWMVts
🎉 https://monopolydeal.ai
💻 https://lnkd.in/eEHJ2nYe

Would love to connect with more folks interested in this type of work, and sequential decision-making under uncertainty in general. Feel free to reach out!
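For readers new to CFR, the update rule at its core is regret matching: play each action in proportion to its positive cumulative regret. Below is a self-contained toy sketch on matching pennies (not the paper's Monopoly Deal environment) against a fixed, hypothetical opponent mix, using exact expected updates rather than sampling.

```python
# Regret matching, the core update inside Counterfactual Regret Minimization.
ACTIONS = ["heads", "tails"]

def payoff(mine: str, theirs: str) -> float:
    """Matcher's payoff: +1 on a match, -1 on a mismatch."""
    return 1.0 if mine == theirs else -1.0

def strategy_from_regrets(regrets):
    """Mix over actions proportionally to positive cumulative regret."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total == 0.0:
        return [1.0 / len(regrets)] * len(regrets)  # uniform when no positive regret
    return [p / total for p in positives]

def train(opponent=(0.7, 0.3), iterations=1000):
    """Expected-update regret matching vs. a fixed (hypothetical) opponent mix."""
    regrets = [0.0, 0.0]
    strategy_sum = [0.0, 0.0]
    for _ in range(iterations):
        strat = strategy_from_regrets(regrets)
        # Expected utility of each of our actions against the opponent mix.
        util = [sum(q * payoff(a, b) for q, b in zip(opponent, ACTIONS))
                for a in ACTIONS]
        ev = sum(p * u for p, u in zip(strat, util))
        for i in range(len(ACTIONS)):
            regrets[i] += util[i] - ev        # counterfactual regret update
            strategy_sum[i] += strat[i]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]  # time-averaged strategy

avg = train()  # drifts toward matching the opponent's likelier coin
```

Full CFR applies this same update at every information set of an extensive-form game, weighting regrets by counterfactual reach probabilities; the bounded response sequences in the paper add structure to those information sets.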
-
Zeph Grunschlag liked this ("Looking forward to speaking on DeFi vaults in Buenos Aires!"):

🎙️ Jacob Kim, Data Science & Engineering Manager, Gauntlet

Jacob joins the lineup for Vault Summit! Meet him at Ciudad Cultural Konex in Buenos Aires during DeFiConnect on 18 November. 🎟️ tickets.deficonnect.co
-
Zeph Grunschlag liked this:

The new ChatGPT Atlas by OpenAI is very impressive and amazing. It inverts the pattern of websites with a chat add-on and makes it chat with a website add-on. All your browsing, all your workflows, all your context is immediately available to the chat. It can let you understand that news article you're reading the same way Grok is helpful on X. It can integrate with that PRD and push it into Linear. It can help you make sense of a multi-step flow where you visited 10 pages in a row and now have a question.

There are many dystopian views on this too, and data privacy concerns. I'm not really ready to use it as more than a toy. But I'm sure many won't care about the privacy concerns and will jump right in and get a lot of value out of it.

Where I'm curious to see how it goes is with the Copilot integration into the Microsoft suite of products. If they have this in Edge tightly integrated with Copilot, then this is the browser of the future for enterprise. It's a unifying substrate.

What they did was obviously coming. And it's a simple thing. But that should not undermine how impressive it is. They had to build everything else just to get to the point where they could do the obvious and simple thing and make it look easy. This launch has my wheels already spinning on what the future of integrated AI looks like with websites and applications. Are your wheels spinning now too?
Projects
-
Streamlit on Fargate via CDK
-
Languages
-
English
Native or bilingual proficiency
-
Hebrew
Native or bilingual proficiency
Recommendations received
1 person has recommended Zeph
Explore more posts
-
Jamie Potter
Flexciton • 15K followers
Some yield investigations don’t fail because engineers aren’t smart. They fail because the data stops at the factory gate. 🔒

Here’s a scenario I keep hearing about. A backend site flags an issue on a subset of units. It’s real. It’s repeatable. And it’s not being caused in the backend factory. So now the uncomfortable questions begin:
❓ What actually caused this upstream?
❓ Is it isolated to one wafer, one lot… or is it systemic?
❓ Do we have the same signature elsewhere, sitting in WIP or already shipped?

In theory, this should be straightforward. The “bad” die exists in the backend. The corresponding wafer ran through a front-end fab with a rich trail of history: tools, steps, holds, rework, metrology, inspection, and operator interventions. But in practice? That front-end context is often invisible. Even within the same company, data is frequently fragmented across systems, formats, and teams. And if the supply chain spans multiple companies, it gets worse: commercial boundaries, IP sensitivity, and inconsistent identifiers make cross-site traceability painfully slow.

The result is an industry-wide pattern: we spend days correlating what should take minutes. Meanwhile, the business impact compounds:
➡️ conservative containment actions
➡️ delayed shipments
➡️ extra screening
➡️ and, too often, “best guess” decisions made under time pressure

🔥 This is why I believe one of the biggest untapped opportunities in semiconductor manufacturing isn’t another local optimisation inside one fab. It’s correlating data across sites — so a backend failure can immediately be linked to the upstream genealogy of that exact wafer/die, across the chain. Not as a massive data dump, but as a targeted, permissioned way to answer one question fast: “What happened upstream that explains what we’re seeing downstream?”

🧩 We’ve built incredible sophistication inside fabs. Now we need to extend that same mindset across the supply chain.

What’s the biggest blocker in your experience to sharing the right data across sites — IP concerns, lack of standards, or misaligned incentives?
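The targeted lookup the post argues for can be sketched in a few lines. Every ID, tool name, and field below is hypothetical; the point is the shape of the query, joining a backend failure signature to front-end wafer genealogy and asking which upstream tool all failing wafers share.

```python
# Toy data: backend failures keyed by wafer, front-end genealogy per wafer.
backend_failures = [
    {"unit": "U-1001", "wafer_id": "W-17", "signature": "vmin-shift"},
    {"unit": "U-1002", "wafer_id": "W-17", "signature": "vmin-shift"},
    {"unit": "U-2001", "wafer_id": "W-23", "signature": "vmin-shift"},
]

frontend_genealogy = {
    "W-17": {"lot": "L-A", "tools": ["ETCH-04", "CMP-02"], "rework": True},
    "W-23": {"lot": "L-A", "tools": ["ETCH-04", "CMP-07"], "rework": False},
    "W-31": {"lot": "L-B", "tools": ["ETCH-09", "CMP-02"], "rework": False},
}

def upstream_context(failures, genealogy):
    """Answer 'what happened upstream?' for each failing unit's wafer."""
    out = []
    for f in failures:
        gen = genealogy.get(f["wafer_id"])  # often the missing link in practice
        out.append({**f, "frontend": gen})
    return out

def common_tools(failures, genealogy):
    """Tools shared by every failing wafer: candidate systemic cause."""
    wafers = {f["wafer_id"] for f in failures}
    tool_sets = [set(genealogy[w]["tools"]) for w in wafers if w in genealogy]
    return set.intersection(*tool_sets) if tool_sets else set()

linked = upstream_context(backend_failures, frontend_genealogy)
suspects = common_tools(backend_failures, frontend_genealogy)  # {'ETCH-04'}
```

In real supply chains the hard part is not this join but getting consistent wafer/die identifiers and permissioned access across company boundaries, which is exactly the post's argument.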
19
1 Comment -
Matt Stockton
PragmaNexus • 2K followers
I used Claude Code to split a 500+ page PDF into dozens of separate documents for a client project. The PDF contained merged files - some scanned, some text-based - all concatenated together. I needed to find where each document ended and the next began, then split at those boundaries.

Here's what I did:
- I converted the PDF to text using a document processing library, then pointed Claude Code at the file
- I described the patterns I was looking for in plain language ("pages that say 'Page 1 of'" or "sections starting with APPENDIX")
- The model turned my descriptions into grep commands and regular expressions - I didn't need to know the syntax, I just described what I was looking for
- It searched the document, found the boundaries, and wrote Python code to split the PDF at those page numbers

The model didn't load the entire document into context. It searched the file using command line tools and pulled back just what it needed. This is tool calling combined with file access - the model runs commands directly on your computer instead of you uploading files to a web interface.

Once I got it working, I saved it as a slash command to reuse on similar documents. You can also use the Claude Agent SDK to fully automate workflows like this.

The same approach works for parsing meeting transcripts, categorizing support tickets, or extracting data from receipts. Anything where you'd describe a search algorithm to a person and have them apply it across many files on your computer.

What file tasks are you still doing by hand that might fit this pattern? Full post with more details on the workflow / tool calling in general, how to let the model find patterns for you, and how to test your results: https://lnkd.in/g5A7JZ7N
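The boundary-finding step described in the post can be sketched in plain Python. The patterns below are the hypothetical ones the post mentions ("Page 1 of", "APPENDIX"); real merged PDFs need corpus-specific patterns, and the actual splitting would use a PDF library on the resulting page ranges.

```python
import re

# Patterns that mark the first page of a new document (per-corpus assumptions).
BOUNDARY_PATTERNS = [
    re.compile(r"\bPage\s+1\s+of\s+\d+", re.IGNORECASE),
    re.compile(r"^APPENDIX\b", re.MULTILINE),
]

def find_boundaries(page_texts):
    """Return 0-based page indices where a new document begins."""
    starts = [0]  # the first page always starts a document
    for i, text in enumerate(page_texts[1:], start=1):
        if any(p.search(text) for p in BOUNDARY_PATTERNS):
            starts.append(i)
    return starts

def split_ranges(page_texts):
    """Turn boundary indices into (start, end) page ranges, one per document."""
    starts = find_boundaries(page_texts)
    ends = starts[1:] + [len(page_texts)]
    return list(zip(starts, ends))

pages = ["Cover Page 1 of 3", "body", "body", "Invoice Page 1 of 2", "body",
         "APPENDIX A: tables"]
ranges = split_ranges(pages)  # [(0, 3), (3, 5), (5, 6)]
```

The point of the post is that you describe these patterns in plain language and the agent writes and runs this kind of code for you; the sketch just shows what that generated logic amounts to.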
18
5 Comments -
Christina Qi
Databento • 58K followers
We process 14M messages/sec with sub-100μs latency. When rewriting our feed handler, Rust seemed like the obvious choice - we use it successfully across our stack... We chose C++ instead. I'm re-sharing our blog post on the technical choices behind that decision.
281
13 Comments -
Steve Declercq
Bizzy • 6K followers
If a team of 15 can suddenly perform like 150… what happens to the org chart?

Traditional companies stack layers: ICs → Managers → Senior Managers → Directors → VPs → C-Level. Each layer exists to:
→ Coordinate
→ Aggregate information
→ Translate context upward
→ Align execution downward

But when AI handles coordination automatically? When data flows in real-time without someone summarizing it? When decision briefs write themselves? Those middle layers lose their structural purpose. I'm not saying leadership disappears. I'm saying organizations get radically flatter.

The comparison. A 150-person company looks like:
→ 130 highly skilled ICs
→ 20 strategic leaders

A 15-person company looks like:
→ 13 highly skilled ICs
→ 2 strategic leaders
→ AI filling every coordination gap in between

The real threat isn't AI replacing you. It's a 15-person team with AI moving faster than your 150-person team without it. Exciting times. But also uncomfortable ones.
39
4 Comments -
Jeffrey Emanuel
JeffCo Industries LLC • 3K followers
Agent coding life hack: I’m 100% convinced that there are hundreds of thousands of developers out there who would love and use my dcg tool if they only knew about it.

dcg: destructive_command_guard

This is a free, open-source, highly-optimized Rust program that runs using pre-tool hooks in Claude Code (CC) and checks the tool call that CC was about to make to see if it’s potentially destructive; that is, could delete data, lose work, drop tables, etc. Get it here and install with the convenient one-liner: https://lnkd.in/ePbmbS4f

A tool like dcg has several competing goals that make it a careful balancing act and tough engineering problem:

1. Since it runs for every single tool call, it must be FAST. Hence why it is written in Rust and an extreme amount of focus has been placed on making it as fast as possible.

2. It must avoid annoying false positives that waste your time, add friction, and re-introduce you as the bottleneck unnecessarily. I run dozens of agents at once and don’t want them wasting time waiting for me unless it’s needed. Usually, the messages from dcg are enough to get the agent to be more thoughtful about what it’s doing.

3. It’s not enough to just use a simple rulebook where you look for canned commands like “rm -rf /” or “git reset --hard HEAD.” The models are very resourceful and will use ad-hoc Python or bash scripts or many other ways to get around simple-minded limitations. That’s why dcg has a very elaborate, ast-grep powered layer that kicks in when it detects an ad-hoc (“heredoc”) script. But wherever possible, it uses much faster SIMD-optimized regex.

4. A tool like this should really be expandable and have semantic knowledge of various domains and what constitutes a destructive act in those domains. For instance, if you’re working with S3 buckets on AWS, you could have a highly destructive command that doesn’t look like a normal delete. That’s why dcg comes out of the box with around 50 presets which can be easily enabled based on your projects’ tech stacks (just ask CC to figure out which packs to turn on for you by analyzing your projects directory).

5. dcg is designed to be very agent friendly. It doesn’t just block commands, it explains why and offers safe alternatives based on an analysis of the specific command used by the agent. For instance, it might stop the agent from deleting your Rust project’s build directories but suggest using “cargo clean” instead. Often, these messages are enough to knock sense into Claude.

I really can’t exaggerate just how much time and frustration dcg has already saved me. It should be known and used by everyone who has had these kinds of upsetting experiences with coding agents. dcg is included along with all my other tooling in my agent-flywheel.com project. All free, MIT licensed, with extensive tutorials and other educational resources for people with less experience. Give it a try, you won’t regret it!
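To make the pre-tool-hook idea concrete, here is a drastically simplified pure-Python sketch of the pattern the post describes: match a proposed command and, instead of a bare block, return a reason plus a safer alternative. The real tool is an optimized Rust binary with an ast-grep layer for ad-hoc scripts; these few regexes and messages are illustrative only.

```python
import re

# (pattern, why it's dangerous, safer alternative) - a tiny illustrative rulebook.
RULES = [
    (re.compile(r"\brm\s+-rf?\s+/(\s|$)"), "deletes the filesystem root",
     "remove a specific path instead"),
    (re.compile(r"\bgit\s+reset\s+--hard\b"), "discards uncommitted work",
     "try 'git stash' first"),
    (re.compile(r"\bdrop\s+table\b", re.IGNORECASE), "drops a database table",
     "take a backup or use a transaction you can roll back"),
]

def check(command: str):
    """Return None if the command looks safe, else (reason, suggestion)."""
    for pattern, reason, suggestion in RULES:
        if pattern.search(command):
            return reason, suggestion
    return None

verdict = check("git reset --hard HEAD")  # blocked, with an explanation
```

A hook would run `check` on each proposed tool call and feed the (reason, suggestion) pair back to the agent; as the post notes, a static rulebook like this is exactly the "simple-minded limitation" resourceful models route around, which is why the real tool adds script analysis.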
79
13 Comments -
Jeremy Tian
Lucidic AI • 7K followers
We’ve spent $3,000+ on Claude Code and it’s written 200,000+ lines across our repo. Here’s what actually worked (and what didn’t).

For the longest time, I was just using Claude Code on “auto-accept edits” mode. This was my biggest mistake. These are some things that worked better for us and things in general I just didn’t know about:

1. USE PLAN MODE. Always ask Claude to outline the change before writing code. It genuinely leads to much better thought out and functional code. This is the #1 thing that has improved our code quality.

2. Be specific and thoughtful. Claude isn’t “bad”. Most failures were my prompts. It only knows what you tell it. You have way more context than your prompt carries. Treat it like a smart intern who codes well but doesn’t know your codebase yet. Over-communicate intent, constraints, and edge cases. Anticipate misunderstandings/failure modes and preempt them in the prompt. When it proposes a plan, ask it to critique the plan before writing code.

3. Mention files with @. Somehow, I didn’t know this at first. Makes it waste a lot less time and is really important for keeping it focused on the right files.

4. Use multiple terminals. I started with one. Now I keep 2–3 running on the same branch across different parts of the codebase. People talk about 10x engineers. This literally allows you to work on 2-3x the features concurrently (or more if you can handle it).

5. Thinking levels. I didn’t know you can literally tell Claude to think harder and that will actually make it spend more time reasoning. Use the keywords: "think" < "think hard" < "think harder" < "ultrathink." Anthropic wrote a great deep dive here: https://lnkd.in/dKU45y3H

What are the most helpful things you’ve learned from using Claude Code? p.s. peep Timmy 👀
135
28 Comments -
Tobias (Toby) Mao
Fivetran • 14K followers
dbt Labs' announcement of #dbt (con)Fusion has a lot of things that we as a data community need to discuss.

dbt Fusion is a complete rewrite of dbt Core in Rust. Unlike dbt Core, which is completely free and open source under Apache 2.0, dbt Fusion is not open source, as it is under the more restrictive Elastic 2.0 license. Although Fusion is free to use, it restricts its usage in a hosted or managed offering by 3rd parties.

You may think this is fine, but there are far-reaching implications. Open source is amazing because it incentivizes individuals and companies to invest in it without risk. A company can go all in on open source because no matter what the direction of the core maintainers is, it can be forked and used for whatever reason you want. A non-permissive license like Elastic will disincentivize companies from investing.

Don't get me wrong, there's nothing ethically wrong about dbt Labs' decision. It may even be in their best financial interest to do so. However, I want to analyze what led to this situation and what it might mean for the future of dbt Core.

What I believe this means is that dbt's strategy is to put dbt Core in "maintenance mode" to focus on Fusion and their other proprietary offerings. The wording of the announcement was very carefully selected to be vague. In particular, when referring to dbt Core support, it was only highlighted that bug fixes, security patches, and compatibility would be ongoing. According to their dbt Core roadmap, they've separated out the dbt language from the runtime. There's a specific callout that Fusion and Core will inevitably diverge because Fusion has additional capabilities that cannot be added to Core.

To me, it makes sense that this is the chance for dbt Labs to invest in more restrictive and profitable software while slowly deprecating what not only made them great but is also their biggest challenge to financial growth. Ultimately, resources are finite, and companies must prioritize what makes sense for the business.

Given dbt Core's foundational importance to modern data infrastructure, Analytics Engineers deserve a free, open, and continually evolving transformation platform. Otherwise, your career will be dangerously dependent on the decisions of a single company. To safeguard the continuing innovation and development of the transformation space, it may be time to start a discussion about an open standard for defining data transformations.
795
49 Comments -
Ritvik Pandey
Pulse • 15K followers
Today the Pulse team published a deep dive on why a single “accuracy” score doesn’t tell you if a document extraction system will survive in production. The goal here is to lay out an introductory but still rigorous evaluation methodology - we have an exciting open-source benchmark building on this research coming out very soon.

Let’s do the math: take 1,000 pages, each with 200 data elements. A model that’s 98% “accurate” on paper still produces 4,000 incorrect values. Now make some of those:
1/ Broken reading order that scrambles multi-column layouts
2/ Tables with shifted columns or missing headers
3/ Cross-page context lost entirely

That’s enough to silently corrupt an entire dataset without throwing a single error.

We’ve processed hundreds of millions of pages and built a multi-axis evaluation framework to measure what actually matters: reading order validation, region-level ANLS, reading order accuracy, TEDS for table structure, and continuity checks across page boundaries.

The result? Fewer silent data corruptions, more predictable performance, and pipelines that keep working on the next million documents you haven’t seen yet. Full technical write up in the comments!
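The arithmetic in the post generalizes to a one-line estimator, worth making explicit because element-level accuracy scales into surprisingly large absolute error counts at corpus size:

```python
def expected_errors(pages: int, elements_per_page: int, accuracy: float) -> int:
    """Expected number of incorrect extracted values across a corpus."""
    total_elements = pages * elements_per_page
    return round(total_elements * (1.0 - accuracy))

# The post's numbers: 1,000 pages x 200 elements = 200,000 values;
# 2% wrong means 4,000 silently incorrect values.
errs = expected_errors(pages=1_000, elements_per_page=200, accuracy=0.98)
```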
36
2 Comments -
Rob Manson
Rob is the original… • 1K followers
It's Friday and flo is on fire! Finished the week with big updates...

Added support for Claude Code as an LLM provider. If you have a Claude Pro or Max plan, your hub agents can now route through the Claude Code CLI - meaning you get to use your existing plan instead of paying per token. Just point your hub at your local Claude Code install and it handles the rest (see the guide in the github repo).

Added Gemini 3.1 Pro Preview support. Google released it yesterday and it's already the default Gemini model in flo. Also fixed a bug with persisting Gemini agents to the hub - turned out Google's API returns different line endings than everyone else and our server-side parser wasn't handling it.

Rewrote the entire system prompt and all six built-in system skills. Tested scheduling tasks across four models (Opus 4.6, Sonnet 4.6, GPT-5.2, Gemini 3 Pro) and found that models were failing because critical platform knowledge was buried in optional reference docs with terse formatting. The rewrite uses narrative flow, explicit "unlearn" callouts for web dev assumptions that don't apply, and anti-pattern sections. Result - all four models now complete complex test tasks on the first try. Also added server-side validation that gives models actionable error messages instead of opaque crashes - so when a model does make a mistake, it self-corrects instead of spiraling.

It's all live on https://flo.monster and on github at https://lnkd.in/gSx7AyFZ
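The line-endings bug described above is a classic cross-platform pitfall: a parser that splits on "\n" alone mishandles "\r\n" (and bare "\r") payloads. The generic fix, sketched here (flo's actual code isn't shown in the post, so this is just the standard pattern), is to normalize before any line-based parsing.

```python
def normalize_newlines(text: str) -> str:
    """Map \r\n and bare \r to \n before any line-based parsing."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def parse_lines(payload: str):
    """Split a response into non-empty lines, independent of line-ending style."""
    return [line for line in normalize_newlines(payload).split("\n") if line]

parse_lines("role: agent\r\nmodel: gemini\r\n")  # ['role: agent', 'model: gemini']
```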
16
2 Comments -
Elvis S.
DAIR.AI • 85K followers
Another great paper if you are building with coding agents (great insights in this one; bookmark it). This reminds me a bit of the recently released agent teams in Claude Code.

Why does it matter? Single-agent coding systems have hit a ceiling most devs don't talk about. The default approach to building AI coding agents today is a single model responsible for everything: understanding issues, navigating code, writing patches, and verifying correctness. But real software engineering has never been a solo activity.

This new research introduces Agyn, an open-source multi-agent platform that models software engineering as a team-based organizational process rather than a monolithic task. The system configures a team of four specialized agents: a manager, a researcher, an engineer, and a reviewer. Each operates within its own isolated sandbox with role-specific tools, prompts, and language model configurations. The manager agent coordinates dynamically based on intermediate outcomes rather than following a fixed pipeline.

What makes the design interesting? Different agents use different models depending on their role. The manager and researcher run on GPT-5 for stronger reasoning and broader context. The engineer and reviewer use GPT-5-Codex, a smaller code-specialized model optimized for iterative implementation and debugging. This mirrors how real teams allocate resources based on task requirements.

The workflow follows a GitHub-native process. Agents analyze issues, create pull requests, conduct inline code reviews, and iterate through revision cycles until the reviewer explicitly approves. No human intervention at any point. The number of steps isn't predetermined; it emerges from task complexity.

Here is one notable finding: starting agents from empty environments proved more effective than preconfigured setups. Agents use Nix to install dependencies as needed, avoiding implicit assumptions that conflict with project-specific requirements. When command outputs exceed 50,000 tokens, they're automatically redirected to files rather than overwhelming the model context.

On SWE-bench 500, the system resolves 72.4% of tasks, outperforming single-agent baselines with comparable model configurations: OpenHands + GPT-5 achieves 71.8%, and mini-SWE-agent + GPT-5 reaches 65.0%. Importantly, the system was designed for production use and was not tuned for the benchmark.

Organizational structure and coordination design can be as important for autonomous software engineering as improvements in underlying models. Teams of specialized agents with clear roles, isolated workspaces, and structured communication outperform monolithic approaches even with comparable compute.
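Two mechanics from the post, role-to-model assignment and redirecting oversized outputs to files, can be illustrated with a minimal sketch. This is not Agyn's actual API: the `Agent` fields, the `deliver_output` helper, and the 4-characters-per-token estimate are my assumptions; only the role/model split and the 50,000-token limit come from the post.

```python
from dataclasses import dataclass
import tempfile

# Redirection rule described in the post: outputs above 50,000 tokens go to a file.
TOKEN_LIMIT = 50_000

@dataclass
class Agent:
    role: str
    model: str

def build_team() -> list[Agent]:
    # Role-to-model split described in the post: reasoning-heavy roles on
    # GPT-5, implementation roles on the code-specialized GPT-5-Codex.
    return [
        Agent("manager", "gpt-5"),
        Agent("researcher", "gpt-5"),
        Agent("engineer", "gpt-5-codex"),
        Agent("reviewer", "gpt-5-codex"),
    ]

def deliver_output(text: str, chars_per_token: int = 4) -> str:
    """Return command output inline, or spill it to a temp file when a
    rough token estimate exceeds the limit (mirroring the redirection rule)."""
    est_tokens = len(text) // chars_per_token
    if est_tokens <= TOKEN_LIMIT:
        return text
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
        f.write(text)
        return f"[~{est_tokens} tokens redirected to {f.name}]"
```

The point of the sketch is the shape, not the numbers: each role gets its own model configuration, and large tool outputs become file references instead of context.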
Mohammad Syed
Ayfa Consultants Inc • 9K followers
The GPU wasn't built for AI. It was built for video games.

Most teams don't understand what they're buying. They default to GPU for everything. That default is costing them 3× what the workload requires.

━━━━━━━━━━━━━━━━━━━━━━
🔷 𝗧𝗛𝗥𝗘𝗘 𝗖𝗛𝗜𝗣𝗦. 𝗢𝗡𝗘 𝗙𝗥𝗔𝗠𝗘𝗪𝗢𝗥𝗞:

𝗖𝗣𝗨 - Sequential logic. Brilliant at branching decisions.
Use it for: Orchestration. Routing. Control flow.

𝗚𝗣𝗨 - Thousands of parallel cores. Built for pixels. Repurposed for intelligence.
Use it for: Training. Experimentation. Flexibility wins here.

𝗧𝗣𝗨 - Built for one thing. Matrix multiplication. Nothing else.
Use it for: Inference at scale. Efficiency wins here.

The mistake isn't picking the wrong chip. It's using one chip for all three.

🔖 Screenshot this. Run your current workloads against it.

━━━━━━━━━━━━━━━━━━━━━━
⚡ 𝗧𝗪𝗢 𝗧𝗘𝗔𝗠𝗦. 𝗦𝗔𝗠𝗘 𝗠𝗢𝗗𝗘𝗟:

Team A: GPU for everything. $2.1M monthly inference costs.
Team B: TPU for inference. GPU for training. $700K monthly.

Same output. Same model. Different chip strategy.

━━━━━━━━━━━━━━━━━━━━━━
🔴 𝗧𝗛𝗘 𝗕𝗥𝗨𝗧𝗔𝗟 𝗧𝗥𝗨𝗧𝗛:

Most teams don't have a chip strategy. They have a chip default. GPU for everything. Because it's what they know. That's not architecture. That's inertia.

━━━━━━━━━━━━━━━━━━━━━━
Which chip is running your inference?
G = GPU (chose it)
D = GPU (defaulted to it)
T = TPU
? = Don't know
Drop your letter.

🔖 Save this before your next infrastructure review
♻️ Repost if your team is defaulting to GPU for everything
➕ Follow Mohammad Syed for AI & Cybersecurity insights
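The three-way framework above amounts to a lookup table, and writing it as one makes the point concrete. This is only a sketch of the post's rule of thumb: the workload category names are invented for illustration, and a real capacity decision would also weigh memory, batch size, cost, and hardware availability.

```python
# Illustrative workload-to-chip table encoding the post's framework.
CHIP_FOR_WORKLOAD = {
    "orchestration": "CPU",
    "routing": "CPU",
    "control flow": "CPU",
    "training": "GPU",
    "experimentation": "GPU",
    "inference at scale": "TPU",
}

def pick_chip(workload: str) -> str:
    """Return the chip class the framework suggests for a workload."""
    try:
        return CHIP_FOR_WORKLOAD[workload.lower()]
    except KeyError:
        raise ValueError(f"no rule for workload: {workload!r}") from None
```

The `ValueError` branch is the framework's real lesson: a workload that doesn't match a rule should force a decision, not fall through to a GPU default.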
Ash Kaduskar
First Citizens Bank • 2K followers
Yesterday at the conference, someone asked me: “Which LLM should banks standardize on?”

It’s a fair question. But maybe it’s a question we need to reframe.

Perplexity’s new system reportedly orchestrates 19 models behind the scenes. Nineteen. Claude for orchestration. Gemini for research. Grok for speed. Veo for video. ChatGPT for long-context recall. That’s not just model selection. That’s architectural thinking.

In regulated industries, it’s tempting to simplify the discussion to one model, one contract, one answer. I understand why — it feels cleaner. But enterprise AI won’t stay that simple. The differentiator will be the ability to route the right task to the right model — with governance, auditability, and control built in from day one.

And just as importantly, your architecture needs to be scalable enough to absorb the next wave of breakthroughs — without having to redesign everything from scratch.

Architecture > Model.
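The idea of routing the right task to the right model with auditability built in can be sketched in a few lines. The task names and model labels below are illustrative only, taken from the post's examples of what Perplexity's system reportedly does; this is not any vendor's actual routing logic.

```python
from datetime import datetime, timezone

# Illustrative task-to-model routing table based on the roles named in the post.
ROUTES = {
    "orchestration": "claude",
    "research": "gemini",
    "speed": "grok",
    "video": "veo",
    "long-context": "chatgpt",
}

class ModelRouter:
    """Route each task to a model and record every decision in an audit log,
    a sketch of the 'governance and auditability from day one' idea."""

    def __init__(self, routes: dict[str, str], default: str = "claude"):
        self.routes = routes
        self.default = default
        self.audit_log: list[dict] = []

    def route(self, task: str) -> str:
        model = self.routes.get(task, self.default)
        self.audit_log.append({
            "task": task,
            "model": model,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return model
```

In a regulated setting, the audit log, not the routing table, is the part that matters: every task-to-model decision is recorded with a timestamp, so routing behavior can be reviewed after the fact.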
Deepak S.
4K followers
Workflow Before Code: The Foundation of Fund Admin Innovation

The most valuable AI question isn't "What technology can we build?" but "Which broken workflows are costing you millions?"

At DwellFi, we didn't follow the conventional wisdom of starting with infrastructure. Instead, we observed fund administrators struggling with capital calls until midnight and reconciliations that consumed entire weeks. Our radical approach? Map the workflows first, then build the technology.

This isn't just an implementation strategy—it's a philosophy that's transforming fund administration. When we deeply understand a process before coding a solution, the results speak volumes: capital calls completed in hours instead of days, document accuracy jumping from 82% to 99.4%.

The future belongs to those who apply AI to solve the right problems, not those who build the most sophisticated models.

What broken workflows would you fix first if technology wasn't the constraint? I'm curious to hear your priorities.

#ThoughtLeadership #FundAdmin #WorkflowInnovation