San Francisco, California, United States
2K followers
500+ connections
Experience & Education
- Phantom
Honors & Awards
- Governor General's Award (Rideau Hall)
  Awarded to the student with the highest overall average in his/her secondary school graduating class.
- Canadian Engineering Competition Champion
  First overall at the national level (CEC) for junior design.
- Sandford Fleming Foundation Senior Design Competition Award (Sandford Fleming Foundation & University of Waterloo)
Explore more posts
-
Aleksa Gordić
stealth • 112K followers
RL is all you need! 🍭🧠 DeepSeek just released an OpenAI o1 competitor model (DeepSeek-R1 & DeepSeek-R1-Zero) with an MIT license!

Their Zero model is just the base model (DeepSeek-V3-Base) + large-scale RL without SFT, and it achieves amazing reasoning capabilities. Note: RL = GRPO (not PPO). BUT they encountered some issues with it: poor readability and language mixing (sounds like some cracked technical talent I knew that was poor at communication but great at problem solving).

DeepSeek-R1 is basically DeepSeek-R1-Zero + multi-stage training. This is what their multi-stage pipeline looks like:
1. Start with thousands of cold-start CoT examples to fine-tune the base model.
2. An RL stage similar to Zero.
3. A new SFT stage on rejection-sampled data + supervised data (writing, self-cognition, etc.) -> ~600k data points.
4. RL again, to make the model harmless/helpful, etc.

The length of the response increases during training as an emergent property. Reflection, exploration of alternatives, and other behaviors simply emerge without being programmed in.

Their RL mainly uses 2 rewards:
* accuracy rewards (e.g. unit tests for code can be used to compute accuracy)
* format rewards (the model gets rewarded if it separates the "thinking" and the "answer" parts with <think> tags, plus a reward that forces it to use the same language consistently without switching back and forth)

Interestingly enough, no outcome/process RMs were involved, so their pipeline is really simplified. They also released 6 dense models (1.5B - 70B range) distilled from DeepSeek-R1, based on Qwen/Llama.

I definitely didn't expect Chinese labs to be leading in open-source AI. Big congrats to the DeepSeek team!
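The rule-based rewards described above are simple enough to sketch. Below is a hypothetical illustration of a format reward (not DeepSeek's actual code): it checks that a completion wraps its reasoning in a single <think>...</think> block followed by an answer, and shows how such a signal could be combined with an accuracy reward such as a unit-test pass rate.

```python
import re

# Hypothetical sketch of a rule-based format reward in the spirit of the post
# above (not DeepSeek's actual code): reward completions that put their
# reasoning inside one <think>...</think> block followed by a non-empty answer.
def format_reward(completion: str) -> float:
    pattern = r"^<think>.+?</think>\s*\S.*$"
    return 1.0 if re.match(pattern, completion.strip(), flags=re.DOTALL) else 0.0

# An accuracy reward for code tasks could be the unit-test pass rate; the
# total training signal is then just the sum of the two rule-based rewards.
def total_reward(completion: str, unit_test_pass_rate: float) -> float:
    return format_reward(completion) + unit_test_pass_rate

print(total_reward("<think>2+2=4</think> The answer is 4.", unit_test_pass_rate=1.0))
```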
644 · 16 Comments
-
Ashu Garg
Foundation Capital • 38K followers
Investment is moving from pre-training to post-training. Early LLM budgets focused on training ever-larger base models. Post-deployment, models were mostly fixed... improvements came via scaling next-gen models. Now labs split resources between pre-training and RL fine-tuning with careful inference optimization.

In post-training, models learn reasoning, follow prefs and refine outputs for specific tasks. Models update more frequently, weekly or monthly vs yearly. Each iteration collects new data, extends RL training and adjusts rewards to improve reasoning. New economic models charge for updated reasoning engines vs static models. Serving costs and latency are key so providers favor efficient pro models with most accuracy at lower compute.

3 more takeaways from ICML:

1️⃣ RL is only beginning to cross into “unverifiable” domains. Traditional RL used tasks with clear auto checks (like code compilers or math calcs) for rewards. Domains like Math Olympiad or legal arguments have complex solutions that can't be auto-verified. These need more complex reward models scoring reasoning steps, argument clarity or persuasiveness vs just correctness. With solid reward models, RL could teach coherent proofs, experiment design or legal strategies. This is still early work - current systems often break tasks into verifiable parts or use human prefs, but broader paths are emerging.

2️⃣ Personalization is a safer near-term goal than true continual learning. Interest in adapting models to users is growing with 2 key approaches. Personalization tailors outputs by learning user-specific rewards from few feedbacks or adding curiosity rewards that prompt questions about user tastes. These adjust behavior without changing model weights, improving perceived helpfulness and empathy. Continual learning updates parameters in real time from all user input. But continual learning poses safety and privacy risks like overfitting, bias amp & data leakage. Personalization and context windows (remembering interactions without weight changes) are more practical and responsible.

3️⃣ RL scaling will follow multiple paths.
→ One path applies current RL methods (PPO, guided reward models, diffusion reg) to more domains. This incremental path needs no new algos, just better rewards, bigger varied datasets & optimization tweaks.
→ The second tackles sparse-reward probs with long feedback delays. Success depends on credit assignment in long seqs and off-policy learning using varied experiences. Off-policy RL supports multi-datacenter training with clusters handling acting, collection and learning.
→ The third involves continual learning with frequent updates using user feedback and new data.

Each has trade-offs: incremental and safe, risky but domain-expanding, adaptive but complex w/ safety issues. What'd I miss?
49 · 5 Comments
-
Hilik Paz
arato.ai • 3K followers
A tiny LLM just beat the giants at their own game. In a controlled evaluation test this summer, gpt-5-nano consistently picked the right answer while Claude-Sonnet-4 and Gemini 2.5 Pro fell for a reasoning trap. The smaller model not only avoided the distraction of polished but wrong explanations, it also explained its choices clearly. Bigger models, more parameters, more cost, yet worse results. If you rely on a single “trusted” LLM as a judge, you’re exposed to systematic bias. Subtle prompt design can swing outcomes. Alignment matters more than parameter count. And stacking multiple evaluators, not just scaling one, is the safer path. At Arato, we help teams plan, test, and run evaluation pipelines that reflect this reality, integrating diverse LLM judges, tracking bias, and ensuring results hold up in production.
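The "stack multiple evaluators" recommendation can be illustrated with a small, generic sketch (this is not Arato's pipeline; the judge callables are assumed): several LLM judges vote on a pairwise comparison, and a verdict is accepted only when a strict majority agrees, which makes any single judge's systematic bias visible.

```python
from collections import Counter
from typing import Callable

# Generic illustration of "stacking multiple evaluators" instead of trusting a
# single LLM judge (not Arato's pipeline). Each judge is an assumed callable
# that returns a verdict label such as "A" or "B" for two candidate answers.
def majority_verdict(judges: list[Callable[[str, str], str]],
                     answer_a: str, answer_b: str) -> str:
    votes = Counter(judge(answer_a, answer_b) for judge in judges)
    verdict, count = votes.most_common(1)[0]
    # Require a strict majority so one biased judge cannot decide the outcome.
    return verdict if count > len(judges) // 2 else "no-consensus"
```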
256 · 4 Comments
-
Vaibhav Gupta
Boundary • 7K followers
We made a huge bet that engineers will not want to be forced into using Python just to use LLMs. The most underrated thing about BAML isn't that we improve tool calling on every model (including gpt-5), or that we have the fastest iteration loop for testing prompts. It's the fact that as your team scales and inevitably uses different languages, they can ALL still use BAML to write their AI pipelines. Python devs love BAML. TypeScript devs love BAML, and now we're starting to see Go devs on it too!
62 · 2 Comments
-
Dhruv Mehra
Pype • 4K followers
🚀 Launch Alert: Pype is Now Open to All AI Builders!

The biggest cost of AI agents isn't the LLM bills - it's the cost of their mistakes. We learned this working with pioneering companies who were burning countless engineering hours managing their AI agents. What started as a hypothesis became hard reality.

Working with amazing beta partners globally, we saw teams struggling with:
- AI agents making expensive API calls without limits
- Engineers pulling all-nighters to debug agent behavior
- Production mistakes costing real money and time

The result? We helped our partners save $2M in their AI automation processes. What surprised us most was seeing how creatively companies are using AI agents. It made one thing clear - AI agents are here to stay, and tooling around them is a necessity.

🎯 What Pype Does:
- Debugs agents faster with intelligent tracing (not just dumb logs!)
- Provides robust guardrails for AI agents
- Helps you battle-test your agents
- Cuts engineering bandwidth waste by 40%

Are you an engineer who's already built an AI agent? Let's take your journey from 1 → 100 together. Drop a comment or DM - I'd love to hear your story. Or try Pype for free - link in comments below 👇

#AIAgents #TechLaunch #Engineering #StartupLife #LLM #Evals
167 · 29 Comments
-
Rishi Saraf
Refacto AI • 8K followers
If you think Deepseek is just about easy GPU access or having loads of money, you're hallucinating. It’s the result of years of investment in research, math, AI, labs, and—most importantly—a culture of hard work. Yeah, those 70-hour work weeks too, the ones people love to mock. Before Deepseek, China had already positioned itself as the world's factory. So calm down—no amount of government perks or VC money creates breakthroughs overnight. Real change starts at the grassroots. We need to move past the Jugaad mindset and push our kids toward research, sports, and actual problem-solving. Teach them to embrace the grind instead of chasing quick gratification—whether it's a high package or an MNC job.
80 · 7 Comments
-
Akash Sharma
Vellum • 15K followers
Introducing native tool support in Vellum for complex agentic workflows.

Today, we’re launching 𝗧𝗼𝗼𝗹 𝗖𝗮𝗹𝗹𝗶𝗻𝗴 𝗡𝗼𝗱𝗲: a built-in component that handles tool use out of the box and helps AI teams move 100x faster.
(i) It handles OpenAPI schemas, looping, and output parsing automatically.
(ii) It supports multiple tools.
(iii) And it runs end-to-end from one component.

This is by far the easiest component to set up in Vellum! And it still gives you plenty of flexibility. You can define tools in 2 ways: with raw Python or TypeScript using third-party packages, or by reusing nested workflows that you’ve built with code/UI.

Coming next: more granular debugging, and a Tool Directory that lets you drop in common tools like Search, Extract, or SaaS integrations with a single click.

👋 to painful function calling setup once and for all 👋

Sign up to Vellum to try it today: https://lnkd.in/d44ejUGc
Read how it works: https://lnkd.in/d-PHdx5t
49 · 4 Comments
-
Yasser Elsaid
Chatbase • 36K followers
LLMs are intelligent enough to do a lot more than what they're doing now. But the application layer is just too slow to provide the right context, tools, UI, & guardrails to the models. There is room for 2-3x improvement in just optimizing the app layer. Great news for startups.
200 · 16 Comments
-
Francesco Perticarari
Silicon Roundabout Ventures • 30K followers
Repeat after me! Most value in AI will NOT be in the application layer, but in the infrastructure layer. I haven't seen a decent, convincing argument otherwise, and no, I don't care that Sequoia disagrees. Over and over, the true value capture happens at the infra and full-stack layers. Microsoft won the OS game. Apple is full stack. NVIDIA is infra. TSMC... Tesla... And before that: Intel, Cisco, etc. It's just the way it is. There are exceptions. But no e-commerce company will be bigger than Shopify, and Shopify will never be bigger than AWS. Prove me wrong.
49 · 53 Comments
-
Roy Nissim
LF AI & Data Foundation • 4K followers
Hugging Face demonstrated test-time scaling with Llama 1B, turning it into a "reasoner" that outperformed Llama 8B (8x larger!) in math. Test-time scaling has opened up a whole new dimension in model performance and AI research. I wrote about this when OpenAI released its o1 model (the first 'reasoner' model). You can read it here: https://lnkd.in/gi737nsA

At its core, a reasoner model uses Chain of Thought (CoT) reasoning—breaking problems into logical steps, solving them sequentially, and backtracking when needed. Scaling test-time compute enables this reasoning through two techniques:
1/ Self-Refinement: Models iteratively improve outputs, correcting errors over multiple passes.
2/ Search & Verifiers: Generate multiple outputs and evaluate them with verifier models (e.g., Process Reward Models for steps, Outcome Reward Models for final answers).

Hugging Face’s contribution stands out because they’ve shown test-time scaling can be applied to any model, even smaller, open-source ones. They’ve also made it easier for others to replicate and advance the technique—helping the OS community catch up.

That said, inference scaling is still far from efficient. Llama 1B required ~20 reasoning steps to outperform Llama 8B, a model only 8x larger. Nevertheless, inference scaling already achieves two critical things:
– Pushing the frontier: Scaling inference pushes the boundaries of accuracy and delivers better results.
– Smaller hardware requirements: Smaller models can match the performance of larger ones, reducing reliance on expensive infrastructure.

The next frontier isn’t just about bigger models—it's about smarter inference. Excited to see where this leads!
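For readers who want to see the "Search & Verifiers" technique concretely, here is a minimal best-of-N sketch under stated assumptions: `generate` samples one candidate answer and `verifier` is an outcome reward model scoring complete answers. Both are hypothetical callables, and this is not Hugging Face's implementation.

```python
from typing import Callable

# Minimal best-of-N sketch of the "Search & Verifiers" idea described above
# (not Hugging Face's implementation). `generate` samples one candidate answer
# for a problem, and `verifier` is an outcome reward model that scores a
# complete answer; both are assumed, hypothetical callables.
def best_of_n(problem: str,
              generate: Callable[[str], str],
              verifier: Callable[[str, str], float],
              n: int = 16) -> str:
    candidates = [generate(problem) for _ in range(n)]
    # Spend more test-time compute by increasing n; the verifier picks the winner.
    return max(candidates, key=lambda ans: verifier(problem, ans))
```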
62 · 4 Comments
-
Michael R. Bock
Column Tax • 2K followers
I've spent thousands of dollars testing AI models on real-world tasks & inadvertently learned the nuances of the major model APIs. Here's an unordered list of things I've stumbled upon via TaxCalcBench (testing models' ability to calculate tax returns) about the OpenAI, Anthropic, and Gemini APIs:
- Anthropic (Claude) defaults to no thinking and you have to explicitly tell it to spend reasoning tokens. This is a mistake IMO because Claude (Sonnet & Opus) don't perform well on complex tasks without reasoning, so by default OpenAI/Gemini look better if you don't dig in.
- Claude is super expensive compared to OpenAI & Gemini!
- LiteLLM is a great little wrapper to get started and swap out model providers quickly with a one-line change, but eventually you will have to write provider-specific code. For example, each provider formats web search output differently.
- OpenAI & Gemini return the web search queries they make; Claude only returns the citations.
- OpenAI sets reasoning effort using levels (low/medium/high); Claude & Gemini let you set reasoning budgets with exact tokens. Each model has a different max reasoning budget (usually 32k or 64k tokens).
- More thinking budget does not always lead to better performance. On TaxCalcBench, for example, more reasoning tokens do not improve Gemini 2.5 Pro's performance _at all_. But performance improved for 2.5 Flash, GPT-5, and the Claude models. Play around with reasoning budget for your specific task.
- Gemini 2.5 Flash is a sneaky good (and fast! and cheap!) model. Consider starting with Flash and moving to Pro only if you actually need the extra performance.
- Models w/ reasoning can be slow (up to ~5 minutes). Match reasoning effort to your need for latency/performance.
- GPT-5 follows instructions really well.
- Anthropic needs to work on their reliability 😭 (so does OpenAI).

While these AI model APIs are magical intelligence in the sky, they're still HTTP APIs at the end of the day and have their quirks!
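As a concrete illustration of the LiteLLM point above, here is a minimal sketch of the one-line provider swap. The `litellm.completion()` call takes OpenAI-style messages and returns an OpenAI-compatible response; the model identifiers are illustrative examples only, and provider-specific features (web search output, reasoning budgets) would still need provider-specific code.

```python
import litellm

# Sketch of the "one-line provider swap" with LiteLLM. litellm.completion()
# accepts OpenAI-style messages and returns an OpenAI-compatible response;
# the model identifiers below are illustrative, not an endorsement.
messages = [{"role": "user", "content": "Summarize the key inputs to a simple tax return."}]

for model in ["gpt-4o", "claude-3-5-sonnet-20240620", "gemini/gemini-1.5-pro"]:
    response = litellm.completion(model=model, messages=messages)
    print(model, response.choices[0].message.content[:80])
```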
54 · 12 Comments
-
Preetam Joshi
Aimon Labs • 3K followers
📣 Introducing RRE-1: A New Model for Retrieval Evaluation & Reranking for RAG-based applications.

Retrieval-Augmented Generation (RAG) has massively helped transform LLM-powered applications, but let’s face it—retrieval quality is still the weakest link in many pipelines. Poor retrieval is one of the major contributors to lower accuracy in your LLM applications. 🧐

That’s why we at AIMon built RRE-1, a powerful retrieval evaluation and reranker model designed to optimize ranking quality at scale. 🎯

👉 Why does this matter? Traditional retrieval methods like vector search are powerful, but they have limitations in effectiveness, especially when handling complex queries and domain-specific knowledge. RRE-1 addresses this by offering:
✅ Offline evaluation of retrieval effectiveness, helping teams benchmark and refine their pipelines.
✅ Real-time, low-latency reranking to ensure the most relevant context reaches the LLM.

👉 Who is this for? If you’re working on LLM agents, enterprise RAG systems, or AI copilots, this is for you. Our model ensures your retrieval pipeline is optimized for your generative model—because great responses start with great retrieval.

🔎 Want to see it in action? Read more about AIMon RRE-1 in our blog post (link in comments). If you’re tackling retrieval challenges in your AI stack, we would love to chat!

#AI #RAG #Retrieval #LLMs #MachineLearning
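As a generic illustration of where a reranker sits in a RAG pipeline (this is not AIMon's RRE-1 API; the scoring function is assumed), the sketch below reorders the chunks returned by vector search so only the most relevant ones reach the LLM's context.

```python
from typing import Callable

# Generic illustration of the rerank-then-truncate step in a RAG pipeline
# (not AIMon's actual RRE-1 API). `score` is any reranker that rates how
# relevant a retrieved chunk is to the query; here it is a hypothetical callable.
def rerank(query: str,
           chunks: list[str],
           score: Callable[[str, str], float],
           top_k: int = 5) -> list[str]:
    # Vector search supplies a candidate pool; the reranker reorders it so the
    # most relevant chunks are what actually reach the LLM's context window.
    ranked = sorted(chunks, key=lambda chunk: score(query, chunk), reverse=True)
    return ranked[:top_k]
```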
67 · 4 Comments
-
Omid Gosha
Ewake • 2K followers
LLMs aren’t magic. They’re just another abstraction layer, one with incredible potential and equally serious trade-offs.

At Ewake.ai, we’ve spent the past year building agentic systems powered by large language models. We’ve seen firsthand how LLMs shift the engineering paradigm, and how they force teams to rethink everything from latency management to architecture decisions. LLMs introduce real leverage, but also unpredictability, high costs, and slower response times.

Prompting is programming: your API calls are no longer deterministic. You’re managing probabilities, not control flow.

System design matters more than ever: fallbacks, guardrails, token budgets, and model switching all become part of your product logic. Good architecture is about understanding what to delegate to LLMs, and what not to.

Read the full article here: https://lnkd.in/dAB-GGuw
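A minimal sketch of the "fallbacks and model switching become product logic" point, under stated assumptions: the per-model call functions and the guardrail check are hypothetical, and this is not Ewake's implementation.

```python
from typing import Callable

# Hedged sketch of fallbacks, guardrails, and model switching as product logic
# (not Ewake's implementation). `models` maps a model name to a hypothetical
# call function; we try the preferred model first and fall back on failure or
# on an output that the guardrail check rejects.
def call_with_fallback(prompt: str,
                       models: dict[str, Callable[[str], str]],
                       is_acceptable: Callable[[str], bool]) -> str:
    last_error: Exception | None = None
    for name, call in models.items():
        try:
            output = call(prompt)
            if is_acceptable(output):  # guardrail: reject malformed or unsafe output
                return output
        except Exception as err:       # timeouts, rate limits, provider outages
            last_error = err
    raise RuntimeError(f"all models failed or were rejected: {last_error}")
```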
35 · 2 Comments