Satyajit Kumar explains how NVIDIA Parakeet 1.1B runs at under 200ms latency and up to 6M toks/sec decode speed with Cactus on mobile devices and Macs... yes, 6M! Demo it yourself by following the instructions here: https://lnkd.in/gWcQM3un
Cactus (YC S25)
Research Services
San Francisco, California 2,831 followers
Low-latency AI engine for mobile devices & wearables
About us
Low-latency AI engine for mobile devices & wearables.
Website: https://www.cactuscompute.com
Industry: Research Services
Company size: 2-10 employees
Headquarters: San Francisco, California
Type: Privately Held
Founded: 2025
Locations
Primary: San Francisco, California, US
London, GB
Updates
Noah Cylich walks us through using Liquid AI’s LFM2-24B-A2B model for coding locally on your Mac with OpenCode integration, powered by Cactus. Learn more: https://lnkd.in/gd2gX5db
Cactus (YC S25) reposted this
Today, we released our largest LFM2 model: LFM2-24B-A2B, a 24B Mixture-of-Experts model with 2.3B parameters active per token, built on our hybrid, hardware-aware LFM2 architecture. By activating only the most relevant parameters at runtime, LFM2-24B-A2B delivers large-model capability with fast, memory-efficient behavior in a 32GB, 2B-active footprint.
From day zero, we're making deploying LFM2-24B-A2B easy, with support from key partners:
☁️ Easily deploy in the cloud:
> Together AI: serverless production deployment. Get started here: https://lnkd.in/eWeaGSDg
> Modal: elastic GPU infrastructure with low-latency serving. Get started here: https://lnkd.in/ezDVbVid
💻 Easily deploy locally:
> AMD: optimized for CPU, GPU, and NPU on the Ryzen AI platform
> Intel Corporation: supported via OpenVINO across AI PCs and data centers. Learn more here: https://lnkd.in/eFpUGxda
> Qualcomm: optimized for AI PCs and high-end mobile
🧠 Access and run LFM2-24B-A2B across top platforms with:
> Cactus (YC S25): check out their guide to coding agents with LFM2-24B-A2B: https://lnkd.in/eQdB7XAj
> Ollama: download LFM2-24B-A2B: https://lnkd.in/euXqHeq6
> LM Studio: download LFM2-24B-A2B: https://lnkd.in/etU2rQtz
> Nexa AI: see LFM2-24B-A2B running on the Qualcomm Snapdragon® 8 Elite for Galaxy device (Samsung Galaxy S25 Ultra), powered by the Qualcomm Hexagon NPU: https://lnkd.in/euA-x4jp
With LFM2s and our ecosystem partners, deploying fast, scalable, efficient AI in production is easier than ever. Get started today.
Read more on our ecosystem of partners here: https://lnkd.in/emeBjdV9
All cloud fallback is FREE this February. Seriously. Make us regret this 😅
We just launched Hybrid Cloud inference and we're too excited for you to try it.
1. Go to www.cactuscompute.com
2. Sign up and create a key
3. Run unlimited on-device transcription and LLM inference with cloud fallback
Cactus Hybrid Cloud runs inference on-device by default, as always. If the on-device model struggles, it automatically hands off inference to the cloud.
Demo for yourself:
brew install cactus-compute/cactus/cactus
cactus transcribe
Cactus (YC S25) reposted this
Roman Shemet demos Cactus Hybrid Inference. Observe the latency & toks/sec, and use the provided commands to reproduce it!
v1.7 cost us blood, sweat and tears:
- No sense of work hours for Cactus Jacks.
- Piles of feedback from users on problems no one else is solving.
- Mentally fatigued everyone with my yelling :(
- Tight deadlines to submit 6 research papers.
- Slack threads were easily reaching 300 replies.
- Community members reaching out to ask if we're ok.
Shout out to the Cactus Pods from UCLA, UMichigan, UPenn, UCI, Yale, UWaterloo, Imperial, CU Boulder, etc., who all now officially co-maintain Cactus with Cactus (YC S25). They really picked up our slack!
Come build with us this Saturday at the Google DeepMind x Cactus Hackathon across multiple cities; 1,200 teams have already registered across San Francisco, UCL/London, MIT/Boston, UMaryland, NUS/Singapore and online.
Cactus (YC S25) reposted this
We are excited to partner with Cactus (YC S25) for our AI MakerSpace. This is a great opportunity to:
- Work with the Cactus v1 SDK and API to build production-grade AI applications.
- Access high-performance compute resources used by top-tier researchers.
#Partnership #AIMakerSpace
The AI Society is proud to partner with Cactus Compute, a YC-backed AI infrastructure company, to bring students hands-on experience building real-world AI systems.
Over 9 weeks, participants will:
• Build production-level AI projects
• Access modern AI infrastructure & SDKs
• Compete on weekly leaderboards
• Earn shareable achievement badges
• Learn from experienced mentors
This is more than a program: it's a launchpad.
Cactus (YC S25) reposted this
Cactus (YC S25) is synergising with Google DeepMind, AI Tinkerers and the AI Nexus Community to bring you a global multi-city hackathon: 👇🏾
I'd love to say that DeepMind agreed to this because of my overwhelming charisma, but the reality is that, having collaborated with 8 teams at Google and met a ton of the executives and researchers, the people there are nicer than you'd think. Being an annoyingly persistent person, I can promise this is only the first of more announcements to come. I bugged them so much that my email is probably blocked on their servers. But I did not get here in life by folding my arms while opportunities fly by.
Shout out to my new best friends:
- Amit Vadi (🐐 x ♾️)
- Jake Laes (🐐)
- James Unsworth (🐄)
- Hindy Rossignol & his MIT org
- UCLA's Bruin AI
- All involved host orgs at UCL, NUS, Maryland etc.
Learn more: https://luma.com/f0arqlwy
Cactus (YC S25) reposted this
Cactus (YC S25) v1.6 simplifies everything: 👇
1. Auto-RAG: when initializing Cactus, you can pass a .txt or .md file, or a directory containing multiple files, which will be automatically chunked and indexed using our memory-efficient Cactus Indexing and Cactus Rank algorithms.
2. Cloud Fallback: we designed confidence algorithms that let the model introspect while generating; if it is making an error, it can decide within a few milliseconds to return "cloud_fallback = true", in which case you should route the request to a frontier model.
3. Real-time transcription: Cactus now has APIs for running transcription models, with latency as low as 200ms on Whisper Small and 60ms on Moonshine.
4. Comprehensive response JSON: each prompt returns function calls (if any), as well as benchmarks, RAM usage, etc.
5. Support for C/C++, Rust, Python, React, Flutter, Kotlin and Swift.
Learn more: https://lnkd.in/dgct-Rb8
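The cloud-fallback flow in point 2 can be sketched as a small routing function around the response JSON. This is a hypothetical illustration: the field names (`cloud_fallback`, `ram_mb`, `toks_per_sec`) are assumptions based on the post, not the actual Cactus SDK schema.

```python
# Hypothetical sketch: route a Cactus-style response to a frontier model
# when the on-device model flags low confidence via cloud_fallback.

def route_response(response: dict) -> str:
    """Return 'cloud' when the on-device model requests a fallback."""
    if response.get("cloud_fallback", False):
        return "cloud"  # hand the prompt off to a frontier model
    return "on_device"  # keep serving the local result

# Example responses shaped like the "comprehensive response JSON" above
local_ok = {"text": "...", "cloud_fallback": False, "ram_mb": 512, "toks_per_sec": 42}
needs_cloud = {"text": "...", "cloud_fallback": True}

print(route_response(local_ok))     # on_device
print(route_response(needs_cloud))  # cloud
```

In a real app the "cloud" branch would call your chosen hosted model; the point is that the decision costs only a dictionary lookup on the device.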
Cactus (YC S25) reposted this
Cactus (YC S25) v1.6 has been released, and Cactus is now much more performant on cheaper devices. FunctionGemma & LFM2-350m are highly capable models for real-world agentic tasks on resource-constrained devices. Follow the repo: https://lnkd.in/e-XJqyT7
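An "agentic task" on a small model like FunctionGemma usually means the model emits a structured function call that the host app dispatches to a local tool. A minimal sketch, assuming a JSON call format; the `get_battery_level` tool and the output shape are hypothetical, not the real Cactus or FunctionGemma API.

```python
import json

# Hypothetical local tools the host app exposes to the model
TOOLS = {
    "get_battery_level": lambda: 87,  # stubbed device API for illustration
}

def dispatch(model_output: str):
    """Parse a JSON function call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call.get("arguments", {}))

# Simulated model output requesting a tool call
print(dispatch('{"name": "get_battery_level"}'))  # 87
```

The result would then be fed back to the model as context for its next turn, which is the basic loop behind on-device agents.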