We took the Hot Wings Challenge to NVIDIA GTC 🌶️ Our VP of Kernels Dan Fu and VP of Customer Success Sarung Tripathi sat down for a conversation about AI...with a spicy catch. For every question, another wing. And the heat kept climbing. Big insights. Bigger sweat.
Together AI
Software Development
San Francisco, California · 86,760 followers
Accelerate inference, model shaping, and pre-training on a research-optimized platform.
About us
Together AI is the AI Native Cloud, purpose-built for AI engineers and researchers with a full suite of tooling across inference, model shaping, and pre-training. AI natives can use Together AI as a full-stack AI platform, from a high-performance inference engine built for reliable and fast scaling to on-demand GPU clusters and massive-scale AI factories. Together AI continuously pushes the frontier forward by productizing cutting-edge research from our world-leading AI systems research team. By combining research velocity with production-grade infrastructure, we enable companies to reliably scale AI-native applications as fast as the field evolves. Trusted by leading AI natives like Cursor, Decagon, Eleven Labs, AI21, Hedra, and Cartesia, as well as SaaS innovators such as Salesforce, Zoom, and Zomato, Together AI powers the next generation of AI-native applications.
- Website
- https://together.ai
- Industry
- Software Development
- Company size
- 201-500 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2022
- Specialties
- Artificial Intelligence, Cloud Computing, LLM, Open Source, and Decentralized Computing
Locations
-
Primary
251 Rhode Island St
Suite 205
San Francisco, California 94103, US
Updates
-
Together AI serves the two fastest speech-to-text models measured by Artificial Analysis. NVIDIA Parakeet-TDT 0.6B v3 on Together AI can transcribe ~20 hours of speech in under 10 seconds. This deep dive from Sebastien Beurnier shows the systems work behind the leaderboard: TensorRT profiles, conditional CUDA graphs, evented I/O, shared memory, and Python GC control.
-
The best research labs are building what comes after static models. Congrats to Trajectory on the launch! Excited to have them training on the AI Native Cloud as they push the frontier on Continual Learning!
Today, Michael Elabd, Arjun Karanam, and I are excited to announce our $15M raise for Trajectory. We are a research lab and product company building the platform for Continual Learning. Our platform unlocks the signal already sitting in product usage, so companies can continuously train large-scale agentic models that outperform the frontier. We’ve raised $15M from Conviction, Bessemer Venture Partners, Radical Ventures, Jeff Dean, Fei-Fei Li and more. We’re partnering with some of the best AI-native companies: Clay, Harvey, Decagon, Mercor, and Rogo to power their agentic systems, some of which we are already in production with. We’ve brought together a world class research team from DeepMind, OpenAI, Apple, Meta Superintelligence, Amazon AGI, Scale AI, and an elite product team from Stripe and Figma. AI will never again start on day one. Every correction, every retry, every edit will make products smarter. This is Continual Learning.
-
Together AI reposted this
🚨 New ICML 2026 Paper! Learning To Discover at Test Time.
For a few hundred dollars of compute, we made an open-source 120B model climb four hard scientific leaderboards.
1. It wrote a GPU kernel 2× faster than the best human-written kernel on the GPUMode leaderboard.
2. It beat DeepMind's AlphaEvolve on a 70-year-old open Erdős problem. An improvement 16× larger than AlphaEvolve's.
3. It placed 1st-equivalent in two AtCoder Heuristic Contests against hundreds of expert humans, ahead of the previous best AI.
4. It set new state-of-the-art on single-cell RNA-seq denoising.
Same algorithm. Same open-source model (gpt-oss-120b). The method, TTT-Discover, is reinforcement learning, at inference, on a single problem.
Accepted at ICML 2026. Paper, code, every result tied to its public verifier: 🔗 https://lnkd.in/dGSmjHyn
With Mert Yuksekgonul, Daniel Koceja, Xinhao Li, Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou, Carlos Guestrin and Yu Sun.
Stanford · NVIDIA · Astera Institute · UC San Diego · Together AI.
-
We’re excited to announce Qwen3.7-Max on Together AI 🚀 AI natives can now deploy Qwen’s flagship model for the agent era on Together Serverless Inference and benefit from reliable infrastructure for long-horizon coding, reasoning, and autonomous workflows.
Highlights:
→ Long-horizon autonomy: maintained coherent execution across a 35-hour autonomous kernel optimization run
→ Agentic coding: leading Terminal-Bench 2.0-Terminus performance for terminal-based engineering workflows
→ General agent workflows: strong tool orchestration, office automation, and spreadsheet reasoning
→ 1M context: built for longer tasks, larger working sets, and persistent agent workflows
Try it now: https://lnkd.in/gFRA-pbs
-
We’re excited to announce MiniMax Speech 2.8 Turbo on Together AI. AI natives can now deploy MiniMax’s enterprise TTS model on Together AI dedicated infrastructure for expressive, real-time voice agents.
With MiniMax Speech 2.8 Turbo, teams get:
→ Sound Tags for laughter, breathing, sighs, gasps, and other vocal cues
→ 60% prosody improvement over Speech 2.6
→ Sub-250ms end-to-end latency with streaming support
→ 40+ language support for global voice applications
Try it in voice finder: https://lnkd.in/g3vZdrj3
-
"One thing that we've been seeing recently is that inference benchmarks don't really match production workloads that well." - Dan Fu, VP of Kernels
When you're running dozens of concurrent coding agents (each with 45k–200k token contexts), the benchmarks that matter are the ones that stress KV cache, scheduler limits, and throughput under real load. We ran those benchmarks.
Our Inference Engine delivered:
→ 31% higher throughput than the next fastest open source engine
→ 2× better time-to-first-token at saturation
→ 76% lower cost per request compared to Claude Opus 4.6
Read the full technical breakdown → https://lnkd.in/gEQxp8Sp
-
Congrats to the Cursor team on Composer 2.5 — a huge milestone for agentic coding models. Together AI, the AI Native Cloud, is proud to partner on this launch. Composer 2.5 is pushing the frontier for coding agents and turning heads for its speed and quality. Excited to keep building with the Cursor team!
Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. For the next week, we’re doubling the included usage of the model. Learn more about Composer 2.5: https://lnkd.in/esfiRv7F
-