Baseten

Software Development

San Francisco, CA 28,346 followers

Own your inference.

See jobs Follow

Discover all 273 employees

About us

Inference is everything. Baseten is an AI infrastructure platform giving you the tooling, expertise, and hardware needed to bring great AI products to market - fast. Our proprietary Inference Stack utilizes the cutting-edge of performance research combined with highly performant and reliable infrastructure to give you out-of-the-box global availability with 99.99% of uptime.

Website: https://www.baseten.co/
External link for Baseten
Industry: Software Development
Company size: 201-500 employees
Headquarters: San Francisco, CA
Type: Privately Held
Specialties: developer tools, software engineering, artificial intelligence, and machine learning

Products

Baseten

Machine Learning Software

At Baseten we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes, and avoid getting tangled in complex deployment processes. You can deploy best-in-class open-source models and take advantage of optimized serving for your own models. We also utilize horizontally scalable services that take you from prototype to production, with light-speed inference on infra that autoscales with your traffic. Best in class doesn't mean breaking the bank. Run your models on the best infrastructure without running up costs by taking advantage of our scaled-to-zero feature.

Locations

Primary

San Francisco, CA, US

Get directions
New York, NY, US

Get directions

Employees at Baseten

See all employees

Updates

Baseten reposted this
Fred Liu
5h Edited
Report this post
NVIDIA's Cosmos 3 is out today. The idea: Generate worlds so robots can learn by practicing in them. Training data for physical AI, generated on demand. Read more in this not-so-simple example of teaching a robot to open a door. https://lnkd.in/gE5CwzPZ

Nvidia Cosmos 3: Robots finally take over baseten.co

Like Comment Share
Baseten reposted this
Redpoint

62,264 followers
4d
Report this post
The Redpoint InfraRed 100 is now live. These are the companies building the infrastructure that powers everything happening in AI right now, from world models and agent runtimes to the sandboxes, databases, and security tools agents depend on. Congratulations to this year's honorees! Read the full 2026 InfraRed Report: our state of the union on AI and cloud infrastructure 👉 https://lnkd.in/eEevP-Wd
91 Comments

Like Comment Share
Baseten reposted this
Yikai Zhu
2d
Report this post
Our kernels team shipped 2.5x faster FLUX.2 image gen without visible quality loss. How we did it: Distribution Matching Distillation 2 (DMD2) to distill FLUX.2 from 20 denoising steps to 8. Details here.

2.5x faster image generation with timestep distillation on FLUX.2 Yikai Zhu on LinkedIn

5 Comments

Like Comment Share
Baseten reposted this
Winston Weinberg
4d Edited
Report this post
Today we're sharing our first research collaboration with Baseten on open-weight legal agents - and the results point to where vertical AI is heading. Using signal from LAB (our Legal Agent Benchmark of 1,200+ tasks across 24 practice areas), we post-trained a 27B open-weight model and brought it into the closed-source frontier band. Three main takeaways: 1. Open weights unlock cost, governance, and a path to deeper capability. Reaching the top of LAB with frontier models runs ~$50 and 20+ minutes per task. Open-weight agents can live inside a firm's own secure cloud, expose their reasoning traces for audit, and - with the right post-training pipeline - close the gap on a benchmark where even frontier models complete fewer than 10% of tasks end-to-end. 2. The model and the system around it have to be built together. We designed a "compaction" system that lets agents summarize what they've read so they can keep working on long tasks without losing context. It gave frontier models a 2.6 - 3.7x boost - but did nothing for the open-weight model until we actually trained the model to use it. You need both the model and the system. 3. Smaller models can learn to work like the best ones when you train them on the right examples. With a small amount of training against LAB's expert rubrics, a 9B model stopped relying on keyword search and started reading documents in full - the same approach the top frontier models (Opus, Sonnet, GPT-5.5) use on their own. The quality of your evaluation data shapes how the model behaves. Full write-up in the comments.
19 Comments

Like Comment Share
Baseten

28,346 followers
4d Edited
Report this post
Most enterprise AI models stay static after deployment, forcing teams to rely on endless prompt tweaking. This is the problem Trajectory AI is solving. Their continual learning platform enables companies to deploy custom models that improve continuously from production usage. To power their launch, Trajectory partnered with us. We delivered FP8 and NVFP4 quantization, automated deployment pipelines, and autoscaled H100 infrastructure. This enabled Trajectory to conduct live A/B tests with customers starting in April 2026 with zero outages and a 3x capacity expansion in under three weeks. The results speak for themselves. “Baseten is a unique infrastructure partner. When we needed a 397-billion-parameter model quantized and deployed overnight before a customer launch, they just did it.” — Michael Elabd, Trajectory founder and CTO Read our launch blog post here: https://lnkd.in/gm8cfY4e
8 Comments

Like Comment Share
Baseten reposted this
Jack O'Brien
1w
Report this post
Today we're releasing TIM-Qwen3.6-27B on a new OpenAI and Anthropic compatible API. Last month I wrote that open models had finally caught up to frontier models on the work most people *actually* need AI to do. The bottleneck stopped being the model and started being the environment around it. This release is our next step to unlock open models with our co-designed runtime and post-training process, now delivered in an API format that developers already love. The newest iteration of our inference runtime, TIMRUN, compresses context on the fly without losing reasoning quality. On long-context agent workloads, that means 10x effective context window length, 3x concurrent throughput, and 49% lower latency compared to models using SGLang on the same GPU. If you have a project that uses the OpenAI or Claude SDKs, you can point it at our endpoint and try TIM-Qwen3.6-27B in a few minutes. Full post on this release linked below. (We're also excited to share this system with 250+ developers at our hackathon next week with Baseten, Cloudflare, and Wayfair as part of Boston TECH WEEK by a16z)
16 Comments

Like Comment Share
Baseten reposted this
Bola Malek
1w
Report this post
As we've witnessed how AI transformed the software industry over the last year, I'm convinced that every industry will be transformed by these tools. Science is going through this transformation right now! I'm excited to share my discussion with Mihir Trivedi about how Baseten is accelerating inference and AI adoption in Life Sciences through our partnership with Benchling. It's been awesome building with Mihir and the team to bring this to life! Read more about it on our blog. https://lnkd.in/gtYF4hJY

Like Comment Share
Baseten reposted this
Benchling

64,116 followers
1w
Report this post
Today we’re announcing Benchling Inference! Together with Baseten, we’re offering scalable, cost-effective inference built for scientific AI. Why? Scientific workloads don’t look like typical AI workloads. Demands come in bursts, with teams needing to run 100,000 predictions for a few hours before going quiet again for days. Most infrastructure wasn’t built for that kind of scale or flexibility. With Benchling Inference, powered by Baseten, R&D teams can: ✔️ Run scientific models without managing infrastructure ✔️ Scale workloads up or down in seconds ✔️ Access cost-effective compute, enabled by aggregating demand We’ve taken everything we’ve learned from running Baseten for Benchling’s Model Hub — the configurations, defaults, and integrations to make inference work out-of-the-box for biotech — so you don’t have to. 👉 Learn more about our end-to-end solution for scalable inference: https://lnkd.in/duuR9Fxx
2 Comments

Like Comment Share
Baseten

28,346 followers
1w
Report this post
Biotech R&D is generating more scientific AI models than ever, from protein structure prediction to molecular docking to sequence analysis. But the infrastructure to run them hasn't kept up. Today we're announcing Benchling Inference, powered by Baseten. Together with Benchling we're delivering on-demand GPU capacity built for the bursty, high-stakes demands of scientific workloads. With Benchling Inference, scientists can: → Deploy models in seconds, not weeks → Keep proprietary models inside their VPC if needed → Benefit from economics that work even at small and mid-size biotech scale Benchling and Baseten decided to team up because we believe that research teams shouldn't have to manage HPC queues, negotiate cloud contracts, or become GPU experts to run frontier models on their own data. Six years of inference expertise are now available where science happens. Read more here: https://lnkd.in/gj2hpC78
7 Comments

Like Comment Share
Baseten

28,346 followers
1w
Report this post
“Intensity plus joy — an aha moment for me at Baseten is that those two things are not on opposite sides of a spectrum. Those can be inextricably linked, and that is the best.” Our President Dannie Herzberg sat down with Conviction to chat GTM, culture, and hiring in an evolving AI market.

1 Comment

Like Comment Share

Browse jobs

Funding

Baseten 6 total rounds

Last Round

Series D Oct 5, 2025

US$ 150.0M

Investors

Bond + 8 Other investors

See more info on crunchbase

Baseten

Software Development

San Francisco, CA 28,346 followers

Own your inference.

About us

Products

Baseten

Machine Learning Software

Locations

Employees at Baseten

Jason Dupree

Marylise Tauzia

Dharmesh Thakker

Tarun Diwan

Updates

Join now to see what you are missing

Similar pages

Decagon

Fireworks AI

ElevenLabs

Harvey

Together AI

Arize AI

Metronome

Sierra

Parsed

Anthropic

Browse jobs

Engineer jobs

Machine Learning Engineer jobs

Scientist jobs

Software Engineer jobs

Developer jobs

Marketing Manager jobs

Manager jobs

Senior Software Engineer jobs

Intern jobs

Associate jobs

Analyst jobs

Human Resources Specialist jobs

Executive jobs

Full Stack Engineer jobs

Operational Specialist jobs

Junior Software Engineer jobs

Designer jobs

Human Resources Generalist jobs

Human Resources Manager jobs

Account Executive jobs

Funding