Baseten

Software Development

San Francisco, CA 28,346 followers

Own your inference.

About us

Inference is everything. Baseten is an AI infrastructure platform giving you the tooling, expertise, and hardware needed to bring great AI products to market fast. Our proprietary Inference Stack combines cutting-edge performance research with fast, reliable infrastructure to give you out-of-the-box global availability with 99.99% uptime.
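To put the 99.99% uptime figure in concrete terms, here is a minimal sketch of the arithmetic. The function name, the 365-day year, and the 30-day month are illustrative assumptions, not Baseten's SLA definition or measurement window.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Minutes of downtime permitted over a period at a given uptime percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # Assumed windows: a 365-day year and a 30-day month.
    print(round(allowed_downtime_minutes(99.99, 365), 2))  # about 52.56 minutes per year
    print(round(allowed_downtime_minutes(99.99, 30), 2))   # about 4.32 minutes per month
```

Under those assumptions, 99.99% uptime leaves room for a little under an hour of downtime per year.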

Website: https://www.baseten.co/
Industry: Software Development
Company size: 201-500 employees
Headquarters: San Francisco, CA
Type: Privately Held
Specialties: developer tools, software engineering, artificial intelligence, and machine learning

Updates

  • Baseten reposted this

    The Redpoint InfraRed 100 is now live. These are the companies building the infrastructure that powers everything happening in AI right now, from world models and agent runtimes to the sandboxes, databases, and security tools agents depend on. Congratulations to this year's honorees! Read the full 2026 InfraRed Report: our state of the union on AI and cloud infrastructure 👉 https://lnkd.in/eEevP-Wd

  • Baseten reposted this

    Today we're sharing our first research collaboration with Baseten on open-weight legal agents, and the results point to where vertical AI is heading. Using signal from LAB (our Legal Agent Benchmark of 1,200+ tasks across 24 practice areas), we post-trained a 27B open-weight model and brought it into the closed-source frontier band.

    Three main takeaways:

    1. Open weights unlock cost, governance, and a path to deeper capability. Reaching the top of LAB with frontier models runs ~$50 and 20+ minutes per task. Open-weight agents can live inside a firm's own secure cloud, expose their reasoning traces for audit, and, with the right post-training pipeline, close the gap on a benchmark where even frontier models complete fewer than 10% of tasks end-to-end.
    2. The model and the system around it have to be built together. We designed a "compaction" system that lets agents summarize what they've read so they can keep working on long tasks without losing context. It gave frontier models a 2.6-3.7x boost, but did nothing for the open-weight model until we actually trained the model to use it. You need both the model and the system.
    3. Smaller models can learn to work like the best ones when you train them on the right examples. With a small amount of training against LAB's expert rubrics, a 9B model stopped relying on keyword search and started reading documents in full, the same approach the top frontier models (Opus, Sonnet, GPT-5.5) use on their own. The quality of your evaluation data shapes how the model behaves.

    Full write-up in the comments.

  • Most enterprise AI models stay static after deployment, forcing teams to rely on endless prompt tweaking. This is the problem Trajectory AI is solving. Their continual learning platform enables companies to deploy custom models that improve continuously from production usage.

    To power their launch, Trajectory partnered with us. We delivered FP8 and NVFP4 quantization, automated deployment pipelines, and autoscaled H100 infrastructure. This enabled Trajectory to run live A/B tests with customers starting in April 2026, with zero outages and a 3x capacity expansion in under three weeks.

    “Baseten is a unique infrastructure partner. When we needed a 397-billion-parameter model quantized and deployed overnight before a customer launch, they just did it.”
    - Michael Elabd, Trajectory founder and CTO

    Read our launch blog post here: https://lnkd.in/gm8cfY4e

  • Baseten reposted this

    Today we're releasing TIM-Qwen3.6-27B on a new OpenAI and Anthropic compatible API. Last month I wrote that open models had finally caught up to frontier models on the work most people *actually* need AI to do. The bottleneck stopped being the model and started being the environment around it. This release is our next step to unlock open models with our co-designed runtime and post-training process, now delivered in an API format that developers already love. The newest iteration of our inference runtime, TIMRUN, compresses context on the fly without losing reasoning quality. On long-context agent workloads, that means 10x effective context window length, 3x concurrent throughput, and 49% lower latency compared to models using SGLang on the same GPU. If you have a project that uses the OpenAI or Claude SDKs, you can point it at our endpoint and try TIM-Qwen3.6-27B in a few minutes. Full post on this release linked below. (We're also excited to share this system with 250+ developers at our hackathon next week with Baseten, Cloudflare, and Wayfair as part of Boston TECH WEEK by a16z)

  • Baseten reposted this

    Having witnessed how AI transformed the software industry over the last year, I'm convinced that every industry will be transformed by these tools. Science is going through this transformation right now! I'm excited to share my discussion with Mihir Trivedi about how Baseten is accelerating inference and AI adoption in Life Sciences through our partnership with Benchling. It's been awesome building with Mihir and the team to bring this to life! Read more about it on our blog: https://lnkd.in/gtYF4hJY

  • Baseten reposted this

    Today we’re announcing Benchling Inference! Together with Baseten, we’re offering scalable, cost-effective inference built for scientific AI.

    Why? Scientific workloads don’t look like typical AI workloads. Demands come in bursts, with teams needing to run 100,000 predictions for a few hours before going quiet again for days. Most infrastructure wasn’t built for that kind of scale or flexibility.

    With Benchling Inference, powered by Baseten, R&D teams can:
    ✔️ Run scientific models without managing infrastructure
    ✔️ Scale workloads up or down in seconds
    ✔️ Access cost-effective compute, enabled by aggregating demand

    We’ve taken everything we’ve learned from running Baseten for Benchling’s Model Hub (the configurations, defaults, and integrations that make inference work out-of-the-box for biotech) and built it in, so you don’t have to.

    👉 Learn more about our end-to-end solution for scalable inference: https://lnkd.in/duuR9Fxx

  • Biotech R&D is generating more scientific AI models than ever, from protein structure prediction to molecular docking to sequence analysis. But the infrastructure to run them hasn't kept up.

    Today we're announcing Benchling Inference, powered by Baseten. Together with Benchling we're delivering on-demand GPU capacity built for the bursty, high-stakes demands of scientific workloads.

    With Benchling Inference, scientists can:
    → Deploy models in seconds, not weeks
    → Keep proprietary models inside their VPC if needed
    → Benefit from economics that work even at small and mid-size biotech scale

    Benchling and Baseten decided to team up because we believe that research teams shouldn't have to manage HPC queues, negotiate cloud contracts, or become GPU experts to run frontier models on their own data. Six years of inference expertise are now available where science happens.

    Read more here: https://lnkd.in/gj2hpC78


Funding

Baseten: 6 total rounds

Last round: Series D, US$ 150.0M

Source: Crunchbase