Anyscale

Software Development

San Francisco, California 55,166 followers

Scalable compute for AI and Python

About us

Anyscale enables developers of all skill levels to easily build applications that run at any scale, from a laptop to a data center.

Website
https://anyscale.com
Industry
Software Development
Company size
201-500 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2019

Products

Locations

Employees at Anyscale

Updates

  • Anyscale reposted this

    View organization page for AI21 Labs

    33,748 followers

    What an evening at AI21 Labs! 🎉 We were proud to host the first Ray meetup in Tel Aviv together with our partners at Anyscale, putting efficient LLM inference at scale in the spotlight. Huge thanks to our speakers: Linda Haviv, Carl Winkler, Ido Ben David and our own Asaf Joseph Gardin. Great talks, sharp questions, and an amazing community turnout. We’re excited to keep growing the Ray and LLM inference ecosystem in TLV together.

  • Anyscale reposted this

    Ray and vLLM have worked closely together to improve the large-model interactive development experience! Spinning up multi-node vLLM with Ray in interactive environments can be tedious, requiring users to juggle separate commands for different nodes and breaking the “single symmetric entrypoint” mental model that many users expect. Ray now has a new command: ray symmetric-run. It launches the same entrypoint command on every node in a Ray cluster, which makes it easy to spawn vLLM servers with multi-node models on HPC setups or with parallel SSH tools like mpssh. Check out the blog: https://lnkd.in/gniPWzge Thanks to Kaichao You for the collaboration!

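    As a rough sketch of the workflow the post describes (the exact flags, host file, and model name below are illustrative assumptions, not taken from the post or the Ray docs):

    ```shell
    # Launch the identical entrypoint on every node, e.g. via mpssh.
    # Ray assembles the cluster, then runs the command once it is formed,
    # so no per-node head/worker commands are needed.
    mpssh -f hosts.txt \
      'ray symmetric-run -- vllm serve my-org/my-moe-model --tensor-parallel-size 16'
    ```

    On an HPC setup, the same one-liner would go in the job script that the scheduler runs on each allocated node.
    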
  • Anyscale reposted this

    View profile for Seiji Eicher

    Distributed LLM Inference @ Anyscale

    Wide-EP and prefill/decode disaggregation APIs for vLLM are now available in Ray 2.52 🚀🚀 Validated at 2.4k tokens/H200 on Anyscale Runtime, these patterns maximize sparse MoE model (DeepSeek, Kimi, Qwen3) inference efficiency, but often require non-trivial orchestration logic. Here’s how they work…🧵

    Engine replicas can no longer be scaled independently. Efficient serving now requires coordinating:
    - data-parallel ranks
    - topology-specific expert-parallel ranks
    - KV-cache transfer across deployments
    - heterogeneous resource profiles (prefill vs decode)

    For high-throughput workloads, batch size can increase by more than 2x compared to tensor parallelism. Wide-EP distributes experts across GPUs and adds load balancing, expert replication, and optimized all2alls. The complexity: replicas must form shared DP/EP groups, agree on IP/port #s, and scale ingress separately from engines. Ray Serve LLM now exposes a builder API that integrates with vLLM and handles all of this automatically.

    Prefill/decode disaggregation separates input-token processing and token generation into independent deployments with different scaling behaviors. Prefill is compute-intensive, while decode is memory-bandwidth-bound. When the same replica handles both in the same batch, prefill delays accumulate and throughput tanks.

    Ray Serve LLM now has a build_pd_openai_app builder that:
    - Creates prefill + decode deployments
    - Sets up the NIXL KV transfer connector
    - Routes requests through a PDProxyServer
    - (Optionally) uses prefix cache-aware routing for prefill

    For the full writeup, see link in comments 🙂

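    To make the prefill/decode split concrete, here is a toy, framework-free sketch (no Ray or vLLM; every class and function name is hypothetical) of the routing pattern a P/D proxy implements: a prefill pool processes the prompt and hands off a KV-cache handle, and an independently scaled decode pool consumes it to generate tokens.

    ```python
    from dataclasses import dataclass

    @dataclass
    class KVHandle:
        # Stand-in for a NIXL-style KV-cache transfer handle.
        prompt: str
        n_prompt_tokens: int

    class PrefillReplica:
        """Compute-bound stage: process the whole prompt once."""
        def prefill(self, prompt: str) -> KVHandle:
            return KVHandle(prompt=prompt, n_prompt_tokens=len(prompt.split()))

    class DecodeReplica:
        """Memory-bandwidth-bound stage: generate tokens from the KV cache."""
        def decode(self, kv: KVHandle, max_new_tokens: int) -> list[str]:
            # Toy "generation": emit placeholder tokens.
            return [f"tok{i}" for i in range(max_new_tokens)]

    class PDProxy:
        """Routes each request through separate prefill and decode pools,
        so the two stages can be scaled independently."""
        def __init__(self, prefills, decodes):
            self.prefills, self.decodes = prefills, decodes
            self._i = 0

        def handle(self, prompt: str, max_new_tokens: int) -> list[str]:
            p = self.prefills[self._i % len(self.prefills)]
            d = self.decodes[self._i % len(self.decodes)]
            self._i += 1
            kv = p.prefill(prompt)                # stage 1: prompt processing
            return d.decode(kv, max_new_tokens)   # stage 2: token generation

    # One prefill replica feeding two decode replicas, mirroring the idea
    # of heterogeneous scaling per stage.
    proxy = PDProxy([PrefillReplica()], [DecodeReplica(), DecodeReplica()])
    out = proxy.handle("the quick brown fox", max_new_tokens=3)
    ```

    In the real system the handoff crosses processes and GPUs via the KV transfer connector; the point here is only that the two stages are separate deployments joined by a handle, not one batch on one replica.
    
    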
  • Today's the day! We're hosting the Ray x AI21 Labs Meetup: Efficient LLM Inference at Scale with vLLM tonight in Tel Aviv, and there are still a few spots left if you want to join us.

    🗓️ Today, Wednesday Nov 26
    🕕 6:00–8:00 PM (GMT+2)
    📍 AI21 Labs, Tel Aviv

    We'll dive into how teams use Ray and vLLM to run LLM inference faster, cheaper, and at scale in real-world production environments, with speakers from Anyscale and AI21 Labs.

    👉 Grab your spot: https://luma.com/6skfv9ob

  • Anyscale reposted this

    View organization page for PyTorch

    299,862 followers

    At NeurIPS next week? Join us at our Open Source AI Reception, an evening focused on open source collaboration hosted by Cloud Native Computing Foundation (CNCF) and PyTorch with Anyscale, Featherless AI, Hugging Face, and Unsloth AI. Join AI enthusiasts, developers, and researchers for an evening of networking and conversation outside NeurIPS 2025. Drinks and light bites provided.

    🔗 Register to secure your spot: https://hubs.la/Q03VVjP50

    Wednesday, December 3, 6:00–9:00 PM PT
    Union Kitchen and Tap Gaslamp, San Diego, California, USA

    #PyTorch #NeurIPS2025 #NeurIPS #OpenSourceAI

  • The rise of LLMs has shifted attention toward online inference… …but most real applications still hinge on something less glamorous: data preparation. With multimodal requirements becoming the norm, that prep work is getting more complex, not less. And to build cost-efficient agentic systems, the industry is realizing we need a mix of large and small models, each playing to its strengths. Large models coordinate tools and workflows; smaller models execute tasks quickly and efficiently. Multimodal data processing, LLM fine-tuning / post-training, and model chaining for agentic inference all become far more scalable with Ray as a common compute fabric for AI. 🎥 In this demo, Akshay (the Ray expert) shows Goku (the Ray noob) how any developer can go from idea to scalable multimodal application in a matter of days using the Anyscale managed platform, which pairs advanced developer tooling with managed Ray clusters.

  • Ever wondered how Ray actually scales AI in the real world? Join our live, virtual hands-on lab and code alongside our lead instructor. You’ll get step-by-step guidance and real-time answers to your questions. In this session, you’ll learn how to use Ray to:

    - Build and scale data pipelines with Ray
    - Run GPU batch inference efficiently
    - Integrate LLM inference into your workflows

    Seats are limited to keep it interactive, so save your spot today! Don’t just see Ray – build with it. Register now: https://lnkd.in/gZwkmpQx

Similar pages

Browse jobs

Funding