Join Anyscale at #NeurIPS2025 in San Diego. We'll be gathering a group of researchers, founders, & engineers over food and drinks. We'll be discussing Ray and the frontier of large-scale RL, multimodal model training, and multi-node LLM inference. 📅 Thursday, December 4 · 5-8 PM 📍 Downtown San Diego https://luma.com/4nzhkr1p
Anyscale
Software Development
San Francisco, California · 55,166 followers
Scalable compute for AI and Python
About us
Anyscale enables developers of all skill levels to easily build applications that run at any scale, from a laptop to a data center.
- Website
- https://anyscale.com
- Industry
- Software Development
- Company size
- 201-500 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2019
Products
Anyscale
AIOps Platforms
The Anyscale Platform offers key advantages over open source Ray. It provides a seamless user experience that helps developers and AI teams speed development and deploy AI/ML workloads at scale. Companies using Anyscale benefit from rapid time-to-market and faster iterations across the entire AI lifecycle.
Locations
-
Primary
600 Harrison St
San Francisco, California 94107, US
-
411 High St
Palo Alto, California 94301, US
Updates
-
Anyscale reposted this
What an evening at AI21 Labs! 🎉 We were proud to host the first Ray meetup in Tel Aviv together with our partners at Anyscale, putting efficient LLM inference at scale in the spotlight. Huge thanks to our speakers: Linda Haviv, Carl Winkler, Ido Ben David and our own Asaf Joseph Gardin. Great talks, sharp questions, and an amazing community turnout. We’re excited to keep growing the Ray and LLM inference ecosystem in TLV together.
-
Anyscale reposted this
Ray and vLLM have worked closely together to improve the large-model interactive development experience! Spinning up multi-node vLLM with Ray in interactive environments can be tedious, requiring users to juggle separate commands for different nodes and breaking the “single symmetric entrypoint” mental model that many users expect. Ray now has a new command: ray symmetric-run. It launches the same entrypoint command on every node in a Ray cluster, which makes it easy to spawn vLLM servers with multi-node models on HPC setups or with parallel SSH tools like mpssh. Check out the blog: https://lnkd.in/gniPWzge Thanks to Kaichao You for the collaboration!
-
Anyscale reposted this
Wide-EP and prefill/decode disaggregation APIs for vLLM are now available in Ray 2.52 🚀🚀 Validated at 2.4k tokens/H200 on Anyscale Runtime, these patterns maximize sparse MoE model (DeepSeek, Kimi, Qwen3) inference efficiency, but often require non-trivial orchestration logic. Here’s how they work…🧵

Engine replicas can no longer be scaled independently. Efficient serving now requires coordinating:
- data-parallel ranks
- topology-specific expert-parallel ranks
- KV-cache transfer across deployments
- heterogeneous resource profiles (prefill vs decode)

For high-throughput workloads, batch size can increase by more than 2x compared to tensor parallelism. Wide-EP distributes experts across GPUs and adds load balancing, expert replication, and optimized all2alls. The complexity: replicas must form shared DP/EP groups, agree on IP/port #s, and scale ingress separately from engines. Ray Serve LLM now exposes a builder API that integrates with vLLM and handles all of this automatically.

Prefill/decode disaggregation separates input token processing and token generation into independent deployments with different scaling behaviors. Prefill is compute-intensive, while decode is memory-bandwidth-bound. When the same replica handles both in the same batch, prefill delays accumulate and throughput tanks. Ray Serve LLM now has a build_pd_openai_app builder that:
- Creates prefill + decode deployments
- Sets up the NIXL KV transfer connector
- Routes requests through a PDProxyServer
- (Optionally) uses prefix cache-aware routing for prefill

A minimal sketch of the builder follows below. For the full writeup, see link in comments 🙂
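For a sense of what wiring this up might look like, here is a minimal sketch. build_pd_openai_app and PDProxyServer are named in the post above; the import path, the prefill_config/decode_config shape, and the model and replica counts are illustrative assumptions rather than the documented API, so check the Ray 2.52 docs for the real signature.

```python
# Hedged sketch: serving a sparse MoE model with prefill/decode
# disaggregation via Ray Serve LLM's builder. Config shape assumed.
from ray import serve
from ray.serve.llm import LLMConfig, build_pd_openai_app  # import path assumed

# Prefill is compute-bound and decode is memory-bandwidth-bound, so each
# deployment gets its own autoscaling range (numbers are illustrative).
prefill = LLMConfig(
    model_loading_config=dict(
        model_id="deepseek-v3",
        model_source="deepseek-ai/DeepSeek-V3",
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=2, max_replicas=4),
    ),
    engine_kwargs=dict(tensor_parallel_size=8),
)
decode = LLMConfig(
    model_loading_config=dict(
        model_id="deepseek-v3",
        model_source="deepseek-ai/DeepSeek-V3",
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=4, max_replicas=8),
    ),
    engine_kwargs=dict(tensor_parallel_size=8),
)

# Per the post, the builder creates both deployments, sets up the NIXL
# KV-transfer connector, and fronts them with a PDProxyServer.
app = build_pd_openai_app(dict(prefill_config=prefill, decode_config=decode))
serve.run(app)
```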
-
Today's the day! We're hosting the Ray x AI21 Labs Meetup: Efficient LLM Inference at Scale with vLLM tonight in Tel Aviv, and there are still a few spots left if you want to join us. 🗓️ Today, Wednesday Nov 26 🕕 6:00–8:00 PM (GMT+2) 📍 AI21 Labs, Tel Aviv We'll dive into how teams use Ray and vLLM to run LLM inference faster, cheaper, and at scale in real-world production environments, with speakers from Anyscale and AI21 Labs. 👉 Grab your spot: https://luma.com/6skfv9ob
-
Anyscale reposted this
At NeurIPS next week? Join us at our Open Source AI Reception, an evening focused on open source collaboration hosted by Cloud Native Computing Foundation (CNCF) and PyTorch with Anyscale, Featherless AI, Hugging Face, and Unsloth AI. Join AI enthusiasts, developers, and researchers for an evening of networking and conversation outside NeurIPS 2025. Drinks and light bites provided. 🔗 Register to secure your spot: https://hubs.la/Q03VVjP50 Wednesday, December 3, 6:00–9:00 PM PT Union Kitchen and Tap Gaslamp, San Diego, California, USA #PyTorch #NeurIPS2025 #NeurIPS #OpenSourceAI
-
Heading to AWS re:Invent? Join us at booth #1854, book a 1:1 meeting, or learn more across breakouts, lightning talks, and executive roundtables. https://lnkd.in/gmVc9wB6
-
The rise of LLMs has shifted attention toward online inference… …but most real applications still hinge on something less glamorous: data preparation. With multimodal requirements becoming the norm, that prep work is getting more complex, not less. And to build cost-efficient agentic systems, the industry is realizing we need a mix of large and small models, each playing to its strengths. Large models coordinate tools and workflows; smaller models execute tasks quickly and efficiently. Multimodal data processing, LLM fine-tuning / post-training, and model chaining for agentic inference all become far more scalable with Ray as a common compute fabric for AI. 🎥 In this demo, Akshay (the Ray expert) shows Goku (the Ray noob) how any developer can go from idea to scalable multimodal application in a matter of days, using the Anyscale managed platform's advanced developer tooling and managed Ray clusters.
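As a concrete, hedged illustration of Ray as that compute fabric, the sketch below uses Ray Data to read raw images and fan a small captioning model out across a cluster's GPUs. The bucket paths, model choice, and scaling knobs are all assumptions made for the example, not the demo's actual code.

```python
# Hedged sketch: multimodal data prep as a Ray Data pipeline. Paths, the
# captioning model, and batch/replica sizes are illustrative assumptions.
import ray
from PIL import Image
from transformers import pipeline  # assumes transformers is installed

class Captioner:
    def __init__(self):
        # One model instance per actor; Ray places each actor on a GPU.
        self.model = pipeline(
            "image-to-text",
            model="Salesforce/blip-image-captioning-base",
            device=0,
        )

    def __call__(self, batch):
        # Ray Data hands batches as dicts of numpy arrays; convert to PIL.
        images = [Image.fromarray(arr) for arr in batch["image"]]
        outputs = self.model(images)
        batch["caption"] = [out[0]["generated_text"] for out in outputs]
        return batch

# Read raw images (hypothetical bucket), caption them on GPUs in parallel,
# and write the results back out, all in one pipeline.
ds = ray.data.read_images("s3://my-bucket/raw-images/")
ds = ds.map_batches(Captioner, batch_size=16, num_gpus=1, concurrency=4)
ds.write_parquet("s3://my-bucket/captions/")
```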
-
Ever wondered how Ray actually scales AI in the real world? Join our live, virtual hands-on lab and code alongside our lead instructor. You’ll get step-by-step guidance and real-time answers to your questions. In this session, you’ll learn how to use Ray to:
- Build and scale data pipelines
- Run GPU batch inference efficiently
- Integrate LLM inference into your workflows
Seats are limited to keep it interactive, so save your spot today! Don’t just see Ray – build with it. Register now: https://lnkd.in/gZwkmpQx
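For a taste of the third topic, here is a hedged sketch of batch LLM inference with Ray Data and vLLM. The model, file paths, and scaling knobs are illustrative assumptions, not the lab's actual exercise.

```python
# Hedged sketch: batch LLM inference with Ray Data + vLLM. The model,
# paths, and scaling knobs are illustrative, not the lab's actual code.
import ray
from vllm import LLM, SamplingParams

class LLMPredictor:
    def __init__(self):
        # One vLLM engine per actor; Ray schedules each actor on a GPU.
        self.llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
        self.params = SamplingParams(temperature=0.0, max_tokens=128)

    def __call__(self, batch):
        outputs = self.llm.generate(list(batch["prompt"]), self.params)
        batch["response"] = [out.outputs[0].text for out in outputs]
        return batch

# Expects a JSONL file (hypothetical path) with a "prompt" column.
ds = ray.data.read_json("s3://my-bucket/prompts.jsonl")
ds = ds.map_batches(LLMPredictor, batch_size=64, num_gpus=1, concurrency=2)
ds.write_json("s3://my-bucket/responses/")
```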