Anyscale

Software Development

San Francisco, California 55,166 followers

Scalable compute for AI and Python

About us

Anyscale enables developers of all skill levels to easily build applications that run at any scale, from a laptop to a data center.

Website
https://anyscale.com
Industry
Software Development
Company size
201-500 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2019

Products

Locations

Employees at Anyscale

Updates

  • Anyscale reposted this

    View organization page for AI21 Labs

    33,748 followers

    What an evening at AI21 Labs! 🎉 We were proud to host the first Ray meetup in Tel Aviv together with our partners at Anyscale, putting efficient LLM inference at scale in the spotlight. Huge thanks to our speakers: Linda Haviv, Carl Winkler, Ido Ben David and our own Asaf Joseph Gardin. Great talks, sharp questions, and an amazing community turnout. We’re excited to keep growing the Ray and LLM inference ecosystem in TLV together.

  • Anyscale reposted this

    Ray and vLLM have worked closely together to improve the large-model interactive development experience! Spinning up multi-node vLLM with Ray in interactive environments can be tedious, requiring users to juggle separate commands for different nodes and breaking the “single symmetric entrypoint” mental model that many users expect. Ray now has a new command: ray symmetric-run. It launches the same entrypoint command on every node in a Ray cluster, which makes it easy to spawn vLLM servers with multi-node models on HPC setups or with parallel SSH tools like mpssh. Check out the blog: https://lnkd.in/gniPWzge Thanks to Kaichao You for the collaboration!

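    As a rough sketch of the workflow the post describes (the exact flags, host file, and model name below are illustrative assumptions, not taken from the post or the Ray docs):

    ```shell
    # Launch the identical entrypoint on every node, e.g. via mpssh.
    # Ray assembles the cluster, then runs the command once it is formed,
    # so no per-node head/worker commands are needed.
    mpssh -f hosts.txt \
      'ray symmetric-run -- vllm serve my-org/my-moe-model --tensor-parallel-size 16'
    ```

    On an HPC setup, the same one-liner would go in the job script that the scheduler runs on each allocated node.
    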
  • Anyscale reposted this

    View profile for Seiji Eicher

    Distributed LLM Inference @ Anyscale

    Wide-EP and prefill/decode disaggregation APIs for vLLM are now available in Ray 2.52 🚀🚀 Validated at 2.4k tokens/H200 on Anyscale Runtime, these patterns maximize sparse MoE model (DeepSeek, Kimi, Qwen3) inference efficiency, but often require non-trivial orchestration logic. Here’s how they work…🧵

    Engine replicas can no longer be scaled independently. Efficient serving now requires coordinating:
    - data-parallel ranks
    - topology-specific expert-parallel ranks
    - KV-cache transfer across deployments
    - heterogeneous resource profiles (prefill vs decode)

    For high-throughput workloads, batch size can increase by more than 2x compared to tensor parallelism. Wide-EP distributes experts across GPUs and adds load balancing, expert replication, and optimized all2alls. The complexity: replicas must form shared DP/EP groups, agree on IP/port #s, and scale ingress separately from engines. Ray Serve LLM now exposes a builder API that integrates with vLLM and handles all of this automatically.

    Prefill/decode disaggregation separates input-token processing and token generation into independent deployments with different scaling behaviors. Prefill is compute-intensive, while decode is memory-bandwidth-bound. When the same replica handles both in the same batch, prefill delays accumulate and throughput tanks.

    Ray Serve LLM now has a build_pd_openai_app builder that:
    - Creates prefill + decode deployments
    - Sets up the NIXL KV transfer connector
    - Routes requests through a PDProxyServer
    - (Optionally) uses prefix cache-aware routing for prefill

    For the full writeup, see link in comments 🙂

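    To make the prefill/decode split concrete, here is a toy, framework-free sketch (no Ray or vLLM; every class and function name is hypothetical) of the routing pattern a P/D proxy implements: a prefill pool processes the prompt and hands off a KV-cache handle, and an independently scaled decode pool consumes it to generate tokens.

    ```python
    from dataclasses import dataclass

    @dataclass
    class KVHandle:
        # Stand-in for a NIXL-style KV-cache transfer handle.
        prompt: str
        n_prompt_tokens: int

    class PrefillReplica:
        """Compute-bound stage: process the whole prompt once."""
        def prefill(self, prompt: str) -> KVHandle:
            return KVHandle(prompt=prompt, n_prompt_tokens=len(prompt.split()))

    class DecodeReplica:
        """Memory-bandwidth-bound stage: generate tokens from the KV cache."""
        def decode(self, kv: KVHandle, max_new_tokens: int) -> list[str]:
            # Toy "generation": emit placeholder tokens.
            return [f"tok{i}" for i in range(max_new_tokens)]

    class PDProxy:
        """Routes each request through separate prefill and decode pools,
        so the two stages can be scaled independently."""
        def __init__(self, prefills, decodes):
            self.prefills, self.decodes = prefills, decodes
            self._i = 0

        def handle(self, prompt: str, max_new_tokens: int) -> list[str]:
            p = self.prefills[self._i % len(self.prefills)]
            d = self.decodes[self._i % len(self.decodes)]
            self._i += 1
            kv = p.prefill(prompt)                # stage 1: prompt processing
            return d.decode(kv, max_new_tokens)   # stage 2: token generation

    # One prefill replica feeding two decode replicas, mirroring the idea
    # of heterogeneous scaling per stage.
    proxy = PDProxy([PrefillReplica()], [DecodeReplica(), DecodeReplica()])
    out = proxy.handle("the quick brown fox", max_new_tokens=3)
    ```

    In the real system the handoff crosses processes and GPUs via the KV transfer connector; the point here is only that the two stages are separate deployments joined by a handle, not one batch on one replica.
    
    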
  • Today's the day! We're hosting the Ray x AI21 Labs Meetup: Efficient LLM Inference at Scale with vLLM tonight in Tel Aviv, and there are still a few spots left if you want to join us.

    🗓️ Today, Wednesday Nov 26
    🕕 6:00–8:00 PM (GMT+2)
    📍 AI21 Labs, Tel Aviv

    We'll dive into how teams use Ray and vLLM to run LLM inference faster, cheaper, and at scale in real-world production environments, with speakers from Anyscale and AI21 Labs.

    👉 Grab your spot: https://luma.com/6skfv9ob

  • Anyscale reposted this

    View organization page for PyTorch

    299,862 followers

    At NeurIPS next week? Join us at our Open Source AI Reception, an evening focused on open source collaboration hosted by Cloud Native Computing Foundation (CNCF) and PyTorch with Anyscale, Featherless AI, Hugging Face, and Unsloth AI. Join AI enthusiasts, developers, and researchers for an evening of networking and conversation outside NeurIPS 2025. Drinks and light bites provided.

    🔗 Register to secure your spot: https://hubs.la/Q03VVjP50

    Wednesday, December 3, 6:00–9:00 PM PT
    Union Kitchen and Tap Gaslamp, San Diego, California, USA

    #PyTorch #NeurIPS2025 #NeurIPS #OpenSourceAI

  • The rise of LLMs has shifted attention toward online inference… …but most real applications still hinge on something less glamorous: data preparation. With multimodal requirements becoming the norm, that prep work is getting more complex, not less. And to build cost-efficient agentic systems, the industry is realizing we need a mix of large and small models, each playing to its strengths. Large models coordinate tools and workflows; smaller models execute tasks quickly and efficiently. Multimodal data processing, LLM fine-tuning / post-training, and model chaining for agentic inference all become far more scalable with Ray as a common compute fabric for AI. 🎥 In this demo, Akshay (the Ray expert) shows Goku (the Ray noob) how any developer can go from idea to scalable multimodal application in a matter of days using the Anyscale managed platform, which pairs advanced developer tooling with managed Ray clusters.

  • Ever wondered how Ray actually scales AI in the real world? Join our live, virtual hands-on lab and code alongside our lead instructor. You’ll get step-by-step guidance and real-time answers to your questions. In this session, you’ll learn how to use Ray to:

    - Build and scale data pipelines with Ray
    - Run GPU batch inference efficiently
    - Integrate LLM inference into your workflows

    Seats are limited to keep it interactive, so save your spot today! Don’t just see Ray – build with it. Register now: https://lnkd.in/gZwkmpQx

Similar pages

Browse jobs

Funding