Real-Time Vision AI Agents

Multi-modal AI agents that see, hear, & remember.
Open-source. Edge-agnostic. Low-latency.

Vision Agents Playground

Gemini 3.1 Flash Live x Stream's Vision Agents SDK

Multi-Speaker Podcast & Long-Form Conversational Audio Apps

Join our

Partner Ecosystem

Building models, tools, or platforms that work with real-time voice or video AI?

We’re actively adding first-party integrations, co-building, and co-marketing with partners.

Create a product page for selling a used item that includes a product image, title, description, and a suggested price.

Facial recognition, package detection, automated package theft response, and posting to X.

Detect and censor offensive gestures, and give three verbal warnings before kicking the user out.

Community & Open Source

Follow Stream on X, star the Vision Agents GitHub repo, and join the discussion on Discord to try demos, share feedback, and contribute.