Quickstart
Get started with Cua
Set Up Your Computer Sandbox
Choose how you want to run your Cua sandbox. This will be the isolated environment where your automated tasks will execute.
You can run your Cua sandbox in the cloud (recommended for easiest setup), locally in a Docker container on any platform, on a macOS VM with Lume, or on Windows with a Windows Sandbox. Choose the option that matches your system and needs.
Create and manage cloud sandboxes that run Linux (Ubuntu), Windows, or macOS.
First, create your API key:
- Go to cua.ai/signin
- Navigate to Dashboard > API Keys > New API Key to create your API key
- Important: Copy and save your API key immediately - you won't be able to see it again (you'll need to regenerate if lost)
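The SDK examples later in this guide expect the key in the CUA_API_KEY environment variable. As a minimal sketch of reading it in Python instead of hardcoding it (the error message here is illustrative):
import os

# Expect the key in the environment, e.g. set via `export CUA_API_KEY="sk_cua-api01_..."`.
api_key = os.getenv("CUA_API_KEY")
if not api_key:
    raise RuntimeError("CUA_API_KEY is not set - create one in the Dashboard and export it")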
Then, create your sandbox using either option:
Option 1: Via Website
- Navigate to Dashboard > Sandboxes > Create Sandbox
- Create a sandbox, choosing Linux, Windows, or macOS

- Note your sandbox name
Option 2: Via CLI
- Install the Cua CLI:
# macOS/Linux
curl -LsSf https://cua.ai/cli/install.sh | sh
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
- Log in and create a sandbox:
cua auth login
cua sb create --os linux --size small --region north-america
- Note your sandbox name and password from the output
Your Cloud Sandbox will be automatically configured and ready to use.
Run a Linux desktop locally on macOS, Windows, or Linux hosts.
- Install Docker Desktop or Docker Engine
- Pull a Cua Docker image:
# XFCE (Lightweight) - recommended for most use cases
docker pull --platform=linux/amd64 trycua/cua-xfce:latest
# OR KASM (Full-Featured) - full Ubuntu desktop
docker pull --platform=linux/amd64 trycua/cua-ubuntu:latest
Run full Linux (Ubuntu Desktop), Windows 11, or Android 11 VMs inside Docker containers using QEMU virtualization.
Linux and Windows images require a golden image preparation step on first use. Android images start directly without preparation.
1. Install Docker Desktop or Docker Engine
2. Pull the QEMU Linux image:
docker pull trycua/cua-qemu-linux:latest
3. Download the Ubuntu 22.04 LTS Server ISO:
- Download the Ubuntu 22.04 Server ISO (~2GB)
4. Create golden image:
docker run -it --rm \
--device=/dev/kvm \
--cap-add NET_ADMIN \
--mount type=bind,source=/path/to/ubuntu-22.04.5-live-server-amd64.iso,target=/custom.iso \
-v ~/cua-storage/linux:/storage \
-p 8006:8006 \
-p 5000:5000 \
-e RAM_SIZE=8G \
-e CPU_CORES=4 \
-e DISK_SIZE=64G \
trycua/cua-qemu-linux:latest
The container will install Ubuntu Desktop from the ISO and shut down when complete. Monitor progress at http://localhost:8006.
1. Install Docker Desktop or Docker Engine
2. Pull the QEMU Windows image:
docker pull trycua/cua-qemu-windows:latest
3. Download the Windows 11 Enterprise Evaluation ISO:
- Visit Microsoft Evaluation Center
- Accept the Terms of Service
- Download Windows 11 Enterprise Evaluation (90-day trial, English, United States) ISO (~6GB)
4. Create golden image:
docker run -it --rm \
--device=/dev/kvm \
--cap-add NET_ADMIN \
--mount type=bind,source=/path/to/windows-11-enterprise-eval.iso,target=/custom.iso \
-v ~/cua-storage/windows:/storage \
-p 8006:8006 \
-p 5000:5000 \
-e RAM_SIZE=8G \
-e CPU_CORES=4 \
-e DISK_SIZE=64G \
trycua/cua-qemu-windows:latest
The container will install Windows 11 from the ISO and shut down when complete. Monitor progress at http://localhost:8006.
1. Install Docker Desktop or Docker Engine
2. Pull the QEMU Android image:
docker pull trycua/cua-qemu-android:latest
No golden image preparation needed - the Android emulator starts directly when you run it.
macOS hosts only - requires Lume CLI.
- Install the Lume CLI:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
- Start a local Cua sandbox:
lume run macos-sequoia-cua:latest
Windows hosts only - requires Windows 10 Pro/Enterprise or Windows 11.
- Enable Windows Sandbox
- Install the pywinsandbox dependency:
pip install -U git+https://github.com/karkason/pywinsandbox.git
- Windows Sandbox will be automatically configured when you run the CLI
Automate Your Sandbox
Python Version Compatibility
Cua packages require Python 3.12 or 3.13. Python 3.14 is not currently supported because of dependency constraints (pydantic-core/PyO3). If you hit build errors on Python 3.14, switch to Python 3.12 or 3.13.
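If you are not sure which interpreter a given environment uses, a quick standard-library check before installing can save a failed build (a minimal sketch, nothing Cua-specific):
import sys

# Cua packages currently target Python 3.12 or 3.13 (see the note above).
if not ((3, 12) <= sys.version_info[:2] <= (3, 13)):
    raise SystemExit(f"Python {sys.version.split()[0]} is unsupported; use 3.12 or 3.13")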
This section guides you through building automation in layers:
- Cua Computer Framework - Direct sandbox control for manual automation and testing
- Cua Agent Framework - Adds AI automation on top, using vision-language models to understand and interact with the UI
Start by setting up Cua Computer Framework to verify your sandbox works, then add Cua Agent Framework on top to enable intelligent, autonomous automation.
Step 1: Connect with Cua Computer Framework
Install Cua Computer Framework and verify your sandbox is working by performing basic interactions such as taking screenshots or simulating user input. This is an important verification step before adding AI agents.
Install the Cua computer Python SDK:
Using uv (recommended):
uv pip install cua-computer
Or with pip:
pip install cua-computer
Then, connect to your desired computer environment:
Set your Cua API key (same key used for model inference) and connect to your sandbox:
import os
from computer import Computer
import asyncio

os.environ["CUA_API_KEY"] = "sk_cua-api01_..."

computer = Computer(
    os_type="linux",          # or "windows" or "macos"
    provider_type="cloud",
    name="your-sandbox-name"  # from CLI or website
)

async def main():
    await computer.run()  # Connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox
    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())
from computer import Computer
import asyncio

computer = Computer(
    os_type="linux",
    provider_type="docker",
    image="trycua/cua-xfce:latest"  # or "trycua/cua-ubuntu:latest"
)

async def main():
    await computer.run()  # Launch & connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox
    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())
from computer import Computer
import asyncio

computer = Computer(
    os_type="linux",
    provider_type="docker",
    image="trycua/cua-qemu-linux:latest",
    storage="~/cua-storage/linux",
    run_opts={
        "devices": ["/dev/kvm"],  # Optional but recommended
    },
)

async def main():
    await computer.run()  # Boot from golden image
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect
    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())
from computer import Computer
import asyncio

computer = Computer(
    os_type="windows",
    provider_type="docker",
    image="trycua/cua-qemu-windows:latest",
    storage="~/cua-storage/windows",
    run_opts={
        "devices": ["/dev/kvm"],  # Optional but recommended
    },
)

async def main():
    await computer.run()  # Boot from golden image
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect
    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())
from computer import Computer
import asyncio

computer = Computer(
    os_type="android",
    provider_type="docker",
    image="trycua/cua-qemu-android:latest",
    timeout=150,  # Emulator needs more time to boot
    run_opts={
        "devices": ["/dev/kvm"],  # Required for Android emulator
        "env": {
            "EMULATOR_DEVICE": "Samsung Galaxy S10",
        },
    },
)

async def main():
    await computer.run()  # Launch & connect to Android emulator
    try:
        # Take a screenshot of the Android screen
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()

asyncio.run(main())
from computer import Computer
import asyncio

computer = Computer(
    os_type="macos",
    provider_type="lume",
    name="macos-sequoia-cua:latest"
)

async def main():
    await computer.run()  # Launch & connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox
    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())
from computer import Computer
import asyncio

computer = Computer(
    os_type="windows",
    provider_type="windows_sandbox"
)

async def main():
    await computer.run()  # Launch & connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox
    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())
Install and run cua-computer-server:
pip install cua-computer-server
python -m computer_server
Then, use the Computer object to connect:
from computer import Computer
import asyncio

computer = Computer(use_host_computer_server=True)

async def main():
    await computer.run()  # Connect to the host desktop
    # Alternative: If your computer server is not running, use start() instead:
    # await computer.start()  # Start and connect to the host desktop
    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop everything, use stop() instead:
        # await computer.stop()  # Fully stop and disconnect

asyncio.run(main())
Install the Cua computer TypeScript SDK:
npm install @trycua/computer
Then, connect to your desired computer environment:
Set your Cua API key (same key used for model inference):
export CUA_API_KEY="sk_cua-api01_..."
Then connect to your sandbox:
import { Computer, OSType } from '@trycua/computer';

const computer = new Computer({
  osType: OSType.LINUX, // or OSType.WINDOWS or OSType.MACOS
  name: "your-sandbox-name" // from CLI or website
});

await computer.run(); // Connect to the sandbox
import { Computer, OSType, ProviderType } from '@trycua/computer';
const computer = new Computer({
  osType: OSType.LINUX,
  providerType: ProviderType.DOCKER,
  image: "trycua/cua-xfce:latest" // or "trycua/cua-ubuntu:latest"
});

await computer.run(); // Launch & connect to the sandbox
import { Computer, OSType, ProviderType } from '@trycua/computer';
const computer = new Computer({
  osType: OSType.MACOS,
  providerType: ProviderType.LUME,
  name: "macos-sequoia-cua:latest"
});

await computer.run(); // Launch & connect to the sandbox
import { Computer, OSType, ProviderType } from '@trycua/computer';
const computer = new Computer({
  osType: OSType.WINDOWS,
  providerType: ProviderType.WINDOWS_SANDBOX
});

await computer.run(); // Launch & connect to the sandbox
First, install and run cua-computer-server:
pip install cua-computer-server
python -m computer_server
Then, use the Computer object to connect:
import { Computer } from '@trycua/computer';

const computer = new Computer({ useHostComputerServer: true });

await computer.run(); // Connect to the host desktop
Once connected, you can perform interactions:
try {
  // Take a screenshot of the computer's current display
  const screenshot = await computer.interface.screenshot();
  // Simulate a left-click at coordinates (100, 100)
  await computer.interface.leftClick(100, 100);
  // Type "Hello!" into the active application
  await computer.interface.typeText("Hello!");
} finally {
  await computer.disconnect();
}
Learn more about computers in the Cua computers documentation.
Step 2: Add AI Automation with Cua Agent Framework
Now that you've verified your sandbox works, use an Agent to automate complex tasks by providing it with a goal. The agent will interact with the computer environment using a vision-language model to understand the UI and execute actions.
While you can build your own agent loop with any LLM, Cua Agent Framework is the recommended approach as it provides:
- 100+ VLM options through Cua VLM Router and direct provider access
- Built-in optimizations for computer-use tasks
- Structured agent loops for consistent behavior
Install the Cua agent Python SDK:
Using uv (recommended):
uv pip install "cua-agent[all]"
Or with pip:
pip install "cua-agent[all]"
Choose how you want to access vision-language models for your agent:
Use Cua's inference API to access multiple model providers with a single API key (same key used for sandbox access). Cua VLM Router provides intelligent routing and cost optimization.
Use the agent with Cua models:
import os
import asyncio
from computer import Computer
from agent import ComputerAgent

os.environ["CUA_API_KEY"] = "sk_cua-api01_..."

computer = Computer(
    os_type="linux",          # or "windows" or "macos"
    provider_type="cloud",
    name="your-sandbox-name"  # from CLI or website
)

async def main():
    await computer.run()  # Connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox
    try:
        agent = ComputerAgent(
            model="cua/anthropic/claude-sonnet-4.5",  # CUA-routed model
            tools=[computer],
            max_trajectory_budget=5.0
        )
        messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]
        async for result in agent.run(messages):
            for item in result["output"]:
                if item["type"] == "message":
                    print(item["content"][0]["text"])
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())
Available Cua models:
- cua/anthropic/claude-sonnet-4.5 - Claude Sonnet 4.5 (recommended)
- cua/anthropic/claude-opus-4.5 - Claude Opus 4.5 (enhanced agentic capabilities)
- cua/anthropic/claude-haiku-4.5 - Claude Haiku 4.5 (faster, cost-effective)
- cua/google/gemini-3-pro-preview - Gemini 3 Pro Preview (most powerful multimodal)
- cua/google/gemini-3-flash-preview - Gemini 3 Flash Preview (fastest and cheapest, recommended for balance)
Available composed models:
- huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-sonnet-4-5-20250929 - GTA1 grounding + Claude Sonnet 4.5 planning
- huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5 - GTA1 grounding + GPT-5 planning
- huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B+openai/gpt-4o - UI-TARS grounding + GPT-4o planning
- moondream3+openai/gpt-4o - Moondream3 grounding + GPT-4o planning
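A composed model string drops into the same ComputerAgent call shown above. As a minimal sketch (reusing the computer object from the earlier example and assuming the grounding model's local dependencies are installed):
# Sketch: only the model string changes relative to the earlier example.
agent = ComputerAgent(
    model="huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],          # Computer object from the example above
    max_trajectory_budget=5.0
)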
Benefits:
- Single API key for multiple providers
- Cost tracking and optimization
- No need to manage multiple provider keys
Use your own API keys from model providers like Anthropic, OpenAI, or others.
Use the agent with your provider:
import os
import asyncio
from computer import Computer
from agent import ComputerAgent

# Set your provider API key
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # For Anthropic
# OR
os.environ["OPENAI_API_KEY"] = "sk-..."  # For OpenAI

computer = Computer(
    os_type="linux",          # or "windows" or "macos"
    provider_type="cloud",
    name="your-sandbox-name"  # from CLI or website
)

async def main():
    await computer.run()  # Launch & connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox
    try:
        agent = ComputerAgent(
            model="anthropic/claude-sonnet-4-5-20250929",  # Direct provider model
            tools=[computer],
            max_trajectory_budget=5.0
        )
        messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]
        async for result in agent.run(messages):
            for item in result["output"]:
                if item["type"] == "message":
                    print(item["content"][0]["text"])
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())
Supported providers:
- anthropic/claude-* - Anthropic Claude models
- openai/gpt-* - OpenAI GPT models
- openai/o1-* - OpenAI o1 models
- huggingface-local/* - Local HuggingFace models
- And many more via LiteLLM
See Supported Models for the complete list.
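Switching providers is just a different model string in the same ComputerAgent call. For example, a sketch pointed at an OpenAI model (assumes OPENAI_API_KEY is set as shown above):
# Sketch: same agent setup as above, using an OpenAI model via your own key.
agent = ComputerAgent(
    model="openai/gpt-4o",     # any model matching the patterns listed above
    tools=[computer],
    max_trajectory_budget=5.0
)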
For TypeScript, you can build agent loops using the Vercel AI SDK with the Cua Computer Framework TypeScript library. The Vercel AI SDK provides a unified interface for building multi-step agent workflows with language models.
Install the required packages:
npm install @trycua/computer ai @ai-sdk/anthropic @anthropic-ai/sdk
Here's an example of building an agent loop with the Vercel AI SDK:
import Anthropic from "@anthropic-ai/sdk";
import { Computer, OSType } from "@trycua/computer";
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

const client = new Anthropic();
let computer: Computer;

const computerTool = {
  name: "computer",
  description: "Control the computer with actions like screenshot, click, type, etc.",
  input_schema: {
    type: "object" as const,
    properties: {
      action: {
        type: "string" as const,
        description: "Action to perform (screenshot, click, type, key_press, etc.)",
      },
      coordinate: {
        type: "array" as const,
        items: { type: "number" as const },
        description: "x, y coordinates for click actions",
      },
      text: {
        type: "string" as const,
        description: "Text to type",
      },
    },
    required: ["action"],
  },
};

async function runAgentLoop(goal: string) {
  // Initialize computer
  computer = new Computer({
    osType: OSType.LINUX,
    provider_type: "cloud",
    name: "your-sandbox-name",
    apiKey: process.env.CUA_API_KEY!,
  });
  await computer.run();

  const messages: any[] = [];

  // First message with goal
  messages.push({
    role: "user",
    content: goal,
  });

  // Agent loop
  for (let i = 0; i < 10; i++) {
    // Get model response with tool use
    const response = await client.messages.create({
      model: "claude-opus-4-1-20250805",
      max_tokens: 4096,
      tools: [computerTool],
      messages: messages,
    });

    // Check if we're done
    if (response.stop_reason === "end_turn") {
      console.log("Task completed!");
      break;
    }

    // Add assistant response to history
    messages.push({
      role: "assistant",
      content: response.content,
    });

    // Process tool calls
    const toolResults = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        try {
          let result: any;
          switch (block.input.action) {
            case "screenshot":
              result = await computer.interface.screenshot();
              break;
            case "click":
              result = await computer.interface.click(
                block.input.coordinate[0],
                block.input.coordinate[1]
              );
              break;
            case "type":
              result = await computer.interface.type(block.input.text);
              break;
            case "key_press":
              result = await computer.interface.key_press(block.input.text);
              break;
            default:
              result = { error: "Unknown action" };
          }
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: JSON.stringify(result),
          });
        } catch (error) {
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: `Error: ${error}`,
            is_error: true,
          });
        }
      }
    }

    // Add tool results to messages
    if (toolResults.length > 0) {
      messages.push({
        role: "user",
        content: toolResults,
      });
    }
  }

  await computer.disconnect();
}

// Run the agent
runAgentLoop("Take a screenshot and tell me what you see");
For more details and examples, see the Vercel AI SDK Computer Use Cookbook.
Learn more about agents in Agent Loops and available models in Supported Models.
Next Steps
- Explore Cua Computer Framework Commands for more sandbox interactions
- Learn about Agent Loops and advanced agent configuration
- Check out Custom Tools to extend your agents
- Review Supported Model Providers for more LLM options
- Try the Form Filling example use case
- Join our Discord community for help and discussion