
Quickstart

Get started with Cua

Set Up Your Computer Sandbox

Choose how you want to run your Cua sandbox. This will be the isolated environment where your automated tasks will execute.

You can run your Cua sandbox in the cloud (recommended for easiest setup), locally in a Docker container on any platform, on a macOS VM with Lume, or on Windows with a Windows Sandbox. Choose the option that matches your system and needs.

Create and manage cloud sandboxes that run Linux (Ubuntu), Windows, or macOS.

First, create your API key:

  1. Go to cua.ai/signin
  2. Navigate to Dashboard > API Keys > New API Key to create your API key
  3. Important: Copy and save your API key immediately. You won't be able to view it again and will need to regenerate it if lost
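
For the CLI and SDK steps that follow, it helps to export the key in your shell (placeholder value shown):

# Make the key available to the Cua CLI and SDK examples below
export CUA_API_KEY="sk_cua-api01_..."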

Then, create your sandbox using either option:

Option 1: Via Website

  1. Navigate to Dashboard > Sandboxes > Create Sandbox
  2. Create a sandbox, choosing Linux, Windows, or macOS
  3. Note your sandbox name

Option 2: Via CLI

  1. Install the Cua CLI:
# macOS/Linux
curl -LsSf https://cua.ai/cli/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://cua.ai/cli/install.ps1 | iex"
  2. Login and create a sandbox:
cua auth login
cua sb create --os linux --size small --region north-america
  3. Note your sandbox name and password from the output

Your Cloud Sandbox will be automatically configured and ready to use.

Run a Linux desktop locally on macOS, Windows, or Linux hosts.

  1. Install Docker Desktop or Docker Engine

  2. Pull a CUA Docker image:

# XFCE (Lightweight) - recommended for most use cases
docker pull --platform=linux/amd64 trycua/cua-xfce:latest

# OR KASM (Full-Featured) - full Ubuntu desktop
docker pull --platform=linux/amd64 trycua/cua-ubuntu:latest

Run full Linux (Ubuntu Desktop), Windows 11, or Android 11 VMs inside Docker containers using QEMU virtualization.

Linux and Windows images require a golden image preparation step on first use. Android images start directly without preparation.

1. Install Docker Desktop or Docker Engine

2. Pull the QEMU Linux image:

docker pull trycua/cua-qemu-linux:latest

3. Download the Ubuntu 22.04 LTS Server ISO:

  • Visit releases.ubuntu.com/22.04
  • Download ubuntu-22.04.5-live-server-amd64.iso (the filename used in the command below)

4. Create golden image:

docker run -it --rm \
    --device=/dev/kvm \
    --cap-add NET_ADMIN \
    --mount type=bind,source=/path/to/ubuntu-22.04.5-live-server-amd64.iso,target=/custom.iso \
    -v ~/cua-storage/linux:/storage \
    -p 8006:8006 \
    -p 5000:5000 \
    -e RAM_SIZE=8G \
    -e CPU_CORES=4 \
    -e DISK_SIZE=64G \
    trycua/cua-qemu-linux:latest

The container will install Ubuntu Desktop from the ISO and shut down when complete. Monitor progress at http://localhost:8006.
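
Once the container shuts down, the golden image is stored in ~/cua-storage/linux. As a rough sketch, subsequent boots are the same command without the ISO mount (in Step 1 below, the Computer SDK launches this container for you, so you rarely need to run it by hand):

docker run -it --rm \
    --device=/dev/kvm \
    --cap-add NET_ADMIN \
    -v ~/cua-storage/linux:/storage \
    -p 8006:8006 \
    -p 5000:5000 \
    -e RAM_SIZE=8G \
    -e CPU_CORES=4 \
    -e DISK_SIZE=64G \
    trycua/cua-qemu-linux:latest

The same pattern applies to the Windows image described next.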

1. Install Docker Desktop or Docker Engine

2. Pull the QEMU Windows image:

docker pull trycua/cua-qemu-windows:latest

3. Download Windows 11 Enterprise Evaluation ISO:

  • Visit Microsoft Evaluation Center
  • Accept the Terms of Service
  • Download Windows 11 Enterprise Evaluation (90-day trial, English, United States) ISO (~6GB)

4. Create golden image:

docker run -it --rm \
    --device=/dev/kvm \
    --cap-add NET_ADMIN \
    --mount type=bind,source=/path/to/windows-11-enterprise-eval.iso,target=/custom.iso \
    -v ~/cua-storage/windows:/storage \
    -p 8006:8006 \
    -p 5000:5000 \
    -e RAM_SIZE=8G \
    -e CPU_CORES=4 \
    -e DISK_SIZE=64G \
    trycua/cua-qemu-windows:latest

The container will install Windows 11 from the ISO and shut down when complete. Monitor progress at http://localhost:8006.

1. Install Docker Desktop or Docker Engine

2. Pull the QEMU Android image:

docker pull trycua/cua-qemu-android:latest

No golden image preparation needed - the Android emulator starts directly when you run it!
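
To smoke-test the emulator outside the SDK, a minimal run might look like the sketch below. The KVM device and EMULATOR_DEVICE variable mirror the Python example in Step 1; the port mappings are an assumption carried over from the other QEMU images:

docker run -it --rm \
    --device=/dev/kvm \
    -p 8006:8006 \
    -p 5000:5000 \
    -e EMULATOR_DEVICE="Samsung Galaxy S10" \
    trycua/cua-qemu-android:latest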

macOS hosts only - requires Lume CLI.

  1. Install the Lume CLI:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
  2. Start a local Cua sandbox:
lume run macos-sequoia-cua:latest
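
To confirm which VM images are available locally, you can list Lume's VMs (assuming the CLI's ls subcommand; output format may vary):

lume ls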

Windows hosts only - requires Windows 10 Pro/Enterprise or Windows 11.

  1. Enable Windows Sandbox
  2. Install the pywinsandbox dependency:
pip install -U git+https://github.com/karkason/pywinsandbox.git
  3. Windows Sandbox will be automatically configured when you run the CLI

Automate Your Sandbox

Python Version Compatibility

Cua packages require Python 3.12 or 3.13. Python 3.14 is not yet supported due to dependency constraints (pydantic-core/PyO3). If you hit build errors on Python 3.14, switch to Python 3.12 or 3.13.
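
A quick way to confirm which interpreter you're on before installing:

python3 --version   # expect Python 3.12.x or 3.13.x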

This section guides you through building automation in layers:

  1. Cua Computer Framework - Direct sandbox control for manual automation and testing
  2. Cua Agent Framework - Adds AI automation on top, using vision-language models to understand and interact with the UI

Start by setting up Cua Computer Framework to verify your sandbox works, then add Cua Agent Framework on top to enable intelligent, autonomous automation.

Step 1: Connect with Cua Computer Framework

Install Cua Computer Framework and verify your sandbox is working by performing basic interactions such as taking screenshots or simulating user input. This is an important verification step before adding AI agents.

Install the Cua computer Python SDK:

Using uv (recommended):

uv pip install cua-computer

Or with pip:

pip install cua-computer

Then, connect to your desired computer environment:

Cloud Sandbox:

Set your Cua API key (same key used for model inference) and connect to your sandbox:

import os
from computer import Computer
import asyncio

os.environ["CUA_API_KEY"] = "sk_cua-api01_..."

computer = Computer(
    os_type="linux",  # or "windows" or "macos"
    provider_type="cloud",
    name="your-sandbox-name"  # from CLI or website
)

async def main():
    await computer.run()  # Connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox

    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())

Docker (Linux container):

from computer import Computer
import asyncio

computer = Computer(
    os_type="linux",
    provider_type="docker",
    image="trycua/cua-xfce:latest"  # or "trycua/cua-ubuntu:latest"
)

async def main():
    await computer.run()  # Launch & connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox

    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())

QEMU Linux VM:

from computer import Computer
import asyncio

computer = Computer(
    os_type="linux",
    provider_type="docker",
    image="trycua/cua-qemu-linux:latest",
    storage="~/cua-storage/linux",
    run_opts={
        "devices": ["/dev/kvm"],  # Optional but recommended
    },
)

async def main():
    await computer.run()  # Boot from golden image
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect

    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())

QEMU Windows VM:

from computer import Computer
import asyncio

computer = Computer(
    os_type="windows",
    provider_type="docker",
    image="trycua/cua-qemu-windows:latest",
    storage="~/cua-storage/windows",
    run_opts={
        "devices": ["/dev/kvm"],  # Optional but recommended
    },
)

async def main():
    await computer.run()  # Boot from golden image
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect

    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())

QEMU Android emulator:

from computer import Computer
import asyncio

computer = Computer(
    os_type="android",
    provider_type="docker",
    image="trycua/cua-qemu-android:latest",
    timeout=150,  # Emulator needs more time to boot
    run_opts={
        "devices": ["/dev/kvm"],  # Required for Android emulator
        "env": {
            "EMULATOR_DEVICE": "Samsung Galaxy S10",
        },
    },
)

async def main():
    await computer.run()  # Launch & connect to Android emulator

    try:
        # Take a screenshot of the Android screen
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()

asyncio.run(main())

Lume (macOS VM):

from computer import Computer
import asyncio

computer = Computer(
    os_type="macos",
    provider_type="lume",
    name="macos-sequoia-cua:latest"
)

async def main():
    await computer.run()  # Launch & connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox

    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())

Windows Sandbox:

from computer import Computer
import asyncio

computer = Computer(
    os_type="windows",
    provider_type="windows_sandbox"
)

async def main():
    await computer.run()  # Launch & connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox

    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())

Host desktop:

Install and run cua-computer-server:

pip install cua-computer-server
python -m computer_server

Then, use the Computer object to connect:

from computer import Computer
import asyncio

computer = Computer(use_host_computer_server=True)

async def main():
    await computer.run()  # Connect to the host desktop
    # Alternative: If your computer server is not running, use start() instead:
    # await computer.start()  # Start and connect to the host desktop

    try:
        # Take a screenshot of the computer's current display
        screenshot = await computer.interface.screenshot()
        # Simulate a left-click at coordinates (100, 100)
        await computer.interface.left_click(100, 100)
        # Type "Hello!" into the active application
        await computer.interface.type_text("Hello!")
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop everything, use stop() instead:
        # await computer.stop()  # Fully stop and disconnect

asyncio.run(main())

Install the Cua computer TypeScript SDK:

npm install @trycua/computer

Then, connect to your desired computer environment:

Cloud Sandbox:

Set your Cua API key (same key used for model inference):

export CUA_API_KEY="sk_cua-api01_..."

Then connect to your sandbox:

import { Computer, OSType } from '@trycua/computer';

const computer = new Computer({
  osType: OSType.LINUX,  // or OSType.WINDOWS or OSType.MACOS
  name: "your-sandbox-name"  // from CLI or website
});
await computer.run(); // Connect to the sandbox

Docker (Linux container):

import { Computer, OSType, ProviderType } from '@trycua/computer';

const computer = new Computer({
  osType: OSType.LINUX,
  providerType: ProviderType.DOCKER,
  image: "trycua/cua-xfce:latest"  // or "trycua/cua-ubuntu:latest"
});
await computer.run(); // Launch & connect to the sandbox

Lume (macOS VM):

import { Computer, OSType, ProviderType } from '@trycua/computer';

const computer = new Computer({
  osType: OSType.MACOS,
  providerType: ProviderType.LUME,
  name: "macos-sequoia-cua:latest"
});
await computer.run(); // Launch & connect to the sandbox

Windows Sandbox:

import { Computer, OSType, ProviderType } from '@trycua/computer';

const computer = new Computer({
  osType: OSType.WINDOWS,
  providerType: ProviderType.WINDOWS_SANDBOX
});
await computer.run(); // Launch & connect to the sandbox

Host desktop:

First, install and run cua-computer-server:

pip install cua-computer-server
python -m computer_server

Then, use the Computer object to connect:

import { Computer } from '@trycua/computer';

const computer = new Computer({ useHostComputerServer: true });
await computer.run(); // Connect to the host desktop

Once connected, you can perform interactions:

try {
  // Take a screenshot of the computer's current display
  const screenshot = await computer.interface.screenshot();
  // Simulate a left-click at coordinates (100, 100)
  await computer.interface.leftClick(100, 100);
  // Type "Hello!" into the active application
  await computer.interface.typeText("Hello!");
} finally {
  await computer.disconnect();
}

Learn more about computers in the Cua computers documentation.

Step 2: Add AI Automation with Cua Agent Framework

Now that you've verified your sandbox works, use an Agent to automate complex tasks by providing it with a goal. The agent will interact with the computer environment using a vision-language model to understand the UI and execute actions.

While you can build your own agent loop with any LLM, Cua Agent Framework is the recommended approach as it provides:

  • 100+ VLM options through Cua VLM Router and direct provider access
  • Built-in optimizations for computer-use tasks
  • Structured agent loops for consistent behavior

Install the Cua agent Python SDK:

Using uv (recommended):

uv pip install "cua-agent[all]"

Or with pip:

pip install "cua-agent[all]"

Choose how you want to access vision-language models for your agent:

Use Cua's inference API to access multiple model providers with a single API key (same key used for sandbox access). Cua VLM Router provides intelligent routing and cost optimization.

Use the agent with Cua models:

import os
import asyncio
from computer import Computer
from agent import ComputerAgent

os.environ["CUA_API_KEY"] = "sk_cua-api01_..."

computer = Computer(
    os_type="linux",  # or "windows" or "macos"
    provider_type="cloud",
    name="your-sandbox-name"  # from CLI or website
)

async def main():
    await computer.run()  # Connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox

    try:
        agent = ComputerAgent(
            model="cua/anthropic/claude-sonnet-4.5",  # CUA-routed model
            tools=[computer],
            max_trajectory_budget=5.0  # budget cap (USD) for the whole run
        )

        messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]

        async for result in agent.run(messages):
            for item in result["output"]:
                if item["type"] == "message":
                    print(item["content"][0]["text"])
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())

Available Cua models:

  • cua/anthropic/claude-sonnet-4.5 - Claude Sonnet 4.5 (recommended)
  • cua/anthropic/claude-opus-4.5 - Claude Opus 4.5 (enhanced agentic capabilities)
  • cua/anthropic/claude-haiku-4.5 - Claude Haiku 4.5 (faster, cost-effective)
  • cua/google/gemini-3-pro-preview - Gemini 3 Pro Preview (most powerful multimodal)
  • cua/google/gemini-3-flash-preview - Gemini 3 Flash Preview (fastest and cheapest, recommended for balance)

Available composed models:

  • huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-sonnet-4-5-20250929 - GTA1 grounding + Claude Sonnet 4.5 planning
  • huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5 - GTA1 grounding + GPT-5 planning
  • huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B+openai/gpt-4o - UI-TARS grounding + GPT-4o planning
  • moondream3+openai/gpt-4o - Moondream3 grounding + GPT-4o planning

Benefits:

  • Single API key for multiple providers
  • Cost tracking and optimization
  • No need to manage multiple provider keys

Use your own API keys from model providers like Anthropic, OpenAI, or others.

Use the agent with your provider:

import os
import asyncio
from computer import Computer
from agent import ComputerAgent

# Set your provider API key
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # For Anthropic
# OR
os.environ["OPENAI_API_KEY"] = "sk-..."  # For OpenAI

computer = Computer(
    os_type="linux",  # or "windows" or "macos"
    provider_type="cloud",
    name="your-sandbox-name"  # from CLI or website
)

async def main():
    await computer.run()  # Launch & connect to the sandbox
    # Alternative: If your VM is not running, use start() instead:
    # await computer.start()  # Start and connect to the sandbox

    try:
        agent = ComputerAgent(
            model="anthropic/claude-sonnet-4-5-20250929",  # Direct provider model
            tools=[computer],
            max_trajectory_budget=5.0  # budget cap (USD) for the whole run
        )

        messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]

        async for result in agent.run(messages):
            for item in result["output"]:
                if item["type"] == "message":
                    print(item["content"][0]["text"])
    finally:
        await computer.disconnect()
        # Alternative: If you want to fully stop the VM, use stop() instead:
        # await computer.stop()  # Fully stop VM and disconnect

asyncio.run(main())

Supported providers:

  • anthropic/claude-* - Anthropic Claude models
  • openai/gpt-* - OpenAI GPT models
  • openai/o1-* - OpenAI o1 models
  • huggingface-local/* - Local HuggingFace models
  • And many more via LiteLLM

See Supported Models for the complete list.

For TypeScript, you can build your own agent loop on top of the Cua Computer Framework TypeScript library with any LLM SDK. The Vercel AI SDK provides a unified interface for building multi-step agent workflows with language models; the sketch below calls the Anthropic SDK directly, and the same loop structure carries over to the AI SDK.

Install the required packages:

npm install @trycua/computer @anthropic-ai/sdk

Here's an example of building an agent loop:

import Anthropic from "@anthropic-ai/sdk";
import { Computer, OSType } from "@trycua/computer";

const client = new Anthropic();
let computer: Computer;

const computerTool = {
  type: "tool" as const,
  name: "computer",
  description: "Control the computer with actions like screenshot, click, type, etc.",
  inputSchema: {
    type: "object" as const,
    properties: {
      action: {
        type: "string" as const,
        description: "Action to perform (screenshot, click, type, key_press, etc.)",
      },
      coordinate: {
        type: "array" as const,
        items: { type: "number" as const },
        description: "x, y coordinates for click actions",
      },
      text: {
        type: "string" as const,
        description: "Text to type",
      },
    },
    required: ["action"],
  },
};

async function runAgentLoop(goal: string) {
  // Initialize computer
  computer = new Computer({
    osType: OSType.LINUX,
    name: "your-sandbox-name",
    apiKey: process.env.CUA_API_KEY!,
  });

  await computer.run();

  const messages: any[] = [];

  // First message with goal
  messages.push({
    role: "user",
    content: goal,
  });

  // Agent loop
  for (let i = 0; i < 10; i++) {
    // Get model response with tool use
    const response = await client.messages.create({
      model: "claude-opus-4-1-20250805",
      max_tokens: 4096,
      tools: [computerTool],
      messages: messages,
    });

    // Check if we're done
    if (response.stop_reason === "end_turn") {
      console.log("Task completed!");
      break;
    }

    // Add assistant response to history
    messages.push({
      role: "assistant",
      content: response.content,
    });

    // Process tool calls
    const toolResults = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        try {
          let result: any;
          const input = block.input as any; // tool input arrives untyped
          switch (input.action) {
            case "screenshot":
              result = await computer.interface.screenshot();
              break;
            case "click":
              result = await computer.interface.leftClick(
                input.coordinate[0],
                input.coordinate[1]
              );
              break;
            case "type":
              result = await computer.interface.typeText(input.text);
              break;
            case "key_press":
              // assumes a keyPress method mirroring the Python key_press
              result = await computer.interface.keyPress(input.text);
              break;
            default:
              result = { error: "Unknown action" };
          }

          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: JSON.stringify(result),
          });
        } catch (error) {
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: `Error: ${error}`,
            is_error: true,
          });
        }
      }
    }

    // Add tool results to messages
    if (toolResults.length > 0) {
      messages.push({
        role: "user",
        content: toolResults,
      });
    }
  }

  await computer.disconnect();
}

// Run the agent
runAgentLoop("Take a screenshot and tell me what you see");

For more details and examples, see the Vercel AI SDK Computer Use Cookbook.

Learn more about agents in Agent Loops and available models in Supported Models.
