Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives. Critical focus on sandboxing, security, and handling the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation.
Add this skill:

```
npx mdskills install sickn33/computer-use-agents
```

Comprehensive computer use agent guide with production-quality patterns and security emphasis.
---
name: computer-use-agents
description: "Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives. Critical focus on sandboxing, security, and handling the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation."
source: vibeship-spawner-skills (Apache 2.0)
---

# Computer Use Agents

## Patterns

### Perception-Reasoning-Action Loop

The fundamental architecture of computer use agents: observe the screen, reason about the next action, execute the action, repeat. This loop integrates vision models with action execution through an iterative pipeline.

Key components:

1. PERCEPTION: Screenshot captures current screen state
2. REASONING: Vision-language model analyzes and plans
3. ACTION: Execute mouse/keyboard operations
4. FEEDBACK: Observe result, continue or correct

Critical insight: vision agents are completely still during the "thinking" phase (1-5 seconds), creating a detectable pause pattern.

**When to use**:

- Building any computer use agent from scratch
- Integrating vision models with desktop control
- Understanding agent behavior patterns

```python
from anthropic import Anthropic
from PIL import Image
import base64
import io
import time

import pyautogui


class ComputerUseAgent:
    """
    Perception-Reasoning-Action loop implementation.
    Based on Anthropic Computer Use patterns.
    """

    def __init__(self, client: Anthropic, model: str = "claude-sonnet-4-20250514"):
        self.client = client
        self.model = model
        self.max_steps = 50      # Prevent runaway loops
        self.action_delay = 0.5  # Seconds between actions

    def capture_screenshot(self) -> str:
        """Capture screen and return base64 encoded image."""
        screenshot = pyautogui.screenshot()
        # Resize for token efficiency (1280x800 is a good balance)
        screenshot = screenshot.resize((1280, 800), Image.LANCZOS)

        buffer = io.BytesIO()
        screenshot.save(buffer, format="PNG")
        return base64.b64encode(buffer.getvalue()).decode()

    def execute_action(self, action: dict) -> dict:
        """Execute mouse/keyboard action on the computer."""
        action_type = action.get("type")

        if action_type == "click":
            x, y = action["x"], action["y"]
            button = action.get("button", "left")
            pyautogui.click(x, y, button=button)
            return {"success": True, "action": f"clicked at ({x}, {y})"}

        elif action_type == "type":
            text = action["text"]
            pyautogui.typewrite(text, interval=0.02)
            return {"success": True, "action": f"typed {len(text)} chars"}

        elif action_type == "key":
            key = action["key"]
            pyautogui.press(key)
            return {"success": True, "action": f"pressed {key}"}

        elif action_type == "scroll":
            direction = action.get("direction", "down")
            amount = action.get("amount", 3)
            scroll = -amount if direction == "down" else amount
            pyautogui.scroll(scroll)
            return {"success": True, "action": f"scrolled {direction} by {amount}"}

        return {"success": False, "error": f"unknown action type: {action_type}"}
```

### Sandboxed Environment Pattern

Computer use agents MUST run in isolated, sandboxed environments. Never give agents direct access to your main system - the security risks are too high. Use Docker containers with virtual desktops.

Key isolation requirements:

1. NETWORK: Restrict to necessary endpoints only
2. FILESYSTEM: Read-only or scoped to temp directories
3. CREDENTIALS: No access to host credentials
4. SYSCALLS: Filter dangerous system calls
5. RESOURCES: Limit CPU, memory, time

The goal is "blast radius minimization" - if the agent goes wrong, damage is contained to the sandbox.

**When to use**:

- Deploying any computer use agent
- Testing agent behavior safely
- Running untrusted automation tasks

```dockerfile
# Dockerfile for sandboxed computer use environment
# Based on Anthropic's reference implementation pattern

FROM ubuntu:22.04

# Install desktop environment
RUN apt-get update && apt-get install -y \
    xvfb \
    x11vnc \
    fluxbox \
    xterm \
    firefox \
    python3 \
    python3-pip \
    supervisor

# Security: Create non-root user
RUN useradd -m -s /bin/bash agent && \
    mkdir -p /home/agent/.vnc

# Install Python dependencies
COPY requirements.txt /tmp/
RUN pip3 install -r /tmp/requirements.txt

# Security: Drop capabilities
RUN apt-get install -y --no-install-recommends libcap2-bin && \
    setcap -r /usr/bin/python3 || true

# Copy agent code
COPY --chown=agent:agent . /app
WORKDIR /app

# Supervisor config for virtual display + VNC
COPY supervisord.conf /etc/supervisor/conf.d/

# Expose VNC port only (not desktop directly)
EXPOSE 5900

# Run as non-root
USER agent

CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
```

```yaml
# docker-compose.yml with security constraints
version: '3.8'

services:
  computer-use-agent:
    build: .
    ports:
      - "5900:5900"  # VNC for observation
      - "8080:8080"  # API for control

    # Security constraints
    security_opt:
      - no-new-privileges:true
      - seccomp:seccomp-profile.json

    # Resource limits
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '0.5'
          memory: 1G

    # Network isolation
    networks:
      - agent-network

    # No access to host filesystem
    volumes:
      - agent-tmp:/tmp

    # Read-only root filesystem
    read_only: true
    tmpfs:
      - /run
      - /var/run

    # Environment
    environment:
      - DISPLAY=:99
      - NO_PROXY=localhost

networks:
  agent-network:
    driver: bridge
    internal: true  # No internet by default

volumes:
  agent-tmp:
```

```python
# Python wrapper with additional runtime sandboxing
# Minimal sketch of a runtime guard around `docker run`
import subprocess
from dataclasses import dataclass


@dataclass
class SandboxLimits:
    """Runtime caps enforced via docker run flags."""
    timeout_s: int = 300
    memory_mb: int = 4096


def run_sandboxed(image: str, command: list, limits: SandboxLimits) -> subprocess.CompletedProcess:
    """Launch the agent container with no network and a read-only root."""
    return subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",
            "--memory", f"{limits.memory_mb}m",
            "--read-only",
            image, *command,
        ],
        capture_output=True,
        timeout=limits.timeout_s,
    )
```

### Anthropic Computer Use Implementation

Official implementation pattern using Claude's computer use capability. Claude 3.5 Sonnet was the first frontier model to offer computer use. Claude Opus 4.5 is now the "best model in the world for computer use."

Key capabilities:

- screenshot: Capture current screen state
- mouse: Click, move, drag operations
- keyboard: Type text, press keys
- bash: Run shell commands
- text_editor: View and edit files

Tool versions:

- computer_20251124 (Opus 4.5): Adds zoom action for detailed inspection
- computer_20250124 (all other models): Standard capabilities

Critical limitation: "Some UI elements (like dropdowns and scrollbars) might be tricky for Claude to manipulate" - Anthropic docs

**When to use**:

- Building production computer use agents
- Need highest quality vision understanding
- Full desktop control (not just browser)

```python
from anthropic import Anthropic
from anthropic.types.beta import (
    BetaToolComputerUse20241022,
    BetaToolBash20241022,
    BetaToolTextEditor20241022,
)
import subprocess
import base64
from PIL import Image
import io


class AnthropicComputerUse:
    """
    Official Anthropic Computer Use implementation.

    Requires:
    - Docker container with virtual display
    - VNC for viewing agent actions
    - Proper tool implementations
    """

    def __init__(self):
        self.client = Anthropic()
        self.model = "claude-sonnet-4-20250514"  # Best for computer use
        self.screen_size = (1280, 800)

    def get_tools(self) -> list:
        """Define computer use tools."""
        return [
            BetaToolComputerUse20241022(
                type="computer_20241022",
                name="computer",
                display_width_px=self.screen_size[0],
                display_height_px=self.screen_size[1],
            ),
            BetaToolBash20241022(
                type="bash_20241022",
                name="bash",
            ),
            BetaToolTextEditor20241022(
                type="text_editor_20241022",
                name="str_replace_editor",
            ),
        ]

    def execute_tool(self, name: str, input: dict) -> dict:
        """Execute a tool and return result."""

        if name == "computer":
            return self._handle_computer_action(input)
        elif name == "bash":
            return self._handle_bash(input)
        elif name == "str_replace_editor":
            return self._handle_editor(input)
        else:
            return {"error": f"Unknown tool: {name}"}

    def _handle_computer_action(self, input: dict) -> dict:
        """Handle computer control actions."""
        action = input.get("action")

        if action == "screenshot":
            # Capture via xdotool/scrot
            subprocess.run(["scrot", "/tmp/screenshot.png"])

            with open("/tmp/screenshot.png", "rb") as f:
                data = base64.b64encode(f.read()).decode()

            return {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png", "data": data},
            }

        # click/type/key/scroll follow the same dispatch pattern
        return {"error": f"unhandled action: {action}"}

    def _handle_bash(self, input: dict) -> dict:
        """Run a shell command inside the sandbox."""
        result = subprocess.run(
            input.get("command", ""), shell=True, capture_output=True, text=True
        )
        return {"stdout": result.stdout, "stderr": result.stderr}

    def _handle_editor(self, input: dict) -> dict:
        """View and edit files; minimal view-only stub."""
        if input.get("command") == "view":
            with open(input["path"]) as f:
                return {"content": f.read()}
        return {"error": "editor command not implemented"}
```

## ⚠️ Sharp Edges

| Severity | Guidance |
|----------|----------|
| critical | Defense in depth - no single solution works |
| medium | Add human-like variance to actions |
| high | Use keyboard alternatives when possible |
| medium | Accept the tradeoff |
| high | Implement context management |
| high | Monitor and limit costs |
| critical | ALWAYS use sandboxing |
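One detail the perception loop glosses over: screenshots are downscaled to 1280x800 before they reach the model, but `pyautogui` clicks in physical screen pixels, so coordinates the model returns must be mapped back. A minimal sketch — the helper name and the default resolutions are illustrative, not part of any API:

```python
def scale_coords(
    x: int,
    y: int,
    model_size: tuple = (1280, 800),    # size of the screenshot the model saw
    screen_size: tuple = (1920, 1200),  # physical display resolution (assumed)
) -> tuple:
    """Map a click target from screenshot space back to screen space."""
    return (
        round(x * screen_size[0] / model_size[0]),
        round(y * screen_size[1] / model_size[1]),
    )
```

Usage: `pyautogui.click(*scale_coords(x, y))`. Skipping this step makes every click drift toward the top-left on displays larger than the screenshot size.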
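The detectable pause pattern noted in the loop section (and the "add human-like variance" sharp edge) can be softened by randomizing click offsets and inter-action delays instead of using fixed values. A small sketch; the function names and ranges are illustrative:

```python
import random


def humanize_point(x: int, y: int, jitter_px: int = 3) -> tuple:
    """Offset a click target by a few pixels so repeated clicks
    never land on exactly the same coordinate."""
    return (
        x + random.randint(-jitter_px, jitter_px),
        y + random.randint(-jitter_px, jitter_px),
    )


def humanize_delay(base: float = 0.5, spread: float = 0.4) -> float:
    """Randomized inter-action delay in place of a fixed 0.5 s sleep."""
    return base + random.uniform(0.0, spread)
```

Use `time.sleep(humanize_delay())` between actions rather than a constant `action_delay`; keep the jitter small enough that clicks stay inside the target element.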
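The `max_steps` cap in the loop example guards against runaway loops; the "monitor and limit costs" edge also calls for a token budget, since each 1280x800 screenshot costs on the order of a thousand input tokens. A minimal sketch — the class name and default limits are illustrative:

```python
class RunBudget:
    """Hard caps on steps and token spend for a single agent run."""

    def __init__(self, max_steps: int = 50, max_tokens: int = 200_000):
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.steps = 0
        self.tokens = 0

    def charge(self, tokens_used: int) -> None:
        """Record one perception-reasoning-action iteration."""
        self.steps += 1
        self.tokens += tokens_used

    def exhausted(self) -> bool:
        """True once either cap is hit; the agent loop should stop here."""
        return self.steps >= self.max_steps or self.tokens >= self.max_tokens
```

Check `budget.exhausted()` at the top of every loop iteration and `charge()` with the usage figures the API response reports.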