computer-use
⚠Review·Scanned 2/17/2026
This skill provides desktop GUI automation for headless Linux using Xvfb/XFCE and xdotool. It includes explicit shell commands and scripts to run (e.g., export DISPLAY=:99, ./scripts/screenshot.sh, sudo apt install -y xvfb xfce4).
from clawhub.ai·v2402efd·9.6 KB·0 installs
Scanned from 1.0.0 at 2402efd · Transparency log ↗
$ vett add clawhub.ai/bodii88/computer-useReview findings below
Computer Use Skill
Full desktop GUI control for headless Linux servers. Creates a virtual display (Xvfb + XFCE) so you can run and control desktop applications on VPS/cloud instances without a physical monitor.
Environment
- Display:
:99 - Resolution: 1024x768 (XGA, Anthropic recommended)
- Desktop: XFCE4
Quick Start
export DISPLAY=:99
# Take screenshot
./scripts/screenshot.sh
# Click at coordinates
./scripts/click.sh 512 384 left
# Type text
./scripts/type_text.sh "Hello world"
# Press key combo
./scripts/key.sh "ctrl+s"
# Scroll down
./scripts/scroll.sh down 5
Actions Reference
| Action | Script | Arguments | Description |
|---|---|---|---|
| screenshot | screenshot.sh | — | Capture screen → base64 PNG |
| cursor_position | cursor_position.sh | — | Get current mouse X,Y |
| mouse_move | mouse_move.sh | x y | Move mouse to coordinates |
| left_click | click.sh | x y left | Left click at coordinates |
| right_click | click.sh | x y right | Right click |
| middle_click | click.sh | x y middle | Middle click |
| double_click | click.sh | x y double | Double click |
| triple_click | click.sh | x y triple | Triple click (select line) |
| left_click_drag | drag.sh | x1 y1 x2 y2 | Drag from start to end |
| left_mouse_down | mouse_down.sh | — | Press mouse button |
| left_mouse_up | mouse_up.sh | — | Release mouse button |
| type | type_text.sh | "text" | Type text (50 char chunks, 12ms delay) |
| key | key.sh | "combo" | Press key (Return, ctrl+c, alt+F4) |
| hold_key | hold_key.sh | "key" secs | Hold key for duration |
| scroll | scroll.sh | dir amt [x y] | Scroll up/down/left/right |
| wait | wait.sh | seconds | Wait then screenshot |
| zoom | zoom.sh | x1 y1 x2 y2 | Cropped region screenshot |
Workflow Pattern
- Screenshot — Always start by seeing the screen
- Analyze — Identify UI elements and coordinates
- Act — Click, type, scroll
- Screenshot — Verify result
- Repeat
Tips
- Screen is 1024x768, origin (0,0) at top-left
- Click to focus before typing in text fields
- Use
ctrl+Endto jump to page bottom in browsers - Most actions auto-screenshot after 2 sec delay
- Long text is chunked (50 chars) with 12ms keystroke delay
System Services
# Services auto-start on boot
sudo systemctl status virtual-desktop # Xvfb on :99
sudo systemctl status xfce-desktop # XFCE session
# Manual restart if needed
sudo systemctl restart virtual-desktop xfce-desktop
Opening Applications
export DISPLAY=:99
chromium-browser --no-sandbox & # Web browser
xfce4-terminal & # Terminal
thunar & # File manager
Requirements
System packages (install once):
sudo apt install -y xvfb xfce4 xfce4-terminal xdotool scrot imagemagick dbus-x11 chromium-browser