computer-use

Review · Scanned 2/17/2026

This skill provides desktop GUI automation for headless Linux using Xvfb/XFCE and xdotool. It includes explicit shell commands and scripts to run (e.g., export DISPLAY=:99, ./scripts/screenshot.sh, sudo apt install -y xvfb xfce4).

from clawhub.ai · v2402efd · 9.6 KB · 0 installs
Scanned from 1.0.0 at 2402efd
$ vett add clawhub.ai/bodii88/computer-use

Computer Use Skill

Full desktop GUI control for headless Linux servers. It creates a virtual display (Xvfb + XFCE) so you can run and control desktop applications on VPS or cloud instances that have no physical monitor.

Environment

  • Display: :99
  • Resolution: 1024x768 (XGA, the resolution Anthropic recommends for computer use)
  • Desktop: XFCE4
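If the `virtual-desktop` service is not installed, a display matching this environment can be brought up by hand. The sketch below is an assumption about how that could look, not the skill's own setup: the function name `start_virtual_display`, the 24-bit color depth, and the `-nolisten tcp` flag are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): start Xvfb by hand on :99
# with one 1024x768 screen at 24-bit color, then wait until it answers.
start_virtual_display() {
  local display="${1:-:99}" screen="${2:-1024x768x24}" i
  # -screen 0 WxHxDEPTH defines the single virtual screen;
  # -nolisten tcp keeps the X server off the network
  Xvfb "$display" -screen 0 "$screen" -nolisten tcp >/dev/null 2>&1 &
  export DISPLAY="$display"
  # Poll for up to ~5 s; getdisplaygeometry fails until the server is up
  for i in $(seq 1 50); do
    if xdotool getdisplaygeometry >/dev/null 2>&1; then
      return 0
    fi
    sleep 0.1
  done
  echo "Xvfb did not start on $display" >&2
  return 1
}
```

After `start_virtual_display` succeeds, an XFCE session could be started on the same display with something like `dbus-launch startxfce4 &`, and `xdotool getdisplaygeometry` should report the 1024x768 size.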

Quick Start

export DISPLAY=:99

# Take screenshot
./scripts/screenshot.sh

# Click at coordinates
./scripts/click.sh 512 384 left

# Type text
./scripts/type_text.sh "Hello world"

# Press key combo
./scripts/key.sh "ctrl+s"

# Scroll down
./scripts/scroll.sh down 5
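The scroll command above relies on the X11 convention that wheel motion is delivered as mouse-button events. The following is one possible way such a script could work; it is not the skill's actual `scroll.sh`, and the helper names `scroll_button` and `scroll` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the skill's scroll.sh. X11 reports the scroll
# wheel as mouse buttons: 4 = up, 5 = down, 6 = left, 7 = right.
scroll_button() {
  case "$1" in
    up)    echo 4 ;;
    down)  echo 5 ;;
    left)  echo 6 ;;
    right) echo 7 ;;
    *)     echo "unknown direction: $1" >&2; return 1 ;;
  esac
}

# scroll DIR AMOUNT [X Y]: optionally move the pointer to (X, Y) first,
# then send AMOUNT wheel clicks in direction DIR
scroll() {
  local button
  button="$(scroll_button "$1")" || return 1
  if [ $# -ge 4 ]; then
    xdotool mousemove "$3" "$4"
  fi
  xdotool click --repeat "$2" "$button"
}
```

With this sketch, `scroll down 5 512 384` would move the pointer to the screen center and scroll down five notches.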

Actions Reference

| Action | Script | Arguments | Description |
|---|---|---|---|
| screenshot | screenshot.sh | (none) | Capture screen → base64 PNG |
| cursor_position | cursor_position.sh | (none) | Get current mouse X,Y |
| mouse_move | mouse_move.sh | x y | Move mouse to coordinates |
| left_click | click.sh | x y left | Left click at coordinates |
| right_click | click.sh | x y right | Right click |
| middle_click | click.sh | x y middle | Middle click |
| double_click | click.sh | x y double | Double click |
| triple_click | click.sh | x y triple | Triple click (select line) |
| left_click_drag | drag.sh | x1 y1 x2 y2 | Drag from start to end |
| left_mouse_down | mouse_down.sh | (none) | Press mouse button |
| left_mouse_up | mouse_up.sh | (none) | Release mouse button |
| type | type_text.sh | "text" | Type text (50-char chunks, 12 ms delay) |
| key | key.sh | "combo" | Press key (Return, ctrl+c, alt+F4) |
| hold_key | hold_key.sh | "key" secs | Hold key for duration |
| scroll | scroll.sh | dir amt [x y] | Scroll up/down/left/right |
| wait | wait.sh | seconds | Wait, then screenshot |
| zoom | zoom.sh | x1 y1 x2 y2 | Cropped region screenshot |
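Several rows in this table map naturally onto single `xdotool` commands. The following is a hypothetical sketch, not the skill's scripts: `click_args`, `click`, `drag`, and `hold_key` are illustrative names, and the 0.1 s pauses in `drag` are an assumption, since some applications may not register a drag that completes instantly.

```shell
#!/usr/bin/env bash
# Hypothetical sketches of click.sh, drag.sh and hold_key.sh (not the
# skill's own code). X11 buttons: 1 = left, 2 = middle, 3 = right.

# Translate a click type into xdotool "click" arguments
click_args() {
  case "${1:-left}" in
    left)   echo "1" ;;
    middle) echo "2" ;;
    right)  echo "3" ;;
    double) echo "--repeat 2 1" ;;
    triple) echo "--repeat 3 1" ;;
    *)      echo "unknown click type: $1" >&2; return 1 ;;
  esac
}

# click X Y [TYPE]: move, then click, in one chained xdotool call
click() {
  local spec
  spec="$(click_args "${3:-left}")" || return 1
  local -a args
  read -r -a args <<< "$spec"   # split "--repeat 2 1" into separate words
  xdotool mousemove "$1" "$2" click "${args[@]}"
}

# drag X1 Y1 X2 Y2: press the left button at the start point, release at the end
drag() {
  xdotool mousemove "$1" "$2" mousedown 1
  sleep 0.1   # assumed pause so the application sees a press before motion
  xdotool mousemove "$3" "$4"
  sleep 0.1
  xdotool mouseup 1
}

# hold_key KEY SECONDS: keep KEY pressed for SECONDS, then release it
hold_key() {
  xdotool keydown "$1"
  sleep "$2"
  xdotool keyup "$1"
}
```

Chaining `mousemove` and `click` in a single `xdotool` call keeps the pointer from drifting between the move and the click.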

Workflow Pattern

  1. Screenshot — Always start by seeing the screen
  2. Analyze — Identify UI elements and coordinates
  3. Act — Click, type, scroll
  4. Screenshot — Verify result
  5. Repeat
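The capture steps of this loop can be sketched with ImageMagick's `import`, which is part of the required `imagemagick` package. The function names, the choice of `import` over `scrot`, and the 2-second settle time (mirroring the auto-screenshot delay under Tips) are assumptions, not the skill's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the screenshot -> act -> screenshot loop.
# Assumes ImageMagick (import, convert) and GNU coreutils base64.

# Capture the root window and print it as one line of base64 PNG
screenshot_b64() {
  import -window root png:- | base64 -w 0
}

# Turn zoom corners X1 Y1 X2 Y2 into an ImageMagick geometry WxH+X+Y
crop_geometry() {
  local x1="$1" y1="$2" x2="$3" y2="$4"
  if ((x2 <= x1 || y2 <= y1)); then
    echo "second corner must be below and right of the first" >&2
    return 1
  fi
  echo "$((x2 - x1))x$((y2 - y1))+${x1}+${y1}"
}

# Capture only the region between the two corners, as base64 PNG
zoom_b64() {
  local geometry
  geometry="$(crop_geometry "$@")" || return 1
  import -window root png:- | convert png:- -crop "$geometry" +repage png:- | base64 -w 0
}

# Run one action, let the UI settle for 2 s, then capture the result
act_and_verify() {
  "$@" || return 1
  sleep 2
  screenshot_b64
}
```

For example, `act_and_verify xdotool key ctrl+s` would press the key combo and return a fresh screenshot for the next Analyze step.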

Tips

  • The screen is 1024x768, with the origin (0,0) at the top-left corner
  • Click a text field to focus it before typing into it
  • Press ctrl+End to jump to the bottom of a page in browsers
  • Most actions take a screenshot automatically after a 2-second delay
  • Long text is typed in 50-character chunks with a 12 ms delay between keystrokes
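The chunked typing described above can be sketched as follows. Only the 50-character chunk size and the 12 ms delay come from this document; the function names and the use of `xdotool type --delay` are assumptions about how `type_text.sh` might be written.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of chunked typing: 50-character chunks,
# 12 ms between keystrokes.

# Print TEXT split into chunks of at most SIZE characters, one per line
chunk_text() {
  local text="$1" size="${2:-50}" i
  for ((i = 0; i < ${#text}; i += size)); do
    printf '%s\n' "${text:i:size}"
  done
}

# Type TEXT into the focused window one chunk at a time
type_text() {
  local text="$1" size=50 i
  for ((i = 0; i < ${#text}; i += size)); do
    # "--" ends option parsing so text starting with "-" is typed literally
    xdotool type --delay 12 -- "${text:i:size}"
  done
}
```

`type_text` slices the string directly instead of reading `chunk_text` output line by line, so text that itself contains newlines is typed unchanged.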

System Services

# Services auto-start on boot
sudo systemctl status virtual-desktop   # Xvfb on :99
sudo systemctl status xfce-desktop      # XFCE session

# Manual restart if needed
sudo systemctl restart virtual-desktop xfce-desktop
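Right after a restart, the display may not accept connections for a moment. A small polling helper, hypothetical and not part of the skill, can wait for a readiness check to pass before any action runs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command repeatedly until it succeeds
# or LIMIT seconds have passed. Returns 0 on success, 1 on timeout.
wait_until() {
  local limit="$1"; shift
  local deadline=$((SECONDS + limit))
  until "$@" >/dev/null 2>&1; do
    if ((SECONDS >= deadline)); then
      return 1
    fi
    sleep 0.5
  done
}
```

For example, after `export DISPLAY=:99`, `wait_until 10 xdotool getdisplaygeometry` waits up to about 10 seconds for Xvfb, and `wait_until 10 systemctl is-active --quiet xfce-desktop` does the same for the session service.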

Opening Applications

export DISPLAY=:99
chromium-browser --no-sandbox &    # Web browser
xfce4-terminal &                   # Terminal
thunar &                           # File manager
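A backgrounded launch returns before the window exists, so a screenshot taken immediately may show nothing new. A hypothetical helper can block until a matching window is visible; `launch_and_wait` and its 15-second limit are assumptions, while the waiting itself comes from xdotool's `search --sync` option, which retries until a match appears.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a GUI program in the background and print the
# ID of its first visible window once one with the given WM class appears.
# Gives up after 15 s (via coreutils timeout) and returns non-zero.
launch_and_wait() {
  local class="$1"; shift
  "$@" >/dev/null 2>&1 &
  local wid
  wid="$(timeout 15 xdotool search --sync --onlyvisible --class "$class" | head -n 1)"
  [ -n "$wid" ] && echo "$wid"
}
```

For example, `launch_and_wait xfce4-terminal xfce4-terminal` should print the terminal's window ID once it is mapped. WM class names can differ between builds (Chromium's may be `chromium` or `chromium-browser`), so the class argument should be checked on the target system.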

Requirements

System packages (install once):

sudo apt install -y xvfb xfce4 xfce4-terminal xdotool scrot imagemagick dbus-x11 chromium-browser
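Before running any of the scripts, a quick preflight check can confirm that the binaries these packages provide are on `PATH`. This is a hypothetical helper, not part of the skill:

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: print each command that is not on PATH
# and return non-zero if any are missing.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_commands Xvfb startxfce4 xfce4-terminal xdotool scrot import dbus-launch chromium-browser` lists anything the `apt install` line above has not yet provided; the mapping from package to binary name is an assumption worth confirming on your distribution.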