vision-sandbox
Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing.
Vision Sandbox 🔭
Agentic Vision via Gemini's native Python code execution sandbox.
Instead of just "guessing" what's in an image, the model can write and execute code to verify spatial relationships, count objects, or perform complex visual reasoning with pixel-level precision.
🚀 Primary Use Cases
Designed as a core skill for OpenClaw, Vision Sandbox provides visual grounding for agentic workflows:
- Spatial Grounding: Get precise [x, y] coordinates for UI elements.
- Visual Calculation: Let the model use Python to calculate values from visual data.
- UI Auditing: Automatically check for overlaps, alignment, and accessibility.
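The checks in the list above come down to simple box geometry once coordinates are available. Here is a minimal sketch of the kind of code involved, assuming the model reports boxes as [ymin, xmin, ymax, xmax] on a 0 to 1000 grid. That layout is an assumption to verify against your actual output, and `Box`, `to_pixels`, `overlaps`, and `center` are hypothetical names, not part of this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (origin at top-left)."""
    left: float
    top: float
    right: float
    bottom: float


def to_pixels(norm_box, width, height, scale=1000):
    """Convert [ymin, xmin, ymax, xmax] on a 0..scale grid to a pixel Box.

    The input layout is an assumption; adjust it to what the model returns.
    """
    ymin, xmin, ymax, xmax = norm_box
    return Box(
        left=xmin * width / scale,
        top=ymin * height / scale,
        right=xmax * width / scale,
        bottom=ymax * height / scale,
    )


def overlaps(a, b):
    """True if the boxes share interior area (touching edges do not count)."""
    return (
        a.left < b.right
        and b.left < a.right
        and a.top < b.bottom
        and b.top < a.bottom
    )


def center(box):
    """Return the (x, y) midpoint of a box, e.g. as a click target."""
    return ((box.left + box.right) / 2, (box.top + box.bottom) / 2)


# Hypothetical detections on a 1280x720 screenshot.
button = to_pixels([100, 100, 200, 300], width=1280, height=720)
label = to_pixels([150, 250, 250, 400], width=1280, height=720)
print(center(button))           # (256.0, 108.0)
print(overlaps(button, label))  # True
```

The same `overlaps` check can be run over every pair of detected elements to flag collisions in a layout.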
🛠 Prerequisites
- uv (Python package manager)
- Python 3.11 (Locked for stability)
- GEMINI_API_KEY set in your environment.
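To confirm these prerequisites before running anything, a small preflight script can help. This is only a sketch: `check_prerequisites` is a hypothetical helper, not something shipped with this project, and it treats "Python 3.11" as an exact minor-version match.

```python
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Return a list of problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    # The README pins Python 3.11, so compare major and minor only.
    if sys.version_info[:2] != (3, 11):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python 3.11 expected, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    # An empty string counts as unset.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


for problem in check_prerequisites():
    print(problem)
```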
📦 Installation
Via ClawHub (Recommended)
clawhub install vision-sandbox
For Local Development
git clone https://github.com/johanesalxd/vision-sandbox.git
cd vision-sandbox
uv sync
📖 Quick Start
Run a vision task using the CLI:
uv run vision-sandbox --image "sample/how-many-fingers.png" --prompt "Count the fingers."
Example: Visual Reasoning
uv run vision-sandbox --image "sample/how-many-fingers.png" --prompt "Count the number of fingers on this hand. Use code execution to identify the bounding box for each finger and return the total count."
Result: The model writes Python code to define a bounding box for each digit, so the count comes from explicit detections rather than a visual guess.
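The same CLI call can be driven from a Python script. The sketch below uses only the `--image` and `--prompt` flags shown above. `build_command` and `run_vision_task` are hypothetical helper names, and the sketch assumes the result is printed to stdout, which this README does not specify.

```python
import subprocess


def build_command(image, prompt):
    """Build the argv list for the CLI call shown in Quick Start."""
    return ["uv", "run", "vision-sandbox", "--image", image, "--prompt", prompt]


def run_vision_task(image, prompt):
    """Run the CLI from the project directory and return its stdout.

    Assumes the result is printed to stdout; the output format is not
    specified here, so the text is returned unparsed.
    """
    completed = subprocess.run(
        build_command(image, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Example call (requires uv, the project checkout, and GEMINI_API_KEY):
# print(run_vision_task("sample/how-many-fingers.png", "Count the fingers."))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the prompt contains spaces or quotes.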
🤖 OpenCode Integration
Vision Sandbox is a powerful companion for OpenCode.
Installation for OpenCode
- Global Installation: Copy SKILL.md to your global OpenCode skills directory:
mkdir -p ~/.config/opencode/skills/vision-sandbox
cp SKILL.md ~/.config/opencode/skills/vision-sandbox/SKILL.md
- Project Installation: If you want the skill available only for a specific project:
mkdir -p .opencode/skills/vision-sandbox
cp SKILL.md .opencode/skills/vision-sandbox/SKILL.md
Example Interaction
"Hey OpenCode, run the vision-sandbox skill on this screenshot to find the exact padding of the login card, then update styles.css accordingly."
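For a request like this one, the padding calculation itself is simple arithmetic once the card and its content have bounding boxes. The sketch below is illustrative only: `padding` is a hypothetical helper, and the pixel values are made-up sample data, not output from the tool.

```python
def padding(outer, inner):
    """Per-side padding between an outer box and the content inside it.

    Boxes are (left, top, right, bottom) tuples in pixels.
    """
    o_left, o_top, o_right, o_bottom = outer
    i_left, i_top, i_right, i_bottom = inner
    return {
        "top": i_top - o_top,
        "right": o_right - i_right,
        "bottom": o_bottom - i_bottom,
        "left": i_left - o_left,
    }


# Hypothetical boxes for a login card and the content it wraps.
card = (400, 200, 880, 560)
content = (424, 232, 856, 528)
print(padding(card, content))
# {'top': 32, 'right': 24, 'bottom': 32, 'left': 24}
```

The keys follow the CSS top, right, bottom, left order, so the values map directly onto a `padding` shorthand.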
🧑‍💻 Development
Linting & Formatting
This project uses ruff for code quality.
uv run ruff format .
uv run ruff check --fix .
Running Tests
uv run pytest
📜 License
MIT