playwright-scraper-skill
High-risk skill: documentation and scripts instruct executing shell commands (node scripts/playwright-stealth.js, npm install) and running included scripts that accept user-supplied URLs. It also performs external network requests (e.g., https://m.discuss.com.hk/#hot, https://example.com) and writes screenshots/HTML (/tmp/discuss-hk.png, screenshot-*.png).
Playwright Scraper Skill 🕷️
Chinese Documentation | English
An OpenClaw Skill for web scraping, built on Playwright, with anti-bot protection. Tested successfully on complex websites such as Discuss.com.hk.
📦 Installation: See INSTALL.md
📚 Full Documentation: See SKILL.md
💡 Examples: See examples/README.md
✨ Features
- ✅ Pure Playwright — Modern, powerful, easy to use
- ✅ Anti-Bot Protection — Hides automation, realistic UA
- ✅ Verified — 100% success on Discuss.com.hk
- ✅ Simple to Use — One-line commands
- ✅ Customizable — Environment variable support
🚀 Quick Start
Installation
npm install
npx playwright install chromium
Usage
# Quick scraping
node scripts/playwright-simple.js https://example.com
# Stealth mode (recommended)
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"
📖 Two Modes
| Mode | Use Case | Speed | Anti-Bot |
|---|---|---|---|
| Simple | Regular dynamic sites | Fast (3-5s) | None |
| Stealth ⭐ | Sites with anti-bot | Medium (5-20s) | Medium-High |
Simple Mode
For sites without anti-bot protection:
node scripts/playwright-simple.js <URL>
Stealth Mode (Recommended)
For sites with Cloudflare or anti-bot protection:
node scripts/playwright-stealth.js <URL>
Anti-Bot Techniques:
- Hide navigator.webdriver
- Realistic User-Agent (iPhone)
- Human-like behavior simulation
- Screenshot and HTML saving support
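The techniques listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of scripts/playwright-stealth.js: the function names (`buildContextOptions`, `randomDelay`, `stealthFetch`), the iPhone User-Agent string, the viewport size, and the delay ranges are all hypothetical choices made for the example.

```javascript
// Sketch only: the real scripts/playwright-stealth.js may differ.
// Assumed values: the iPhone UA string, viewport size, and delay ranges.
const IPHONE_UA =
  'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) ' +
  'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1';

// Options passed to browser.newContext(); plain data, so easy to inspect.
function buildContextOptions(userAgent = IPHONE_UA) {
  return {
    userAgent,
    viewport: { width: 390, height: 844 },
    isMobile: true,
    hasTouch: true,
    locale: 'en-US',
  };
}

// Random integer in [min, max], used to vary pauses between actions.
function randomDelay(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

async function stealthFetch(url, { headless = true, waitMs = 5000 } = {}) {
  // Required lazily so the helpers above work without Playwright installed.
  const { chromium } = require('playwright');
  const browser = await chromium.launch({ headless });
  try {
    const context = await browser.newContext(buildContextOptions());
    // Runs before any page script: report navigator.webdriver as undefined.
    await context.addInitScript(() => {
      Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    });
    const page = await context.newPage();
    const response = await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Human-like behavior: a short pause, a mouse move, a small scroll.
    await page.waitForTimeout(randomDelay(500, 1500));
    await page.mouse.move(randomDelay(50, 300), randomDelay(100, 600));
    await page.evaluate((dy) => window.scrollBy(0, dy), randomDelay(200, 600));
    // Give late-loading content (or a challenge page) time to settle.
    await page.waitForTimeout(waitMs);
    return {
      status: response ? response.status() : null,
      html: await page.content(),
    };
  } finally {
    await browser.close();
  }
}
```

With Playwright installed, a call such as `stealthFetch('https://example.com').then((r) => console.log(r.status))` would exercise the whole path. Overriding `navigator.webdriver` covers only one detection signal, so a site with stronger fingerprinting may still block the request.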
🎯 Customization
All scripts support environment variables:
# Show browser
HEADLESS=false node scripts/playwright-stealth.js <URL>
# Custom wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-stealth.js <URL>
# Save screenshot
SCREENSHOT_PATH=/tmp/page.png node scripts/playwright-stealth.js <URL>
# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js <URL>
# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js <URL>
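For readers wondering how a script might consume these variables, here is a minimal sketch. The helper name `readConfig` is hypothetical, and the fallback values (headless on, a 5000 ms wait, no screenshot, no HTML dump) are assumptions for illustration, not the defaults documented for the actual scripts.

```javascript
// Hypothetical helper; the defaults below are assumptions, not the scripts' documented values.
function readConfig(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    // Anything other than the literal string "false" keeps the browser headless.
    headless: env.HEADLESS !== 'false',
    // Fall back to an assumed 5000 ms when WAIT_TIME is missing or not a valid number.
    waitTime: Number.isFinite(waitTime) && waitTime >= 0 ? waitTime : 5000,
    // null means "do not take a screenshot".
    screenshotPath: env.SCREENSHOT_PATH || null,
    // Only the literal string "true" enables saving HTML.
    saveHtml: env.SAVE_HTML === 'true',
    // null means "use the script's built-in User-Agent".
    userAgent: env.USER_AGENT || null,
  };
}
```

Environment variables always arrive as strings, which is why the sketch compares against `'false'` and `'true'` explicitly and parses `WAIT_TIME` before using it as a number.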
📊 Test Results
| Website | Result | Time |
|---|---|---|
| Discuss.com.hk | ✅ 200 OK | 5-20s |
| Example.com | ✅ 200 OK | 3-5s |
| Cloudflare Protected | ✅ Mostly successful | 10-30s |
📁 File Structure
playwright-scraper-skill/
├── scripts/
│ ├── playwright-simple.js # Simple mode
│ └── playwright-stealth.js # Stealth mode ⭐
├── examples/
│ ├── discuss-hk.sh # Discuss.com.hk example
│ └── README.md # More examples
├── SKILL.md # Full documentation
├── INSTALL.md # Installation guide
├── README.md # This file
├── README_ZH.md # Chinese documentation
├── CONTRIBUTING.md # Contribution guide
├── CHANGELOG.md # Version history
└── package.json # npm config
💡 Best Practices
- Try web_fetch first — OpenClaw's built-in tool is fastest
- Use Simple for dynamic sites — When no anti-bot protection
- Use Stealth for protected sites ⭐ — Main workhorse
- Use specialized skills — For YouTube, Reddit, etc.
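The ordering above amounts to an escalation ladder: start with the cheapest tool and move to a heavier one only when it fails. A minimal sketch of that idea follows. `scrapeWithFallback` and the strategy shape (`{ name, run }`, where `run` resolves to `{ status, html }`) are hypothetical and not part of this skill; how each strategy fetches the page is left to the caller.

```javascript
// Hypothetical helper: tries each strategy in order and returns the first 2xx result.
// Each strategy is { name, run(url) }, where run resolves to { status, html }.
async function scrapeWithFallback(url, strategies) {
  const attempts = [];
  for (const { name, run } of strategies) {
    try {
      const result = await run(url);
      attempts.push({ name, status: result.status });
      if (result.status >= 200 && result.status < 300) {
        return { ...result, via: name, attempts };
      }
    } catch (err) {
      // A thrown error (timeout, crash) also counts as a failed attempt.
      attempts.push({ name, error: err.message });
    }
  }
  throw new Error(`All strategies failed for ${url}: ${JSON.stringify(attempts)}`);
}
```

A caller would pass the strategies cheapest first, for example a plain fetch, then Simple mode, then Stealth mode, so the slower browser-based paths run only when needed.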
🐛 Troubleshooting
Getting 403 blocked?
Use Stealth mode:
node scripts/playwright-stealth.js <URL>
Cloudflare challenge?
Increase the wait time and run in headful (visible browser) mode:
HEADLESS=false WAIT_TIME=30000 node scripts/playwright-stealth.js <URL>
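The same invocation can be made from another Node program. The sketch below assumes it runs from the skill's root directory; `buildStealthRun` and `runStealth` are hypothetical names, and treating the child's exit code as a success signal is an assumption about how the script reports failure.

```javascript
const { spawnSync } = require('node:child_process');

// Builds the pieces for spawnSync; plain data, so it can be inspected before running.
function buildStealthRun(url, { headless = false, waitTime = 30000 } = {}) {
  return {
    command: 'node',
    args: ['scripts/playwright-stealth.js', url],
    // Environment variables must be strings, so convert explicitly.
    env: { ...process.env, HEADLESS: String(headless), WAIT_TIME: String(waitTime) },
  };
}

// Runs the script with inherited stdio and returns its exit code (null if killed by a signal).
function runStealth(url, overrides) {
  const { command, args, env } = buildStealthRun(url, overrides);
  const result = spawnSync(command, args, { env, stdio: 'inherit' });
  return result.status;
}
```

Passing the URL as an array element rather than through a shell string means characters such as `#` or `&` in the URL need no quoting.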
Playwright not found?
Reinstall:
npm install
npx playwright install chromium
More issues? See INSTALL.md
🤝 Contributing
Contributions welcome! See CONTRIBUTING.md
📄 License
MIT License - See LICENSE
🔗 Links
- Playwright Official Docs
- Full Documentation (SKILL.md)
- Installation Guide (INSTALL.md)
- Examples (examples/)