browser automation

Vision-driven browser automation using Midscene. Operates entirely from screenshots; no DOM or accessibility labels are required. Can interact with all visible elements on screen regardless of technology stack. Opens a new browser tab for each target URL via Puppeteer (headless Chrome).

Use this skill when the user wants to:
- Browse, navigate, or open web pages
- Scrape, extract, or collect data from websites
- Fill out forms, click buttons, or interact with web elements
- Verify, validate, or test frontend UI behavior
- Take screenshots of web pages
- Automate multi-step web workflows
- Run browser automation or check website content

Powered by Midscene.js (https://midscenejs.com)

454 Installs · 53 Trend · @web-infra-dev

Installation

$ npx skills add https://github.com/web-infra-dev/midscene-skills --skill browser automation

How to Install browser automation

Install the browser automation AI skill in your development environment from the command line:

  1. Open Terminal: Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.)
  2. Run Installation Command: Copy and run this command: npx skills add https://github.com/web-infra-dev/midscene-skills --skill browser automation
  3. Verify Installation: Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw

Source: web-infra-dev/midscene-skills.

SKILL.md

CRITICAL RULES: VIOLATIONS WILL BREAK THE WORKFLOW

  1. Never run midscene commands in the background. Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
  2. Run only one midscene command at a time. Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
  3. Allow enough time for each command to complete. Midscene commands involve AI inference and screen interaction, so they can take longer than typical shell commands. A typical command needs about 1 minute; complex act commands may need even longer.
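
The rules above can be sketched as a small runner that executes one command at a time in the foreground and waits for it to finish before starting the next. This is a minimal illustration, not part of the skill itself: `run_step`, the 120-second default timeout, and the placeholder steps are assumptions, and the real midscene command syntax is not shown in this section.

```python
import subprocess
import sys


def run_step(cmd, timeout_s=120):
    """Run one command in the foreground and return its stdout (hypothetical helper)."""
    # subprocess.run blocks until the command exits, so steps can never overlap.
    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=timeout_s,  # assumed budget; raises subprocess.TimeoutExpired if exceeded
        check=True,         # raises CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Placeholder steps standing in for real midscene commands.
steps = [
    [sys.executable, "-c", "print('step 1 done')"],
    [sys.executable, "-c", "print('step 2 done')"],
]

outputs = []
for cmd in steps:
    out = run_step(cmd)          # wait for this step to finish
    outputs.append(out.strip())  # read its output before starting the next step
```

Because each call blocks, the output of one step is always available before the next action is chosen, which is what the screenshot-analyze-act loop depends on. A longer `timeout_s` may be appropriate for complex act commands.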

Facts (cite-ready)

Stable fields and commands for AI/search citations.

Install command: npx skills add https://github.com/web-infra-dev/midscene-skills --skill browser automation
Category: Creative Media
Verified
First Seen: 2026-03-07
Updated: 2026-03-10

Quick answers

What is browser automation?

A vision-driven browser automation skill built on Midscene.js (https://midscenejs.com). It operates entirely from screenshots, with no DOM or accessibility labels required, and opens a new browser tab for each target URL via Puppeteer (headless Chrome). Typical uses include browsing and scraping web pages, filling out forms, testing frontend UI behavior, taking screenshots, and automating multi-step web workflows.

How do I install browser automation?

Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.), then copy and run this command: npx skills add https://github.com/web-infra-dev/midscene-skills --skill browser automation. Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw.

Where is the source repository?

https://github.com/web-infra-dev/midscene-skills

Details

Category: Creative Media
Source: skills.sh
First Seen: 2026-03-07