·ai-vision
*

ai-vision

Multimodal UI understanding and single-step planning via OpenAI-compatible Responses APIs. Use when you need AIQuery/AIAssert and plan-next to extract UI element coordinates, validate UI assertions, summarize screenshots, or decide the next UI action from an image. External agents handle execution via adb/hdc and multi-step loops. Defaults to Doubao models but can be pointed at other multimodal providers via base URL, API key, and model name.

13Installs·0Trend·@httprunner

Installation

$npx skills add https://github.com/httprunner/skills --skill ai-vision

How to Install ai-vision

Quickly install ai-vision AI skill to your development environment via command line

  1. Open Terminal: Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.)
  2. Run Installation Command: Copy and run this command: npx skills add https://github.com/httprunner/skills --skill ai-vision
  3. Verify Installation: Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw

Source: httprunner/skills.

SKILL.md

View raw

This skill provides a standalone CLI to call multimodal models for UI querying, assertion, and single-step planning. It does not depend on device type; you supply a screenshot and receive structured output (coordinates, decisions, or next actions). Execution and multi-step loops are handled externally by agents using adb/hdc or other drivers. Prefer storing screenshots in /.eval/screenshots/ and add timestamps to...

Canonical install and execution directory: /.agents/skills/ai-vision/. Run commands from this directory:

Model Configuration Default Doubao configuration via environment variables:

Multimodal UI understanding and single-step planning via OpenAI-compatible Responses APIs. Use when you need AIQuery/AIAssert and plan-next to extract UI element coordinates, validate UI assertions, summarize screenshots, or decide the next UI action from an image. External agents handle execution via adb/hdc and multi-step loops. Defaults to Doubao models but can be pointed at other multimodal providers via base URL, API key, and model name. Source: httprunner/skills.

Facts (cite-ready)

Stable fields and commands for AI/search citations.

Install command
npx skills add https://github.com/httprunner/skills --skill ai-vision
Category
*Creative Media
Verified
First Seen
2026-02-22
Updated
2026-03-10

Browse more skills from httprunner/skills

Quick answers

What is ai-vision?

Multimodal UI understanding and single-step planning via OpenAI-compatible Responses APIs. Use when you need AIQuery/AIAssert and plan-next to extract UI element coordinates, validate UI assertions, summarize screenshots, or decide the next UI action from an image. External agents handle execution via adb/hdc and multi-step loops. Defaults to Doubao models but can be pointed at other multimodal providers via base URL, API key, and model name. Source: httprunner/skills.

How do I install ai-vision?

Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.) Copy and run this command: npx skills add https://github.com/httprunner/skills --skill ai-vision Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw

Where is the source repository?

https://github.com/httprunner/skills