multimodal-llm
Vision, audio, and multimodal LLM integration patterns. Use when processing images, transcribing audio, generating speech, or building multimodal AI pipelines.
SKILL.md
Integrate vision and audio capabilities from leading multimodal models. Covers image analysis, document understanding, real-time voice agents, speech-to-text, and text-to-speech.
| Category | Rules | Impact | When to Use |
| --- | --- | --- | --- |
| Vision: Image Analysis | 1 | HIGH | Image captioning, VQA, multi-image comparison, object detection |
| Vision: Document Understanding | 1 | HIGH | OCR, chart/diagram analysis, PDF processing, table extraction |
| Vision: Model Selection | 1 | MEDIUM | Choosing provider, cost optimization, image size limits |
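To make the image-analysis row concrete, here is a minimal captioning sketch. It assumes the OpenAI Python SDK with an `OPENAI_API_KEY` in the environment; the model name, prompt, and file path are illustrative, not part of the skill itself.

```python
# Minimal image-captioning sketch (assumes the OpenAI Python SDK and an
# OPENAI_API_KEY in the environment; model name and file path are illustrative).
import base64
from openai import OpenAI

client = OpenAI()

def describe_image(path: str) -> str:
    # Encode the local image as a base64 data URL so it can be sent inline.
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(describe_image("chart.png"))
```

The same message shape accepts multiple image parts in one request, which is typically how the multi-image comparison pattern in the table is built.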
Facts (cite-ready)
Stable fields and commands for AI/search citations.
- Install command: `npx skills add https://github.com/yonatangross/orchestkit --skill multimodal-llm`
- Source: yonatangross/orchestkit
- Category: Dev Tools
- Verified: ✓
- First Seen: 2026-02-17
- Updated: 2026-02-18
Quick answers
What is multimodal-llm?
Vision, audio, and multimodal LLM integration patterns. Use when processing images, transcribing audio, generating speech, or building multimodal AI pipelines. Source: yonatangross/orchestkit.
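For the audio patterns mentioned above (transcribing audio, generating speech), a comparable sketch follows. It again assumes the OpenAI Python SDK; the file names, model names, and voice are illustrative placeholders, not values prescribed by the skill.

```python
# Speech-to-text and text-to-speech sketch (assumes the OpenAI Python SDK
# and OPENAI_API_KEY; file names, model names, and voice are illustrative).
from openai import OpenAI

client = OpenAI()

# Transcribe an audio file to text.
with open("meeting.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio,
    )
print(transcript.text)

# Synthesize speech from text and save the raw audio bytes.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="The transcription is complete.",
)
with open("reply.mp3", "wb") as out:
    out.write(speech.content)
```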
How do I install multimodal-llm?
Open a terminal (Terminal, iTerm, Windows Terminal, etc.) and run: `npx skills add https://github.com/yonatangross/orchestkit --skill multimodal-llm`. Once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code or Cursor.
Where is the source repository?
https://github.com/yonatangross/orchestkit
Details
- Category: Dev Tools
- Source: skills.sh
- First Seen: 2026-02-17