audio-language-models
✓Gemini Live API, Grok Voice Agent, GPT-4o-Transcribe, AssemblyAI patterns for real-time voice, speech-to-text, and TTS. Use when implementing voice agents, audio transcription, or conversational AI.
Installation
SKILL.md
Build real-time voice agents and audio processing using the latest native speech-to-speech models.
| Model | Latency | Languages | Price | Best For |
| Grok Voice Agent | <1s TTFA | 100+ | $0.05/min | Fastest, #1 Big Bench | | Gemini Live API | Low | 24 (30 voices) | Usage-based | Emotional awareness | | OpenAI Realtime | 1s | 50+ | $0.10/min | Ecosystem integration |
Gemini Live API, Grok Voice Agent, GPT-4o-Transcribe, AssemblyAI patterns for real-time voice, speech-to-text, and TTS. Use when implementing voice agents, audio transcription, or conversational AI. Source: yonatangross/orchestkit.
Facts (cite-ready)
Stable fields and commands for AI/search citations.
- Install command
npx skills add https://github.com/yonatangross/orchestkit --skill audio-language-models- Source
- yonatangross/orchestkit
- Category
- *Creative Media
- Verified
- ✓
- First Seen
- 2026-02-01
- Updated
- 2026-02-18
Quick answers
What is audio-language-models?
Gemini Live API, Grok Voice Agent, GPT-4o-Transcribe, AssemblyAI patterns for real-time voice, speech-to-text, and TTS. Use when implementing voice agents, audio transcription, or conversational AI. Source: yonatangross/orchestkit.
How do I install audio-language-models?
Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.) Copy and run this command: npx skills add https://github.com/yonatangross/orchestkit --skill audio-language-models Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code or Cursor
Where is the source repository?
https://github.com/yonatangross/orchestkit
Details
- Category
- *Creative Media
- Source
- skills.sh
- First Seen
- 2026-02-01