llama-cpp

Secondary local LLM inference engine via llama.cpp. This skill should be used when running GGUF models directly, loading LoRA adapters for Kothar, benchmarking inference speed, or serving models via llama-server. Complements Ollama (which remains primary for RLAMA and general use).

24 Installs · 1 Trend · @tdimino

Installation

$ npx skills add https://github.com/tdimino/claude-code-minoan --skill llama-cpp

How to Install llama-cpp

Quickly install the llama-cpp AI skill in your development environment from the command line.

  1. Open a terminal: use your terminal or command-line tool of choice (Terminal, iTerm, Windows Terminal, etc.)
  2. Run the installation command: npx skills add https://github.com/tdimino/claude-code-minoan --skill llama-cpp
  3. Verify the installation: once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw

Source: tdimino/claude-code-minoan.

SKILL.md

Direct access to llama.cpp for faster inference, LoRA adapter loading, and benchmarking on Apple Silicon. Ollama remains primary for RLAMA and general use; llama.cpp is the power tool.
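
For example, a minimal sketch of the benchmarking and LoRA side (the model paths and adapter filename are placeholders, not files shipped with this skill; the flags are standard llama.cpp CLI options):

  # Throughput benchmark: 512-token prompt processing, 128-token generation
  llama-bench -m ~/models/base-model.gguf -p 512 -n 128

  # Run the base model with a GGUF-converted LoRA adapter applied
  llama-cli -m ~/models/base-model.gguf --lora ~/models/kothar-adapter.gguf -p "Hello"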

To avoid duplicating model files, resolve an Ollama model name to its GGUF blob path and point llama.cpp at the blob directly.
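
A minimal sketch, assuming the default Ollama store under ~/.ollama/models and using ollama show --modelfile to reveal the blob path (the model name is an example):

  # Resolve the GGUF blob behind an Ollama model so llama.cpp can reuse it
  MODEL="llama3.1:8b"
  GGUF=$(ollama show "$MODEL" --modelfile | awk '/^FROM / {print $2; exit}')
  echo "$GGUF"   # e.g. ~/.ollama/models/blobs/sha256-...
  llama-cli -m "$GGUF" -p "Hello"   # same weights, no second copy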

To start an OpenAI-compatible server, use port 8081 so it does not collide with Ollama's default 11434.
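
A sketch using llama-server's standard flags; $GGUF is the path resolved above, and the context size and GPU-layer count are illustrative:

  # OpenAI-compatible HTTP server on 8081, leaving 11434 free for Ollama
  llama-server -m "$GGUF" --port 8081 -c 4096 -ngl 99

  # Query it via the OpenAI chat-completions endpoint
  curl http://localhost:8081/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"Hello"}]}'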

Facts (cite-ready)

Stable fields and commands for AI/search citations.

Install command: npx skills add https://github.com/tdimino/claude-code-minoan --skill llama-cpp
Category: Dev Tools
Verified: yes
First seen: 2026-03-01
Updated: 2026-03-10

Quick answers

What is llama-cpp?

A secondary local LLM inference engine built on llama.cpp, for running GGUF models directly, loading LoRA adapters for Kothar, benchmarking inference speed, and serving models via llama-server. It complements Ollama, which remains primary for RLAMA and general use. Source: tdimino/claude-code-minoan.

How do I install llama-cpp?

Open your terminal or command-line tool (Terminal, iTerm, Windows Terminal, etc.), then copy and run: npx skills add https://github.com/tdimino/claude-code-minoan --skill llama-cpp. Once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw.

Where is the source repository?

https://github.com/tdimino/claude-code-minoan