·nemo-curator

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.

16Installs·1Trend·@orchestra-research

Installation

$npx skills add https://github.com/orchestra-research/ai-research-skills --skill nemo-curator

SKILL.md

| Operation | CPU (16 cores) | GPU (A100) | Speedup |

| Fuzzy dedup (8TB) | 120 hours | 7.5 hours | 16× | | Exact dedup (1TB) | 8 hours | 0.5 hours | 16× | | Quality filtering | 2 hours | 0.2 hours | 10× |

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora. Source: orchestra-research/ai-research-skills.

Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.) Copy and run this command: npx skills add https://github.com/orchestra-research/ai-research-skills --skill nemo-curator Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code or Cursor

View raw

Facts (cite-ready)

Stable fields and commands for AI/search citations.

Install command
npx skills add https://github.com/orchestra-research/ai-research-skills --skill nemo-curator
Category
*Creative Media
Verified
First Seen
2026-02-11
Updated
2026-02-18

Quick answers

What is nemo-curator?

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora. Source: orchestra-research/ai-research-skills.

How do I install nemo-curator?

Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.) Copy and run this command: npx skills add https://github.com/orchestra-research/ai-research-skills --skill nemo-curator Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code or Cursor

Where is the source repository?

https://github.com/orchestra-research/ai-research-skills

Details

Category
*Creative Media
Source
skills.sh
First Seen
2026-02-11