pdf-text-extractor
✓Download PDFs (when available) and extract plain text to support full-text evidence, writing `papers/fulltext_index.jsonl` and `papers/fulltext/*.txt`. **Trigger**: PDF download, fulltext, extract text, papers/pdfs, 全文抽取, 下载PDF. **Use when**: `queries.md` 设置 `evidence_mode: fulltext`(或你明确需要全文证据)并希望为 paper notes/claims 提供更强 evidence。 **Skip if**: `evidence_mode: abstract`(默认);或你不希望进行下载/抽取(成本/权限/时间)。 **Network**: fulltext 下载通常需要网络(除非你手工提供 PDF 缓存在 `papers/pdfs/`)。 **Guardrail**: 缓存下载到 `papers/pdfs/`;默认不覆盖已有抽取文本(除非显式要求重抽)。
Installation
SKILL.md
Optionally collect full-text snippets to deepen evidence beyond abstracts.
This skill is intentionally conservative: in many survey runs, abstract/snippet mode is enough and avoids heavy downloads.
When you cannot/should not download PDFs (restricted network, rate limits, no permission), provide PDFs manually and run in “local PDFs only” mode.
Download PDFs (when available) and extract plain text to support full-text evidence, writing `papers/fulltext_index.jsonl` and `papers/fulltext/*.txt`. **Trigger**: PDF download, fulltext, extract text, papers/pdfs, 全文抽取, 下载PDF. **Use when**: `queries.md` 设置 `evidence_mode: fulltext`(或你明确需要全文证据)并希望为 paper notes/claims 提供更强 evidence。 **Skip if**: `evidence_mode: abstract`(默认);或你不希望进行下载/抽取(成本/权限/时间)。 **Network**: fulltext 下载通常需要网络(除非你手工提供 PDF 缓存在 `papers/pdfs/`)。 **Guardrail**: 缓存下载到 `papers/pdfs/`;默认不覆盖已有抽取文本(除非显式要求重抽)。 Source: willoscar/research-units-pipeline-skills.
Facts (cite-ready)
Stable fields and commands for AI/search citations.
- Install command
npx skills add https://github.com/willoscar/research-units-pipeline-skills --skill pdf-text-extractor- Category
- #Documents
- Verified
- ✓
- First Seen
- 2026-02-01
- Updated
- 2026-02-18
Quick answers
What is pdf-text-extractor?
Download PDFs (when available) and extract plain text to support full-text evidence, writing `papers/fulltext_index.jsonl` and `papers/fulltext/*.txt`. **Trigger**: PDF download, fulltext, extract text, papers/pdfs, 全文抽取, 下载PDF. **Use when**: `queries.md` 设置 `evidence_mode: fulltext`(或你明确需要全文证据)并希望为 paper notes/claims 提供更强 evidence。 **Skip if**: `evidence_mode: abstract`(默认);或你不希望进行下载/抽取(成本/权限/时间)。 **Network**: fulltext 下载通常需要网络(除非你手工提供 PDF 缓存在 `papers/pdfs/`)。 **Guardrail**: 缓存下载到 `papers/pdfs/`;默认不覆盖已有抽取文本(除非显式要求重抽)。 Source: willoscar/research-units-pipeline-skills.
How do I install pdf-text-extractor?
Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.) Copy and run this command: npx skills add https://github.com/willoscar/research-units-pipeline-skills --skill pdf-text-extractor Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code or Cursor
Where is the source repository?
https://github.com/willoscar/research-units-pipeline-skills
Details
- Category
- #Documents
- Source
- skills.sh
- First Seen
- 2026-02-01