pdf-reader
// Extract text, search inside PDFs, and produce summaries.
updated:March 4, 2026
SKILL.md Frontmatter
name: pdf-reader
description: Extract text, search inside PDFs, and produce summaries.
homepage: https://pymupdf.readthedocs.io
PDF Reader Skill
The pdf-reader skill extracts text and retrieves metadata from PDF files using PyMuPDF (imported as pymupdf; fitz is the library's legacy import name).
Tool API
The skill provides two commands:
extract
Extracts plain text from the specified PDF file.
- Parameters:
  - file_path (string, required): Path to the PDF file to extract text from.
  - --max_pages (integer, optional): Maximum number of pages to extract.
Usage:
python3 skills/pdf-reader/reader.py extract /path/to/document.pdf
python3 skills/pdf-reader/reader.py extract /path/to/document.pdf --max_pages 5
Output: Plain text content from the PDF.
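The extract command's page limit can be illustrated with a minimal sketch. This is not the code in reader.py; the function names pages_to_read and extract_text are hypothetical, and the sketch assumes a PyMuPDF release that supports `import pymupdf` (older releases expose the same API as `fitz`).

```python
try:
    import pymupdf  # assumption: PyMuPDF >= 1.24.3; older releases use `import fitz`
except ImportError:  # keeps the pure helper below usable without PyMuPDF
    pymupdf = None


def pages_to_read(page_count, max_pages=None):
    """Return how many pages to extract, honoring an optional cap (hypothetical helper)."""
    if max_pages is None:
        return page_count
    return max(0, min(page_count, max_pages))


def extract_text(file_path, max_pages=None):
    """Concatenate plain text from the first pages of a PDF (hypothetical helper)."""
    if pymupdf is None:
        raise RuntimeError("PyMuPDF is not installed")
    with pymupdf.open(file_path) as doc:
        limit = pages_to_read(doc.page_count, max_pages)
        # get_text() with no arguments returns the page's plain text
        return "\n".join(doc[i].get_text() for i in range(limit))
```

Capping the page count before iterating means pages beyond the limit are never loaded, which is one plausible reading of how max_pages keeps large PDFs cheap to process.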
metadata
Retrieve metadata about the document.
- Parameters:
  - file_path (string, required): Path to the PDF file.
Usage:
python3 skills/pdf-reader/reader.py metadata /path/to/document.pdf
Output: JSON object with PDF metadata including:
- title: Document title
- author: Document author
- subject: Document subject
- creator: Application that created the PDF
- producer: PDF producer
- creationDate: Creation date
- modDate: Modification date
- format: PDF format version
- encryption: Encryption info (if any)
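As a rough illustration of the command layout described above, the two commands could be wired up with argparse subcommands. This is an assumed structure, not the actual contents of reader.py; build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser with the two documented subcommands (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="reader.py")
    sub = parser.add_subparsers(dest="command", required=True)

    extract = sub.add_parser("extract", help="Extract plain text from a PDF")
    extract.add_argument("file_path", help="Path to the PDF file")
    extract.add_argument("--max_pages", type=int, default=None,
                         help="Maximum number of pages to extract")

    metadata = sub.add_parser("metadata", help="Print PDF metadata as JSON")
    metadata.add_argument("file_path", help="Path to the PDF file")
    return parser


args = build_parser().parse_args(["extract", "report.pdf", "--max_pages", "3"])
print(args.command, args.file_path, args.max_pages)  # extract report.pdf 3
```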
Implementation Notes
- Uses PyMuPDF (imported as pymupdf) for fast, reliable PDF processing
- Supports encrypted PDFs (will return error if password required)
- Handles large PDFs efficiently with max_pages option
- Returns structured JSON for metadata command
Example
# Extract text from first 3 pages
python3 skills/pdf-reader/reader.py extract report.pdf --max_pages 3
# Get document metadata
python3 skills/pdf-reader/reader.py metadata report.pdf
# Output:
# {
# "title": "Annual Report 2024",
# "author": "John Doe",
# "creationDate": "D:20240115120000",
# ...
# }
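The creationDate value in the output above uses the PDF date string form (D:YYYYMMDDHHmmSS, optionally followed by a timezone offset). A small sketch of converting it to a Python datetime follows; parse_pdf_date is a hypothetical helper that is not part of the skill, and it deliberately ignores any timezone suffix.

```python
from datetime import datetime


def parse_pdf_date(value):
    """Parse the leading 'D:YYYYMMDDHHmmSS' part of a PDF date string.

    Hypothetical helper: any trailing timezone offset is ignored, and
    strings shorter than 14 digits are not handled.
    """
    digits = value[2:] if value.startswith("D:") else value
    return datetime.strptime(digits[:14], "%Y%m%d%H%M%S")


print(parse_pdf_date("D:20240115120000").isoformat())  # 2024-01-15T12:00:00
```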
Error Handling
- Returns error message if file not found or not a valid PDF
- Returns error if PDF is encrypted and requires password
- Gracefully handles corrupted or malformed PDFs
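One way the error cases above might be handled is sketched below. It is an assumption about the approach, not the skill's actual code: read_metadata and its error strings are hypothetical, and the broad except clause stands in for whatever specific exceptions reader.py catches.

```python
import os

try:
    import pymupdf  # assumption: PyMuPDF >= 1.24.3; older releases use `import fitz`
except ImportError:
    pymupdf = None


def read_metadata(file_path):
    """Return (metadata, error); exactly one of the two is None (hypothetical helper)."""
    if not os.path.isfile(file_path):
        return None, f"File not found: {file_path}"
    if pymupdf is None:
        return None, "PyMuPDF is not installed"
    try:
        with pymupdf.open(file_path) as doc:
            if doc.needs_pass:  # true while an encrypted PDF awaits a password
                return None, "PDF is encrypted and requires a password"
            return dict(doc.metadata or {}), None
    except Exception as exc:  # invalid, corrupted, or malformed PDFs
        return None, f"Could not read PDF: {exc}"
```

Returning an error string instead of raising keeps the command's output predictable for a caller that only reads stdout.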