building-github-index

Updated: March 4, 2026

SKILL.md Frontmatter

name: building-github-index

description: Generate progressive disclosure indexes for GitHub repositories to use as Claude project knowledge. Use when setting up projects referencing external documentation, creating searchable indexes of technical blogs or knowledge bases, combining multiple repos into one index, or when user mentions "index", "github repo", "project knowledge", or "documentation reference".

Building GitHub Index

Create markdown indexes of GitHub repositories optimized for Claude project knowledge. Each index pairs file paths, which are retrievable through the GitHub API, with semantic descriptions that make it easy to match a question to the right file.

Quick Start

# Documentation repos (markdown/notebooks)
python scripts/github_index.py owner/repo -o index.md

# Code repos (extract symbols via tree-sitter)
python scripts/github_index.py owner/repo --code-symbols -o index.md

# Multiple repos combined
python scripts/github_index.py owner/repo1 owner/repo2 -o combined.md

Script Options

| Flag | Description |
|------|-------------|
| `-o, --output` | Output file (default: `github_index.md`) |
| `--token` | GitHub PAT; also reads `GITHUB_TOKEN` env |
| `--include-patterns` | Only index matching globs: `"docs/" "src/"` |
| `--exclude-patterns` | Skip matching globs: `"test/**"` |
| `--max-files` | Cap files per repo (default: 200) |
| `--skip-fetch` | Tree only, no content fetch (fast, filename-only descriptions) |
| `--code-symbols` | Include code files, extract function/class names via tree-sitter |
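The include, exclude, and cap flags can be pictured as a filter over the repository's file paths. The sketch below is an assumption about their semantics, not the script's actual code: `select_paths` is a hypothetical helper, and it relies on `fnmatch`, where `*` also matches `/`, which may differ from how `github_index.py` interprets globs.

```python
from fnmatch import fnmatch


def select_paths(paths, include=None, exclude=None, max_files=200):
    """Hypothetical filter mirroring --include-patterns, --exclude-patterns, --max-files."""
    selected = []
    for path in paths:
        if len(selected) >= max_files:
            break  # per-repo cap reached
        # When include globs are given, keep only paths matching at least one.
        if include and not any(fnmatch(path, pat) for pat in include):
            continue
        # Drop any path matching an exclude glob.
        if exclude and any(fnmatch(path, pat) for pat in exclude):
            continue
        selected.append(path)
    return selected


paths = ["docs/intro.md", "docs/api/auth.md", "test/unit/test_auth.py", "README.md"]
print(select_paths(paths, include=["docs/*"], exclude=["test/**"]))
```

Applying the cap after filtering means excluded files never count toward the limit, which seems the more useful reading of "cap files per repo".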

Description Extraction Priority

1. YAML frontmatter - `title:` and `description:` fields
2. Markdown headings - First h1/h2 as title, subsequent as topics
3. Notebook cells - First markdown cell heading
4. Code symbols - Public function/class names (with `--code-symbols`)
5. Path-derived - Convert filename to words (fallback)
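The priority order above amounts to a fallback chain: try the richest source first and fall through until something yields text. A minimal sketch follows, assuming a hypothetical `describe` function and covering only the frontmatter, heading, and path steps (notebooks and code symbols are left out). The real script's parsing may differ, for example by using a proper YAML parser.

```python
import re


def describe(path, text=""):
    """Hypothetical fallback chain: frontmatter, then first heading, then path words."""
    # 1. Frontmatter: key/value lines between leading '---' fences.
    frontmatter = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if frontmatter:
        fields = {}
        for line in frontmatter.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        found = fields.get("description") or fields.get("title")
        if found:
            return found
    # 2. First h1 or h2 heading in the markdown body.
    heading = re.search(r"^#{1,2}\s+(.+)$", text, re.MULTILINE)
    if heading:
        return heading.group(1).strip()
    # 3. Fallback: strip directory and extension, turn separators into spaces.
    stem = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return re.sub(r"[_\-]+", " ", stem).strip()


print(describe("docs/rate_limit-notes.md"))
```

With `--skip-fetch`, only the last step can run, which is why that mode produces filename-only descriptions.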

When Descriptions Fail

Some repos consist largely of stub files (links to external docs, empty READMEs), which leave the script nothing to extract. In these cases, manual curation is recommended. Generate the tree output and write the descriptions from domain knowledge:

# Get tree structure only (fast)
python scripts/github_index.py owner/repo --skip-fetch -o skeleton.md
# Then manually enhance descriptions based on domain knowledge

For code-heavy repos with embedded apps:

Directory names encode purpose: acc_wav_gen → "ACC waveform generation"
Peripheral acronyms map to functions: AFEC=ADC, MCAN=CAN, TWIHS=I2C
Operation modes: blocking, interrupt, dma, polled
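Part of this curation can be mechanized with a small lookup table. The sketch below is illustrative only: `describe_dir` is a hypothetical helper, and the acronym and mode tables contain just the entries listed above. It cannot expand free-form abbreviations such as `wav` or `gen`, which is where domain knowledge still has to come in.

```python
# Acronym pairs and modes copied from the list above; extend them for your own SDK.
PERIPHERALS = {"afec": "ADC", "mcan": "CAN", "twihs": "I2C"}
MODES = {"blocking", "interrupt", "dma", "polled"}


def describe_dir(name):
    """Hypothetical helper: expand known tokens in an underscore-separated directory name."""
    words = []
    for token in name.lower().split("_"):
        if token in PERIPHERALS:
            # Show the generic function first, keep the vendor acronym for searchability.
            words.append(f"{PERIPHERALS[token]} ({token.upper()})")
        elif token in MODES:
            words.append(f"{token} mode")
        else:
            words.append(token)  # unknown token: leave for manual curation
    return " ".join(words)


print(describe_dir("afec_dma_sample"))
```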

Output Format

# {Repo} - Content Index

**Repository:** {url}
**Branch:** `{branch}`

## Retrieval Method
{API curl commands}

---

## {Category}

| Description | Path |
|-------------|------|
| {What this covers} | `{path/file.md}` |

The description column comes first because it is what relevance matching runs against; the path follows as the retrieval key.
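To make the layout concrete, here is a rough sketch of a renderer for the template above. `render_index` and its argument shapes are assumptions for illustration, and the retrieval-method section is omitted for brevity.

```python
def render_index(repo, url, branch, categories):
    """Hypothetical renderer; categories maps a category name to [(description, path), ...]."""
    lines = [
        f"# {repo} - Content Index",
        "",
        f"**Repository:** {url}",
        f"**Branch:** `{branch}`",
        "",
    ]
    for category, entries in categories.items():
        lines += [f"## {category}", "", "| Description | Path |", "|-------------|------|"]
        for description, path in entries:
            # Escape pipes so a description cannot split the table row.
            safe = description.replace("|", "\\|")
            lines.append(f"| {safe} | `{path}` |")
        lines.append("")
    return "\n".join(lines)


print(render_index(
    "demo", "https://github.com/owner/repo", "main",
    {"Guides": [("Token authentication basics", "docs/auth.md")]},
))
```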

API Access

Enumerate files:

curl -sL "https://api.github.com/repos/OWNER/REPO/git/trees/BRANCH?recursive=1"

Fetch content:

curl -s "https://api.github.com/repos/OWNER/REPO/contents/PATH?ref=BRANCH" \
  -H "Accept: application/vnd.github+json" | \
  python3 -c "import sys,json,base64; print(base64.b64decode(json.load(sys.stdin)['content']).decode())"
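The same two calls can be made from Python with only the standard library. This is a sketch rather than the skill's own code: the function names are hypothetical, and the Bearer-token `Authorization` header is optional.

```python
import base64
import json
import urllib.parse
import urllib.request

API = "https://api.github.com"


def tree_url(owner, repo, branch):
    """URL that enumerates every file in the branch."""
    return f"{API}/repos/{owner}/{repo}/git/trees/{branch}?recursive=1"


def contents_url(owner, repo, path, branch):
    """URL that returns one file's metadata plus base64-encoded content."""
    return f"{API}/repos/{owner}/{repo}/contents/{urllib.parse.quote(path)}?ref={branch}"


def blob_paths(tree_response):
    """Keep file entries only; directories appear in the tree with type 'tree'."""
    return [entry["path"] for entry in tree_response["tree"] if entry["type"] == "blob"]


def decode_content(contents_response):
    """Decode the base64 'content' field of a contents response."""
    return base64.b64decode(contents_response["content"]).decode()


def fetch_json(url, token=None):
    """GET a GitHub API URL and parse the JSON body (requires network access)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical flow would be `blob_paths(fetch_json(tree_url(...)))` to list files, then `decode_content(fetch_json(contents_url(...)))` for each path of interest. Unauthenticated requests are subject to lower rate limits, so passing a token is advisable for larger repos.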

Network

Add these hosts to the network allowlist: api.github.com, raw.githubusercontent.com

Related Skills

accessing-github-repos - Private repos, PAT setup, tarball download
mapping-codebases - Detailed code structure (methods, imports, line numbers)

Condensed Format (pk_index.py)

For token-constrained project knowledge, use the condensed script:

python scripts/pk_index.py owner/repo -o repo_pk.md

Produces output roughly 80% smaller than the full index:

Single line per file: path — description
Symbols only (no signatures)
15 files max per category
No retrieval instructions section
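The condensed rules above can be sketched in a few lines. This is an illustration under assumptions, not the contents of `pk_index.py`: `condensed_lines` is a hypothetical function, and the `## category` heading line is a guess at how categories are delimited.

```python
def condensed_lines(entries, max_per_category=15):
    """Hypothetical condensed renderer; entries maps category -> [(path, description), ...]."""
    lines = []
    for category, files in entries.items():
        lines.append(f"## {category}")  # assumed category delimiter
        # Cap each category, then emit one 'path — description' line per file.
        for path, description in files[:max_per_category]:
            lines.append(f"{path} — {description}")
    return lines


files = [(f"docs/page{i}.md", f"Topic {i}") for i in range(20)]
for line in condensed_lines({"Docs": files})[:3]:
    print(line)
```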

Ideal when adding multiple repo indexes to project knowledge.