
rag-chunking-strategy

// Document chunking with multiple strategies including semantic, recursive, and fixed-size chunking

Updated: March 4, 2026
SKILL.md Frontmatter
name: rag-chunking-strategy
description: Document chunking with multiple strategies including semantic, recursive, and fixed-size chunking
allowed-tools: Read,Write,Edit,Bash,Glob,Grep

RAG Chunking Strategy Skill

Capabilities

  • Implement multiple document chunking strategies
  • Configure semantic chunking based on content boundaries
  • Set up recursive character text splitting
  • Design fixed-size chunking with overlap
  • Implement document-aware chunking (markdown, code, etc.)
  • Optimize chunk sizes for retrieval quality

Target Processes

  • rag-pipeline-implementation
  • chunking-strategy-design

Implementation Details

Chunking Strategies

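  1. RecursiveCharacterTextSplitter: Hierarchical splitting that tries coarse separators (paragraphs, lines) before finer ones (words, characters)
  2. SemanticChunker: Embedding-based semantic boundaries (this class ships in langchain-experimental, not langchain-text-splitters)
  3. TokenTextSplitter: Token-aware splitting, with sizes measured in tokens rather than characters
  4. MarkdownHeaderTextSplitter: Structure-aware markdown splitting on header levels
  5. CodeSplitter: Language-aware code chunking (CodeSplitter is a LlamaIndex class; the LangChain counterpart is RecursiveCharacterTextSplitter.from_language)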
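
To make the idea of hierarchical splitting concrete, here is a minimal standard-library sketch of the recursive approach. It is not the RecursiveCharacterTextSplitter implementation: the function name `recursive_split` and its parameters are hypothetical, overlap is omitted, and separators are dropped at chunk boundaries for simplicity.

```python
from typing import List


def recursive_split(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Split text into pieces of at most chunk_size characters.

    Hypothetical helper for illustration only. Coarse separators are tried
    first; a finer separator is used only for parts that are still too long.
    Separators must be non-empty strings.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small parts into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # This part alone is too large: recurse with the finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


text = "aaaa bbbb\n\ncccc dddd eeee"
print(recursive_split(text, chunk_size=10, separators=["\n\n", " "]))
# ['aaaa bbbb', 'cccc dddd', 'eeee']
```

The first paragraph fits in one chunk and is kept whole; only the second, longer paragraph is split further on spaces, which is the behavior the separator hierarchy is meant to produce.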

Configuration Options

  • Chunk size (characters or tokens)
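  • Chunk overlap (often chosen as a percentage of chunk size; LangChain splitters take it as an absolute count in the same unit as chunk size)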
  • Separator hierarchy
  • Embedding model for semantic chunking
  • Document type detection
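
As a rough illustration of how chunk size and overlap interact, the sketch below implements fixed-size character chunking in plain Python. The function name `fixed_size_chunks` and the `overlap_pct` parameter are assumptions made for this example, not part of any library; the percentage is converted to an absolute character count before slicing.

```python
from typing import List


def fixed_size_chunks(text: str, chunk_size: int, overlap_pct: float) -> List[str]:
    """Cut text into windows of chunk_size characters that overlap by overlap_pct.

    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_pct < 1:
        raise ValueError("overlap_pct must be in [0, 1)")

    overlap = int(chunk_size * overlap_pct)  # percentage -> absolute characters
    step = chunk_size - overlap              # always >= 1 because overlap_pct < 1

    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, overlap_pct=0.25))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, so a 25% overlap on a size of 4 corresponds to an absolute overlap of 1.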

Best Practices

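  • Keep chunk size within the embedding model's maximum input length, which is measured in tokens
  • Use enough overlap to preserve context across chunk boundaries without duplicating most of each chunk
  • Test retrieval quality with different strategies on representative queries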
  • Consider document structure in strategy selection
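
One simple way to compare strategies is to measure how often the expected passage survives intact in the top-ranked chunks. The sketch below is a toy harness under stated assumptions: `keyword_score` stands in for real embedding similarity, and `hit_rate`, the sample document, and the test cases are all hypothetical.

```python
from typing import Callable, List, Tuple

Chunker = Callable[[str], List[str]]


def keyword_score(query: str, chunk: str) -> int:
    """Toy relevance score: number of distinct query words present in the chunk."""
    chunk_words = set(chunk.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in chunk_words)


def hit_rate(chunker: Chunker, document: str,
             cases: List[Tuple[str, str]], k: int = 1) -> float:
    """Fraction of (query, expected_text) cases whose expected text appears
    in one of the k best-scoring chunks."""
    chunks = chunker(document)
    hits = 0
    for query, expected in cases:
        ranked = sorted(chunks, key=lambda c: keyword_score(query, c), reverse=True)
        if any(expected in chunk for chunk in ranked[:k]):
            hits += 1
    return hits / len(cases)


document = "Cats sleep most of the day.\n\nDogs need a daily walk."
cases = [
    ("how long do cats sleep", "sleep most of the day"),
    ("what do dogs need", "daily walk"),
]


def by_paragraph(doc: str) -> List[str]:
    return doc.split("\n\n")


def tiny_windows(doc: str) -> List[str]:
    # Deliberately too small: 12-character windows cut the answers apart.
    return [doc[i:i + 12] for i in range(0, len(doc), 12)]


print(hit_rate(by_paragraph, document, cases))  # 1.0
print(hit_rate(tiny_windows, document, cases))
```

In a real pipeline the scoring function would be replaced by the actual retriever, and the cases would come from representative user queries; the structure of the comparison stays the same.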

Dependencies

  • langchain-text-splitters
  • sentence-transformers (embedding models for semantic chunking; SemanticChunker itself ships in langchain-experimental)