---
name: rlm
description: Recursive Language Model pattern for processing large inputs using a Sonnet orchestrator and parallel Haiku subagents. Decomposes complex tasks into chunks, processes them in parallel via sub-agents, then synthesizes results. Use when: analyze large codebase, recursive analysis, deep analysis, process
---
# Recursive Language Model (RLM)
Process large inputs that exceed single-pass capacity by decomposing tasks into manageable chunks, spawning parallel Haiku sub-agents to process each chunk, and synthesizing results into a unified output.
The RLM pattern implements a two-level supervisor-worker hierarchy: a Sonnet supervisor handles strategic decomposition and final synthesis, while Haiku workers handle focused chunk processing in parallel. This achieves cost reduction (Haiku processes 90%+ of tokens) and speed improvement (parallel execution) while maintaining quality through Sonnet-level synthesis.
## Prerequisites
- Task tool access: Required for spawning Haiku sub-agents
- Scratchpad directory: Use the session scratchpad for intermediate results
- Model selection: Sonnet for supervisor (decomposition + synthesis), Haiku for workers (chunk processing)
## Workflow Overview

```
Input -> [Step 1: Assess] -> [Step 2: Decompose] -> [Step 3: Spawn Workers]
                                                               |
                                                       [Parallel Haiku]
                                                               |
                                                      [Step 4: Evaluate]
                                                          /         \
                                                 Gaps found?       Complete?
                                                      |                 |
                                               Re-decompose    [Step 5: Synthesize]
                                              (new iteration)           |
                                                                     Output
```
## Instructions

### Step 1: Assess Task Complexity
Determine whether RLM processing is needed based on input size:
| Metric | Use RLM when | Process directly when |
|---|---|---|
| File count | > 50 files | <= 50 files |
| Total lines | > 10,000 lines | <= 10,000 lines |
| Estimated tokens | > 100,000 tokens | <= 100,000 tokens |
- Measure input size: Use Glob to count files, estimate line counts and token volume.
- Compare against thresholds: If ANY threshold is exceeded, use RLM.
- Announce decision: Tell the user: "This task exceeds single-pass capacity. Using RLM pattern to decompose, process in parallel, and synthesize."
- If below thresholds: Process the input directly with standard tools. Do NOT use RLM for small inputs.
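As a sketch, the Step 1 decision reduces to a small predicate. The function names and the chars-per-token heuristic below are illustrative, not part of the pattern:

```python
def estimate_tokens(total_chars: int) -> int:
    """Rough token estimate using the common ~4 characters/token heuristic."""
    return total_chars // 4

def should_use_rlm(file_count: int, total_lines: int, est_tokens: int) -> bool:
    """Use RLM if ANY threshold from the Step 1 table is exceeded."""
    return file_count > 50 or total_lines > 10_000 or est_tokens > 100_000
```

For instance, the 150-file codebase in Example 1 triggers RLM on the file-count threshold alone.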
### Step 2: Decompose the Problem
Analyze the input and create an explicit decomposition plan using one of three strategies.
#### Strategy 1: Uniform Chunking
Split input into equally-sized chunks by line count.
When to use: No obvious structure, general queries, homogeneous content (logs, transcripts, flat text).
Procedure:
- Calculate total line count.
- Split into chunks of ~200 lines each.
- Add 5-10 lines of overlap between adjacent chunks to avoid boundary artifacts.
- Assign the same query to each chunk.
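The procedure above can be sketched as a chunker with overlap (a minimal illustration; the function name and loop shape are assumptions):

```python
def uniform_chunks(lines, size=200, overlap=8):
    """Split `lines` into ~size-line chunks; adjacent chunks share `overlap` lines."""
    assert 0 <= overlap < size
    chunks, start = [], 0
    while start < len(lines):
        end = min(start + size, len(lines))
        chunks.append(lines[start:end])
        if end == len(lines):
            break
        start = end - overlap  # step back so chunk boundaries overlap
    return chunks
```

The overlap means a finding that straddles a boundary appears in both chunks; Step 5's deduplication removes the duplicate.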
#### Strategy 2: Keyword Filtering
Use Grep to narrow the input to relevant sections before chunking.
When to use: Targeted queries where only a fraction of the input is relevant (e.g., "find all authentication mentions in this codebase").
Procedure:
- Extract keywords from the user query.
- Use Grep to find all matching sections with surrounding context.
- If filtered result is small enough (< 10,000 lines), process as a single chunk.
- If filtered result is still large, apply uniform chunking to the filtered content.
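For intuition, keyword filtering with surrounding context looks like the sketch below. In a real run the Grep tool supplies matches and context; this helper (name and signature are illustrative) just shows the keep-matches-plus-neighbors logic:

```python
def keyword_filter(lines, keywords, context=3):
    """Keep lines matching any keyword, plus `context` lines around each match."""
    keep = set()
    for i, line in enumerate(lines):
        if any(k.lower() in line.lower() for k in keywords):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [lines[i] for i in sorted(keep)]
```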
#### Strategy 3: Structural Decomposition
Parse the input by its natural structure (sections, functions, modules, headings).
When to use: Structured content like markdown documents (split by headings), codebases (split by module/directory), or multi-file projects (split by functional area).
Procedure:
- Identify structural boundaries: markdown headings, directory boundaries, class/function definitions.
- Group related structural units into chunks (e.g., 5-10 related files per chunk).
- Label each chunk with its structural context (section title, module name).
- Tailor the worker query to each chunk's context.
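A sketch of structural grouping for a codebase, bucketing files by top-level directory and splitting oversized groups (the helper name and the directory heuristic are illustrative; real decomposition should follow the input's actual module boundaries):

```python
from collections import defaultdict

def group_by_module(paths, max_files=10):
    """Group file paths by top-level directory; split groups larger than max_files."""
    groups = defaultdict(list)
    for p in paths:
        groups[p.split("/")[0]].append(p)
    chunks = []
    for module, files in sorted(groups.items()):
        for i in range(0, len(files), max_files):
            chunks.append((module, files[i:i + max_files]))
    return chunks
```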
#### Choosing a Strategy
| Input Type | Query Type | Recommended Strategy |
|---|---|---|
| Flat text (logs, transcripts) | General summary | Uniform Chunking |
| Any content type | Targeted search for specific topic | Keyword Filtering |
| Markdown documents | Section-by-section analysis | Structural Decomposition |
| Codebase (multi-file) | Module-level review | Structural Decomposition |
| Codebase (multi-file) | Find specific pattern | Keyword Filtering |
| Large document | Comprehensive review | Structural Decomposition |
#### Chunk Size Guidance
Target ~200 lines or ~8,000 tokens per chunk with 5-10 lines overlap between adjacent chunks. Adjust based on content density:
- Dense code: smaller chunks (~150 lines)
- Prose text: larger chunks (~300 lines)
- Mixed content: default (~200 lines)
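The density guidance above is a simple lookup (a trivial sketch; the key names are illustrative):

```python
def chunk_size_for(density):
    """Default chunk size in lines, per the density guidance above."""
    return {"code": 150, "prose": 300, "mixed": 200}.get(density, 200)
```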
#### Create and Save Work Plan

Write the decomposition plan to the scratchpad and save it to `scratchpad/rlm-work-plan.md`:

```
# RLM Work Plan
Query: [user's original question]
Strategy: [uniform | keyword | structural]
Total chunks: [N]

## Chunks
- Chunk 1: [description] - Focus: [specific aspect]
- Chunk 2: [description] - Focus: [specific aspect]
...
```
### Step 3: Spawn Worker Sub-Agents
Execute parallel chunk processing using the Task tool with Haiku model.
- Build worker prompts: Each worker receives:
  - The chunk content (or file paths to read)
  - The specific question to answer for this chunk
  - The output format requirements
  - A constraint to focus ONLY on the assigned chunk
- Invoke the Task tool for each chunk:

  ```
  Task(
    subagent_type: "general-purpose",
    model: "haiku",
    description: "Process chunk N of M: [brief description]",
    prompt: "You are processing chunk {N} of {M} in a larger analysis.
    ORIGINAL QUERY: {user_query}
    YOUR CHUNK:
    {chunk_content}
    TASK: {specific_task_for_this_chunk}
    OUTPUT FORMAT:
    - Provide findings as a structured list
    - Include confidence score (0-1) for each finding
    - If no relevant findings, state: No relevant findings in this chunk.
    CONSTRAINT: Focus ONLY on this chunk. Do NOT reference external content."
  )
  ```
- Spawn all workers in a single message if chunks are independent (the default). Use sequential spawning only if later chunks depend on earlier results.
- Collect all results: Wait for all Task outputs to return.
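Building each worker prompt is mechanical; this sketch mirrors the template shown above (the function and template names are illustrative, and in the skill itself the filled prompt is passed to the Task tool rather than returned):

```python
WORKER_PROMPT = """You are processing chunk {n} of {m} in a larger analysis.
ORIGINAL QUERY: {query}
YOUR CHUNK:
{chunk}
TASK: {task}
OUTPUT FORMAT:
- Provide findings as a structured list
- Include confidence score (0-1) for each finding
- If no relevant findings, state: No relevant findings in this chunk.
CONSTRAINT: Focus ONLY on this chunk. Do NOT reference external content."""

def build_worker_prompt(n, m, query, chunk, task):
    """Fill the worker prompt template for one chunk."""
    return WORKER_PROMPT.format(n=n, m=m, query=query, chunk=chunk, task=task)
```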
### Step 4: Evaluate Completeness
After collecting all worker results, assess whether the analysis is sufficient.
Convergence Criteria:
- Completeness >= 90%: Does the synthesis address all aspects of the user query?
- Confidence >= 80%: Are findings well-supported and consistent across chunks?
- Max iterations: 3: Hard limit to prevent infinite loops.
Evaluation Procedure:
- Count how many chunks returned meaningful results.
- Check if all aspects of the user query are addressed.
- Identify gaps: aspects of the query not covered by any chunk.
- Rate completeness (0-100%) and confidence (0-100%).
Decision:
- If completeness >= 90% AND confidence >= 80%: Proceed to Step 5 (Synthesize).
- If gaps found AND iteration < 3: Re-decompose the missing areas and spawn a new batch of Haiku workers targeting the gaps. This is iterative deepening, the workaround for the no-nesting constraint: sub-agents cannot spawn sub-sub-agents, so the root model must orchestrate each additional round itself.
- If max iterations reached: Proceed to Step 5 with best available results, noting gaps.
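The convergence decision above can be summarized as a function over the criteria (a sketch; thresholds are expressed as fractions, i.e. 0.90 completeness and 0.80 confidence, and the action names are illustrative):

```python
def next_action(completeness, confidence, iteration, max_iterations=3):
    """Apply the Step 4 convergence criteria to pick the next step."""
    if completeness >= 0.90 and confidence >= 0.80:
        return "synthesize"
    if iteration < max_iterations:
        return "re-decompose"        # iterative deepening round
    return "synthesize-with-gaps"    # max iterations reached
```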
### Step 5: Synthesize Final Output
Aggregate all worker results into a unified output.
- Read all worker outputs: Collect findings from every chunk across all iterations.
- Deduplicate: Remove findings that appear in multiple chunks (especially from overlap regions).
- Identify cross-chunk patterns: Look for themes, issues, or insights that span multiple chunks. These cross-chunk patterns are the primary value-add of the RLM approach over independent analysis.
- Resolve contradictions: If workers disagree, investigate the conflict. Cite both perspectives and explain the resolution.
- Structure the final output:
```
# [Analysis Title]

## Summary
[2-3 sentence overview of key findings]

## Key Findings
1. [Finding 1] - [source chunk(s)]
2. [Finding 2] - [source chunk(s)]
...

## Cross-Chunk Patterns
- [Pattern spanning multiple chunks]
...

## Detailed Results
[Per-chunk breakdown if relevant]

## Recommendations
[Actionable next steps based on findings]

## Analysis Metadata
- Input: [size description]
- Strategy: [decomposition strategy used]
- Chunks processed: [N]
- Iterations: [N]
- Confidence: [score]/100
```
- Present to user: Deliver the synthesized output directly.
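As an illustration of the deduplication step, the sketch below merges findings that multiple chunks reported (the finding schema here is hypothetical; workers return free-form structured lists in practice):

```python
def dedupe_findings(findings):
    """Merge duplicate findings reported by multiple chunks (e.g. overlap regions).

    Each finding is a dict: {"text": str, "chunk": int, "confidence": float}.
    Duplicates (same whitespace-normalized, lowercased text) are merged, keeping
    the highest confidence and the union of source chunks.
    """
    merged = {}
    for f in findings:
        key = " ".join(f["text"].lower().split())
        entry = merged.setdefault(key, {"text": f["text"], "chunks": set(),
                                        "confidence": 0.0})
        entry["chunks"].add(f["chunk"])
        entry["confidence"] = max(entry["confidence"], f["confidence"])
    return list(merged.values())
```

A finding seen by several chunks is itself a signal: its chunk set feeds directly into the Cross-Chunk Patterns section.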
## Model Selection
| Role | Model | Phase | Reasoning |
|---|---|---|---|
| Supervisor | Sonnet | Decomposition (Step 2) | Requires strategic reasoning about input structure |
| Workers | Haiku | Chunk Processing (Step 3) | Focused extraction tasks, cost-efficient |
| Supervisor | Sonnet | Evaluation (Step 4) | Requires judgment about completeness |
| Supervisor | Sonnet | Synthesis (Step 5) | Requires cross-chunk reasoning and pattern detection |
Explicitly specify `model: "haiku"` in every Task tool invocation for worker sub-agents. The supervisor runs as the main Sonnet conversation.
## Error Handling
Handle failures at each level with graceful degradation.
### Single Sub-Agent Failure
If one worker Task fails (timeout, empty output, malformed result):
- Retry once with the same prompt and chunk.
- If retry succeeds, use the result normally.
- If retry fails, mark the chunk as skipped and continue with remaining results.
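The retry rule ("retry once, then skip") can be sketched as follows. Names are illustrative, and `run_worker` stands in for a Task invocation:

```python
def run_with_retry(run_worker, chunk):
    """Try a worker at most twice; exceptions and empty output count as failures.

    Returns the worker's result, or None to mark the chunk as skipped.
    """
    for _attempt in range(2):  # original attempt + one retry
        try:
            result = run_worker(chunk)
        except Exception:
            continue
        if result:
            return result
    return None
```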
### Multiple Failures (> 30% of Chunks)
If more than 30% of worker Tasks fail:
- Fall back to Sonnet-only processing: Abandon the parallel Haiku approach.
- Process the input directly with Sonnet, using a single pass or sequential reads.
- Accept reduced coverage over unreliable parallel results.
### All Failures
If every worker Task fails:
- Process the entire input directly with Sonnet without chunking.
- Read the input in sequential passes if it exceeds context.
- Produce the best output possible from direct processing.
### Partial Results
When some chunks succeed and others fail (but failure rate <= 30%):
- Produce output from successful chunks, noting gaps.
- Include a completeness percentage in the output metadata.
- State which areas were not analyzed and why.
Example output note:

```
Note: Analysis covers 7 of 10 chunks (70% completeness).
Chunks 3, 7, 9 could not be processed. Areas not covered: [list].
```
### Graceful Degradation Ladder
When problems occur, degrade through these levels in order:
| Level | Condition | Action |
|---|---|---|
| Full RLM | All workers succeed | Normal workflow: decompose, parallel process, synthesize |
| Partial RLM | Some workers fail (<= 30%) | Synthesize from successful chunks, note gaps |
| Sonnet Fallback | Many workers fail (> 30%) | Abandon chunking, process directly with Sonnet |
| Best-Effort | All workers fail or Sonnet fallback also struggles | Produce whatever output is possible, clearly state limitations |
Always inform the user which degradation level was reached and why.
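The ladder maps directly onto a failure-rate check (a sketch; the level names are illustrative and `total_chunks` is assumed positive):

```python
def degradation_level(total_chunks, failed_chunks):
    """Map worker failure counts to the graceful-degradation ladder."""
    if failed_chunks == total_chunks:
        return "best-effort"
    rate = failed_chunks / total_chunks
    if rate == 0:
        return "full-rlm"
    if rate <= 0.30:
        return "partial-rlm"
    return "sonnet-fallback"
```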
## When NOT to Use RLM
Do NOT invoke the RLM pattern for:
- Small inputs (< 500 lines): Direct processing is faster and simpler. The overhead of decomposition and synthesis exceeds the benefit.
- Single file analysis: If the task involves one file that fits in context, process it directly.
- Simple queries: Questions like "what does function X do?" or "fix this bug" do not need parallel decomposition.
- Already-structured data: If the user provides a clear, bounded dataset (a single JSON file, a specific API response), process it directly.
- Time-sensitive tasks: If the user needs an immediate answer, RLM adds latency from decomposition and synthesis. Use direct processing for speed.
## Examples

### Example 1: Comprehensive Codebase Security Review
User request: "Review the entire backend codebase for security vulnerabilities."
#### Phase 1: Decomposition (Sonnet supervisor)

Assess: 150 Python files, ~25,000 lines. Exceeds thresholds. Use RLM.
Strategy: Structural Decomposition (group by module).
Work plan:

```
Chunk 1: auth/ (12 files) - Focus: authentication security, session management
Chunk 2: database/ (18 files) - Focus: SQL injection, query safety
Chunk 3: api/ (45 files) - Focus: input validation, error handling
Chunk 4: services/ (50 files) - Focus: business logic, OWASP Top 10
Chunk 5: utils/ (25 files) - Focus: dependency security, utility safety
```
#### Phase 2: Processing (5 parallel Haiku workers)

Spawn 5 Task tool calls in a single message:

```
Task(
  subagent_type: "general-purpose",
  model: "haiku",
  description: "Process chunk 1 of 5: auth module security review",
  prompt: "You are processing chunk 1 of 5 in a codebase security review.
  ORIGINAL QUERY: Review backend for security vulnerabilities.
  YOUR CHUNK: auth/ module - files: login.py, session.py, middleware.py, jwt.py, ...
  TASK: Analyze for authentication vulnerabilities: weak crypto, hardcoded secrets,
  insecure session storage, JWT implementation issues.
  OUTPUT FORMAT:
  - List each finding with severity (Critical/High/Medium/Low)
  - Include file name and line number
  - Confidence score (0-1) per finding
  CONSTRAINT: Focus ONLY on this chunk. Do NOT reference external content."
)
```
(Repeat for chunks 2-5 with module-specific focus.)
#### Phase 3: Synthesis (Sonnet supervisor)

```
# Backend Security Review

## Summary
Analysis of 150 files (25K lines) identified 47 security issues across 5 modules.
3 critical issues require immediate attention. Most common finding: missing input
validation (8 occurrences across modules).

## Key Findings
1. SQL injection in database/query_builder.py:45 - Critical - (Chunk 2)
2. Hardcoded API key in utils/config.py:12 - Critical - (Chunk 5)
3. Missing auth on admin endpoints in api/admin_routes.py - Critical - (Chunk 3)

## Cross-Chunk Patterns
- Input validation missing in 4 of 5 modules
- Inconsistent error handling across all modules
- Vulnerable dependency in utils/ used by auth/ and services/

## Recommendations
1. Immediate: Fix 3 critical issues
2. Short-term: Add input validation middleware
3. Long-term: Standardize error handling

## Analysis Metadata
- Input: 150 files, 25,000 lines
- Strategy: Structural Decomposition (by module)
- Chunks processed: 5
- Iterations: 1
- Confidence: 88/100
```
### Example 2: Long Document Analysis
User request: "Summarize this 80-page research paper and extract key findings."
#### Phase 1: Decomposition (Sonnet supervisor)

Assess: Single document, ~45,000 words, ~15,000 lines. Exceeds thresholds. Use RLM.
Strategy: Structural Decomposition (split by section headings).
Work plan:

```
Chunk 1: Abstract + Introduction (pages 1-8) - Focus: research question, methodology overview
Chunk 2: Literature Review (pages 9-20) - Focus: prior work, identified gaps
Chunk 3: Methodology (pages 21-35) - Focus: research design, methods used
Chunk 4: Results (pages 36-55) - Focus: key findings, data highlights
Chunk 5: Discussion (pages 56-70) - Focus: interpretation, implications
Chunk 6: Conclusion (pages 71-80) - Focus: contributions, future work
```
#### Phase 2: Processing (6 parallel Haiku workers)

```
Task(
  subagent_type: "general-purpose",
  model: "haiku",
  description: "Process chunk 4 of 6: extract results findings",
  prompt: "You are processing chunk 4 of 6 in a research paper analysis.
  ORIGINAL QUERY: Summarize paper and extract key findings.
  YOUR CHUNK: Results section (pages 36-55)
  [section content here]
  TASK: Extract key findings, statistical results, and data highlights.
  List each finding with supporting evidence from the text.
  OUTPUT FORMAT:
  - Finding: [description]
  - Evidence: [quote or data point]
  - Confidence: [0-1]
  CONSTRAINT: Focus ONLY on this chunk. Do NOT reference external content."
)
```
#### Phase 3: Synthesis (Sonnet supervisor)

```
# Research Paper Summary: [Title]

## Summary
This study investigates [research question]. Using [methodology], the authors found
[primary result] with implications for [field impact]. The work builds on identified
gaps in [prior work area] and contributes a novel approach to [contribution].

## Key Findings
1. [Primary finding from Results chunk] - supported by [evidence]
2. [Secondary finding] - statistical significance p < 0.01
3. [Methodological innovation from Methodology chunk]

## Cross-Chunk Patterns
- Research question (Chunk 1) directly answered by findings (Chunk 4)
- Limitations (Chunk 6) qualify the strength of results (Chunk 4)
- Prior work gaps (Chunk 2) addressed by methodology (Chunk 3)

## Recommendations
- Follow-up reading: [key citations from literature review]
- Future research directions identified in Conclusion

## Analysis Metadata
- Input: 80 pages, ~45,000 words
- Strategy: Structural Decomposition (by section)
- Chunks processed: 6
- Iterations: 1
- Confidence: 92/100
```
### Example 3: Multi-File Question Answering
User request: "How does the authentication flow work from login to database? Trace the full path."
#### Phase 1: Decomposition (Sonnet supervisor)

Assess: Question spans 30+ files across frontend, API, backend, and database layers. Use RLM.
Strategy: Structural Decomposition (split by application layer).
Work plan:

```
Chunk 1: Frontend layer (5 files) - Focus: LoginForm, auth state, API call construction
Chunk 2: API layer (8 files) - Focus: auth routes, middleware, request validation
Chunk 3: Backend layer (12 files) - Focus: auth service, credential verification, token generation
Chunk 4: Database layer (10 files) - Focus: user model, queries, password storage
```
#### Phase 2: Processing (4 parallel Haiku workers)

```
Task(
  subagent_type: "general-purpose",
  model: "haiku",
  description: "Process chunk 1 of 4: trace frontend auth flow",
  prompt: "You are processing chunk 1 of 4 in an authentication flow trace.
  ORIGINAL QUERY: Trace the authentication flow from login to database.
  YOUR CHUNK: Frontend layer files.
  Read these files: frontend/src/components/LoginForm.tsx,
  frontend/src/store/authSlice.ts, frontend/src/api/auth.ts
  TASK: Trace the authentication flow in the frontend layer:
  1. What triggers the login? (button click, form submit)
  2. What validation happens client-side?
  3. What API call is made? (endpoint, method, payload)
  4. Include file names and line numbers.
  OUTPUT FORMAT:
  - Step: [description]
  - File: [filename:line]
  - Code: [relevant snippet]
  - Next: [what happens next / what data passes to next layer]
  CONSTRAINT: Focus ONLY on this chunk. Do NOT reference external content."
)
```
#### Phase 3: Synthesis (Sonnet supervisor)

```
# Authentication Flow: Login to Database

## Summary
The auth flow spans 4 layers across 35 files. User submits credentials via
LoginForm.tsx, which dispatches to the API via authSlice. The API validates
the request through middleware, the backend verifies credentials with bcrypt,
and the database stores password hashes with the user model.

## End-to-End Trace

### 1. Frontend (LoginForm.tsx)
- User clicks "Login" button (LoginForm.tsx:45)
- Client-side email validation (LoginForm.tsx:38)
- Redux action: authSlice.login() dispatched (authSlice.ts:52)
- API call: POST /api/auth/login with {email, password} (auth.ts:15)

### 2. API Layer (auth_routes.py)
- Request received at /api/auth/login (auth_routes.py:23)
- CSRF validation middleware (middleware/security.py:15)
- Rate limiting: 5 requests/minute (middleware/rate_limit.py:8)
- Passes to backend: auth_service.authenticate()

### 3. Backend (auth_service.py)
- Credentials sanitized (auth_service.py:67)
- User lookup: user_service.get_by_email() (auth_service.py:72)
- Password verify: bcrypt.verify() (auth_service.py:78)
- JWT token generated (crypto/jwt.py:34)

### 4. Database (user_model.py)
- Query: SELECT id, email, password_hash FROM users WHERE email = ?
- Password stored as bcrypt hash (cost factor 12)

## Cross-Chunk Patterns
- CSRF protection at API layer guards frontend requests
- Rate limiting prevents brute force from frontend
- bcrypt used consistently (backend stores, database persists)

## Analysis Metadata
- Input: 35 files across 4 application layers
- Strategy: Structural Decomposition (by layer)
- Chunks processed: 4
- Iterations: 1
- Confidence: 90/100
```