afrexai-n8n-mastery



updated:March 4, 2026


n8n Workflow Mastery — Complete Automation Engineering System

You are an expert n8n workflow architect. You design, build, debug, optimize, and scale n8n automations following production-grade methodology. Every workflow you create is complete, functional, and follows the patterns in this guide.

Phase 1: Quick Health Check (Run First)

Score the current n8n setup (1 point each, /10):

Signal	Check
Workflow naming	Consistent `[Category] Description` format?
Error handling	Every workflow has error trigger node?
Credentials	Using n8n credential store (not hardcoded)?
Versioning	Workflow descriptions include version/changelog?
Monitoring	Error workflow connected to notification channel?
Retry logic	HTTP nodes have retry on failure enabled?
Execution data	Pruning configured (not filling disk)?
Sub-workflows	Complex logic broken into reusable sub-workflows?
Environment vars	Using env vars for URLs/configs (not magic strings)?
Documentation	Each workflow has description explaining purpose?

Score 0-3: Critical — follow this guide start to finish. Score 4-6: Gaps — focus on missing areas. Score 7-10: Mature — jump to advanced patterns.

Phase 2: Workflow Architecture & Design

2.1 Workflow Strategy Brief

Before building, answer these in a YAML brief:

workflow_brief:
  name: "[Category] Brief Description"
  problem: "What manual process does this eliminate?"
  trigger: "What starts this workflow? (webhook/schedule/event/manual)"
  inputs:
    - source: "Where does data come from?"
      format: "JSON/CSV/form/email/database"
      volume: "How many items per run? Per day?"
  outputs:
    - destination: "Where does data go?"
      format: "API call/email/database/file/notification"
  error_handling: "What happens when it fails?"
  sla: "How fast must it complete? Acceptable delay?"
  dependencies:
    - service: "External API/service name"
      auth_type: "API key/OAuth2/Basic"
      rate_limit: "Calls per minute/hour"
  owner: "Who maintains this workflow?"
  review_date: "When to review/optimize?"

2.2 Workflow Naming Convention

[CATEGORY] Action — Target (vX.Y)

Categories:
  [SYNC]     — Data synchronization between systems
  [PROCESS]  — Multi-step business processes
  [NOTIFY]   — Alerts and notifications
  [INGEST]   — Data collection and import
  [EXPORT]   — Reports and data export
  [MONITOR]  — Health checks and monitoring
  [AI]       — LLM/AI-powered workflows
  [INTERNAL] — Internal tooling and utilities

Examples:
  [SYNC] HubSpot → Postgres — Contacts (v2.1)
  [PROCESS] Invoice Approval — Slack + QuickBooks (v1.3)
  [NOTIFY] Stripe Payment — Team Alert (v1.0)
  [AI] Support Ticket — Auto-classify + Route (v1.2)
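
The convention above can be checked mechanically, for example before importing or activating a workflow. The following is a minimal sketch, not part of n8n itself: `isValidWorkflowName` and `NAME_PATTERN` are hypothetical names, and the pattern assumes the separator between action and target is always the dash character shown in the examples (U+2014).

```javascript
// Categories copied from the naming convention above.
const CATEGORIES = ['SYNC', 'PROCESS', 'NOTIFY', 'INGEST', 'EXPORT', 'MONITOR', 'AI', 'INTERNAL'];

// Shape: [CATEGORY] Action <dash> Target (vX.Y); \u2014 is the dash separator.
const NAME_PATTERN = new RegExp(
  '^\\[(' + CATEGORIES.join('|') + ')\\] .+ \\u2014 .+ \\(v\\d+\\.\\d+\\)$'
);

// Hypothetical helper: true when a workflow name follows the convention.
function isValidWorkflowName(name) {
  return NAME_PATTERN.test(name);
}

console.log(isValidWorkflowName('[NOTIFY] Stripe Payment \u2014 Team Alert (v1.0)'));
console.log(isValidWorkflowName('Untitled workflow 7'));
```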

2.3 Workflow Complexity Tiers

Tier	Nodes	Description	Approach
Simple	3-7	Linear A→B→C	Single workflow
Standard	8-15	Branches, loops, some error handling	Single workflow + error trigger
Complex	16-30	Multi-service, conditional logic, retries	Main + sub-workflows
Enterprise	30+	Orchestration, queues, state management	Orchestrator + multiple sub-workflows

Rule: If a workflow exceeds 30 nodes, decompose into sub-workflows.

2.4 Node Organization Layout

Left → Right flow (primary path)
Top → Bottom (branches and error paths)

Section 1 (x: 0-600):     Trigger + Input Processing
Section 2 (x: 600-1200):  Core Logic + Transformations
Section 3 (x: 1200-1800): Output + Delivery
Section 4 (x: 1800+):     Error Handling + Logging

Use Sticky Notes for section labels (yellow = info, red = warning, green = success path)

Phase 3: Trigger Design Patterns

3.1 Trigger Selection Matrix

Use Case	Trigger Type	Node	When to Use
External system sends data	Webhook	Webhook	API integrations, form submissions
Run at specific times	Schedule	Schedule Trigger	Reports, syncs, cleanup
React to n8n events	Error/Workflow	Error Trigger, Execute Workflow Trigger	Error handling, workflow chaining
Manual testing/ad-hoc	Manual	Manual Trigger	Development, one-off runs
Chat/conversational	Chat	Chat Trigger	AI assistants, chatbots
File changes	Polling	Various	Google Drive, S3, FTP monitoring
Email arrives	Polling	Email Trigger (IMAP)	Email processing workflows
Database change	Polling/Webhook	Various	CDC (Change Data Capture)

3.2 Webhook Security Checklist

webhook_security:
  authentication:
    - method: "Header Auth"
      setup: "Add Header Auth credential, verify X-API-Key"
      use_when: "Service-to-service, simple integrations"
    - method: "HMAC Signature"  
      setup: "Code node to verify HMAC-SHA256 of body"
      use_when: "Stripe, GitHub, Shopify webhooks"
    - method: "JWT Bearer"
      setup: "Code node to verify JWT token"
      use_when: "OAuth2 services, custom apps"
    - method: "IP Allowlist"
      setup: "IF node checking $json.headers['x-forwarded-for'] from the Webhook node output"
      use_when: "Known source IPs (internal services)"
  
  validation:
    - "Always validate incoming payload schema with IF/Switch"
    - "Return appropriate HTTP status (200 OK, 400 Bad Request)"
    - "Log all webhook calls for audit trail"
    - "Set webhook timeout (don't leave connections hanging)"
    - "Use 'Respond to Webhook' node for async processing"
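
The validation items above (check the payload, return 200 or 400) can be sketched as a small function for a Code node placed right after the Webhook node. This is an illustrative sketch only: `validatePayload` and the field names in the usage line are hypothetical, and the returned `statusCode` is meant to be mapped onto the Respond to Webhook node by the workflow.

```javascript
// Hypothetical validator: checks that required fields are present and
// suggests the HTTP status the workflow should respond with.
function validatePayload(body, requiredFields) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return { statusCode: 400, errors: ['Body must be a JSON object'] };
  }
  const errors = requiredFields
    .filter(field => body[field] === undefined || body[field] === null || body[field] === '')
    .map(field => `Missing required field: ${field}`);
  return { statusCode: errors.length === 0 ? 200 : 400, errors };
}

// Usage: field names here are examples, not a fixed schema.
console.log(validatePayload({ email: 'a@example.com' }, ['email', 'name']));
```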

3.3 Schedule Trigger Patterns

schedule_patterns:
  business_hours_check:
    cron: "*/15 9-17 * * 1-5"
    description: "Every 15 min from 09:00 through 17:45, Mon-Fri (use 9-16 to stop before 17:00)"
    
  daily_morning_report:
    cron: "0 8 * * 1-5"
    description: "8 AM weekdays"
    
  weekly_cleanup:
    cron: "0 2 * * 0"
    description: "2 AM Sunday (low traffic)"
    
  monthly_billing:
    cron: "0 6 1 * *"
    description: "1st of month, 6 AM"
    
  smart_polling:
    cron: "*/5 * * * *"
    description: "Every 5 min — use with dedup to avoid reprocessing"
    dedup_strategy: "Store last processed ID/timestamp in n8n static data"

Phase 4: Core Node Patterns Library

4.1 HTTP Request — Production Pattern

{
  "node": "HTTP Request",
  "settings": {
    "method": "POST",
    "url": "={{ $env.API_BASE_URL }}/endpoint",
    "authentication": "predefinedCredentialType",
    "sendHeaders": true,
    "headerParameters": {
      "Content-Type": "application/json",
      "User-Agent": "n8n-automation/1.0"
    },
    "sendBody": true,
    "bodyParameters": "={{ JSON.stringify($json) }}",
    "options": {
      "timeout": 30000,
      "retry": {
        "maxRetries": 3,
        "retryInterval": 1000,
        "retryOnTimeout": true
      },
      "response": {
        "response": {
          "fullResponse": true
        }
      }
    }
  }
}

HTTP Request Rules:

Always set timeout (default 300s is too long for most APIs)
Enable Retry On Fail for external APIs; the built-in retry waits a fixed interval, so implement exponential backoff separately where an API requires it
Use credential store — never hardcode API keys in URL/headers
Set User-Agent for debugging on the receiving end
Use $env.VARIABLE for base URLs — never hardcode domains
Full response mode when you need status code for branching
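
Where exponential backoff is needed, one option is a small retry wrapper inside a Code node. The sketch below is an assumption-laden illustration, not an n8n API: `backoffDelay` and `withBackoff` are hypothetical helpers, and `fn` stands for whatever request function the workflow uses.

```javascript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
// capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retries an async function up to `maxRetries` times after the first try.
// `shouldRetry` and `sleep` are injectable so the wrapper can be exercised
// without real waiting.
async function withBackoff(fn, options = {}) {
  const {
    maxRetries = 3,
    baseMs = 1000,
    shouldRetry = () => true,
    sleep = ms => new Promise(resolve => setTimeout(resolve, ms)),
  } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      if (attempt >= maxRetries || !shouldRetry(error)) throw error;
      await sleep(backoffDelay(attempt, baseMs));
    }
  }
}
```

A `shouldRetry` that returns true only for timeouts and 429/5xx responses keeps the wrapper from retrying requests that can never succeed, such as a 401.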

4.2 Code Node — Data Transformation Patterns

Pattern: Map and Transform

// Transform array of items
return items.map(item => {
  const data = item.json;
  return {
    json: {
      id: data.id,
      fullName: `${data.first_name} ${data.last_name}`.trim(),
      email: data.email?.toLowerCase(),
      createdAt: new Date(data.created_at).toISOString(),
      source: 'n8n-sync',
      // Computed fields
      isActive: data.status === 'active',
      daysSinceSignup: Math.floor(
        (Date.now() - new Date(data.created_at)) / 86400000
      ),
    }
  };
});

Pattern: Filter + Deduplicate

const seen = new Set();
return items.filter(item => {
  const key = item.json.email?.toLowerCase();
  if (!key || seen.has(key)) return false;
  seen.add(key);
  return true;
});

Pattern: Aggregate / Group By

const groups = {};
for (const item of items) {
  const key = item.json.category;
  if (!groups[key]) groups[key] = { count: 0, total: 0, items: [] };
  groups[key].count++;
  groups[key].total += item.json.amount || 0;
  groups[key].items.push(item.json);
}
return Object.entries(groups).map(([category, data]) => ({
  json: { category, ...data, average: data.total / data.count }
}));

Pattern: Pagination Handler

// Fetches all pages within a single Code node run; assumes the API responds with { data: [...] }
const baseUrl = $env.API_BASE_URL;
const results = [];
let page = 1;
let hasMore = true;

while (hasMore) {
  const response = await this.helpers.httpRequest({
    method: 'GET',
    url: `${baseUrl}/items?page=${page}&per_page=100`,
    headers: { 'Authorization': `Bearer ${$env.API_TOKEN}` },
  });
  
  results.push(...response.data);
  hasMore = response.data.length === 100;
  page++;
  
  // Safety valve
  if (page > 50) break;
}

return results.map(item => ({ json: item }));

Pattern: Rate Limiter

// Add between batch items to respect API limits
// Code node mode: Run Once for Each Item
const RATE_LIMIT_MS = 200; // 5 requests per second

// Pause before every item except the first
if ($itemIndex > 0) {
  await new Promise(resolve => setTimeout(resolve, RATE_LIMIT_MS));
}

return $input.item;

4.3 Branching Patterns

IF Node — Decision Matrix

branching_patterns:
  binary_decision:
    node: "IF"
    use: "True/false routing"
    example: "Is order amount > $100?"
    
  multi_path:
    node: "Switch"
    use: "3+ possible routes"
    example: "Route by ticket priority (P0/P1/P2/P3)"
    
  content_routing:
    node: "Switch"
    use: "Route by data content/type"
    example: "Route by email domain to different CRMs"
    
  merge_paths:
    node: "Merge"
    mode: "chooseBranch"
    use: "Rejoin after IF/Switch branches"

Switch Node — Clean Multi-Routing

Switch on: {{ $json.status }}
  Case "new"      → Create record path
  Case "updated"  → Update record path  
  Case "deleted"  → Archive record path
  Default         → Log unknown status + alert

4.4 Loop Patterns

Split In Batches — Batch Processing

batch_processing:
  node: "Split In Batches"
  batch_size: 10
  use_cases:
    - "API with rate limits (process 10, wait, next 10)"
    - "Database bulk inserts (batch of 100)"
    - "Email sending (batch of 50 to avoid spam filters)"
  
  pattern:
    1: "Split In Batches (size: 10)"
    2: "→ Process batch (HTTP Request / DB insert)"
    3: "→ Wait (1 second between batches)"
    4: "→ Loop back to Split In Batches"

Loop Over Items — Per-Item Processing

per_item_loop:
  node: "Loop Over Items"  # current display name of the Split In Batches node; use batch size 1 for per-item work
  use_cases:
    - "Each item needs different API call"
    - "Sequential processing required (order matters)"
    - "Per-item error handling needed"
  
  anti_pattern: "Don't loop when batch/bulk API exists"

Phase 5: Error Handling Architecture

5.1 Error Handling Strategy

Every production workflow MUST have:

┌─────────────────────────────────────────────────┐
│  MAIN WORKFLOW                                   │
│                                                  │
│  Trigger → Process → Output                      │
│     │                                            │
│     └─── Error Trigger ──→ Error Handler ──→     │
│              │                                   │
│              ├── Log error details                │
│              ├── Send alert (Slack/email)         │
│              ├── Retry logic (if applicable)      │
│              └── Dead letter queue (if needed)    │
└─────────────────────────────────────────────────┘

5.2 Error Trigger Template

error_workflow:
  nodes:
    - name: "Error Trigger"
      type: "n8n-nodes-base.errorTrigger"
      
    - name: "Extract Error Info"
      type: "n8n-nodes-base.code"
      code: |
        const error = $input.first().json;
        return [{
          json: {
            workflow_name: error.workflow?.name || 'Unknown',
            workflow_id: error.workflow?.id,
            execution_id: error.execution?.id,
            error_message: error.execution?.error?.message || 'No message',
            error_node: error.execution?.lastNodeExecuted || 'Unknown node',
            timestamp: new Date().toISOString(),
            retry_url: `${$env.N8N_BASE_URL}/workflow/${error.workflow?.id}/executions/${error.execution?.id}`,
            severity: classifySeverity(error),
          }
        }];
        
        function classifySeverity(error) {
          const msg = error.execution?.error?.message || '';
          if (msg.includes('timeout') || msg.includes('ECONNREFUSED')) return 'WARNING';
          if (msg.includes('401') || msg.includes('403')) return 'CRITICAL';
          if (msg.includes('429')) return 'INFO'; // Rate limit, will retry
          return 'ERROR';
        }
        
    - name: "Alert via Slack"
      type: "n8n-nodes-base.slack"
      action: "Send message"
      channel: "#n8n-alerts"
      message: |
        🚨 *n8n Workflow Error*
        
        *Workflow:* {{ $json.workflow_name }}
        *Node:* {{ $json.error_node }}
        *Severity:* {{ $json.severity }}
        *Error:* {{ $json.error_message }}
        *Time:* {{ $json.timestamp }}
        
        <{{ $json.retry_url }}|View Execution>

5.3 Retry Patterns

retry_strategies:
  http_retry:
    description: "Built-in HTTP Request retry"
    config:
      max_retries: 3
      retry_interval: 1000  # ms
      retry_on_timeout: true
      retry_on_status: [429, 500, 502, 503, 504]
    
  custom_retry_with_backoff:
    description: "Code node implementing exponential backoff"
    pattern: |
      // Code node mode: Run Once for Each Item
      const maxRetries = 3;
      const attempt = $json._retryAttempt || 0;
      
      if (attempt >= maxRetries) {
        // Send to dead letter queue
        return { json: { ...$json, _failed: true, _attempts: attempt } };
      }
      
      const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
      await new Promise(r => setTimeout(r, delay));
      
      return { json: { ...$json, _retryAttempt: attempt + 1 } };
      
  circuit_breaker:
    description: "Stop calling failing service"
    pattern: |
      // Use n8n static data as circuit state
      const staticData = $getWorkflowStaticData('global');
      const failures = staticData.failures || 0;
      const lastFailure = staticData.lastFailure || 0;
      const THRESHOLD = 5;
      const COOLDOWN_MS = 300000; // 5 minutes
      
      if (failures >= THRESHOLD && Date.now() - lastFailure < COOLDOWN_MS) {
        // Circuit OPEN — skip API call, use fallback
        return [{ json: { _circuitOpen: true, _fallback: true } }];
      }
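
The circuit breaker pattern above only shows the "is the circuit open?" check. A fuller sketch also has to record failures and reset on success. In the version below, `state` is a plain object standing in for `$getWorkflowStaticData('global')`; the three function names are hypothetical and the threshold values are copied from the pattern.

```javascript
const THRESHOLD = 5;        // failures before the circuit opens
const COOLDOWN_MS = 300000; // 5 minutes

// True while calls should be skipped and the fallback used.
function isCircuitOpen(state, now = Date.now()) {
  const failures = state.failures || 0;
  const lastFailure = state.lastFailure || 0;
  return failures >= THRESHOLD && now - lastFailure < COOLDOWN_MS;
}

// Call after a failed request.
function recordFailure(state, now = Date.now()) {
  state.failures = (state.failures || 0) + 1;
  state.lastFailure = now;
}

// Call after a successful request: closes the circuit again.
function recordSuccess(state) {
  state.failures = 0;
}
```

Once the cooldown has elapsed, `isCircuitOpen` returns false and one trial call goes through; if that call fails, `recordFailure` reopens the circuit for another cooldown period.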

5.4 Dead Letter Queue Pattern

dead_letter_queue:
  purpose: "Store failed items for manual review/reprocessing"
  implementation:
    - node: "Google Sheets / Airtable / Database"
      columns: [workflow, execution_id, item_data, error, timestamp, status]
    - status_values: [pending, retrying, resolved, abandoned]
    - review: "Check DLQ daily, resolve or abandon stale items"
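
As an illustration of the columns listed above, a Code node could shape each failed item into a DLQ row before the Sheets/Airtable/database node writes it. This is a sketch under assumptions: `toDlqRecord` and `isStale` are hypothetical helpers, and serializing `item_data` as a JSON string is one possible storage choice.

```javascript
// Builds one DLQ row using the column names from the pattern above.
function toDlqRecord({ workflow, executionId, item, error }, now = new Date()) {
  return {
    workflow,
    execution_id: executionId,
    item_data: JSON.stringify(item), // serialized so it fits a single column
    error: error instanceof Error ? error.message : String(error),
    timestamp: now.toISOString(),
    status: 'pending', // pending | retrying | resolved | abandoned
  };
}

// Pending rows older than `maxAgeMs` are candidates for the daily review.
function isStale(record, maxAgeMs, now = new Date()) {
  return record.status === 'pending' && now - new Date(record.timestamp) > maxAgeMs;
}
```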

Phase 6: Data Transformation & Integration Patterns

6.1 Common Integration Patterns

Pattern: CRM Sync (Bidirectional)

crm_sync:
  inbound:
    trigger: "Webhook from CRM (new/updated contact)"
    steps:
      1: "Validate payload schema"
      2: "Map fields to internal format"
      3: "Deduplicate (check by email)"
      4: "Upsert to database"
      5: "Trigger downstream workflows"
      
  outbound:
    trigger: "Database change or schedule"
    steps:
      1: "Query changed records since last sync"
      2: "Map internal format to CRM fields"
      3: "Batch upsert to CRM API"
      4: "Store sync timestamp"
      5: "Log sync results"
      
  conflict_resolution:
    strategy: "Last write wins with audit trail"
    timestamp_field: "updated_at"
    audit: "Log both versions before overwrite"
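
The conflict resolution strategy above ("last write wins with audit trail") could look roughly like this in a Code node. The helper name `resolveConflict` is hypothetical, and sending ties to the remote record is an assumption of this sketch rather than something the strategy specifies.

```javascript
// Last write wins: keeps whichever record has the newer timestamp and
// returns an audit entry holding both versions.
function resolveConflict(local, remote, timestampField = 'updated_at') {
  const localTime = new Date(local[timestampField]).getTime();
  const remoteTime = new Date(remote[timestampField]).getTime();
  // Ties (and unparseable local timestamps) fall through to the remote record.
  const winner = localTime > remoteTime ? 'local' : 'remote';
  return {
    record: winner === 'local' ? local : remote,
    audit: { winner, local, remote, resolvedAt: new Date().toISOString() },
  };
}
```

Writing the `audit` object to a log table before applying `record` preserves the overwritten version, which is what the `audit` line in the strategy asks for.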

Pattern: Email Processing Pipeline

email_pipeline:
  trigger: "Email Trigger (IMAP), polling every 5 min"
  steps:
    1: "Read new emails"
    2: "Classify intent (AI/rules)"
    3: "Extract structured data (sender, subject, key fields)"
    4: "Route by classification"
    5_support: "Create ticket in helpdesk"
    5_sales: "Add to CRM as lead"
    5_billing: "Forward to accounting"
    5_spam: "Archive and skip"
    6: "Send auto-acknowledgment"
    7: "Log to audit trail"

Pattern: Multi-Step Approval

approval_workflow:
  trigger: "Form/webhook (new request)"
  steps:
    1: "Create request record (status: pending)"
    2: "Send Slack message with Approve/Reject buttons"
    3: "Wait for webhook callback (button click)"
    4_approved: "Execute action + notify requester"
    4_rejected: "Notify requester with reason"
    5: "Update request status"
    6: "Log to audit trail"
  timeout: "48 hours → auto-escalate to manager"

Pattern: AI-Powered Processing

ai_pipeline:
  trigger: "Webhook or schedule"
  steps:
    1: "Receive raw data (text, email, document)"
    2: "Pre-process (clean, chunk if needed)"
    3: "Send to LLM (OpenAI/Anthropic/local)"
    4: "Parse structured response"
    5: "Validate LLM output (check required fields, format)"
    6: "Route based on classification"
    7: "Human review if confidence < threshold"
    8: "Store result + feedback for improvement"
  
  llm_node_config:
    model: "gpt-4o-mini for classification, gpt-4o for generation"
    temperature: "0 for extraction/classification, 0.7 for generation"
    max_tokens: "Set explicit limit to control cost"
    system_prompt: "Be specific. Include output format. Add examples."
    
  cost_control:
    - "Use cheapest model that achieves accuracy target"
    - "Cache repeated queries (check before calling LLM)"
    - "Batch similar items into single LLM call when possible"
    - "Track cost per execution in workflow metrics"
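
Steps 4 and 5 of the pipeline (parse the response, then validate it) are where malformed LLM output is easiest to catch. The following sketch shows one way to do that in a Code node. `validateLlmOutput` and `exampleSchema` are hypothetical, and the schema format (an array of allowed values, or `null` for "any value") is an assumption made for this example.

```javascript
// Parses the LLM reply and checks it against the expected fields.
function validateLlmOutput(rawText, schema) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch (error) {
    return { valid: false, errors: ['Response is not valid JSON'], data: null };
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return { valid: false, errors: ['Response is not a JSON object'], data: null };
  }
  const errors = [];
  for (const [field, allowed] of Object.entries(schema)) {
    if (!(field in parsed)) {
      errors.push(`Missing field: ${field}`);
    } else if (Array.isArray(allowed) && !allowed.includes(parsed[field])) {
      errors.push(`Unexpected value for ${field}: ${parsed[field]}`);
    }
  }
  return { valid: errors.length === 0, errors, data: parsed };
}

// Example schema: an array lists allowed values, null accepts any value.
const exampleSchema = { category: ['bug', 'billing', 'other'], summary: null };
```

Items with `valid: false` are natural candidates for the human review path in step 7.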

6.2 Data Mapping Cheat Sheet

// Common field mapping patterns in Code nodes

// Dates — always normalize to ISO
const isoDate = new Date(data.date_field).toISOString();
const dateOnly = new Date(data.date_field).toISOString().split('T')[0];

// Names
const fullName = `${data.firstName || ''} ${data.lastName || ''}`.trim();
const [firstName = '', ...rest] = (data.fullName ?? '').trim().split(/\s+/); // safe when fullName is missing
const lastName = rest.join(' ');

// Currency — always store as cents/minor units
const amountCents = Math.round(parseFloat(data.amount) * 100);
const amountDisplay = (data.amount_cents / 100).toFixed(2);

// Phone — normalize
const phone = data.phone?.replace(/\D/g, '');

// Email — normalize
const email = data.email?.toLowerCase().trim();

// Null safety
const value = data.field ?? 'default';
const nested = data.parent?.child?.value ?? null;

// Array handling
const tags = Array.isArray(data.tags) ? data.tags : [data.tags].filter(Boolean);
const csvToArray = data.csv_field?.split(',').map(s => s.trim()) || [];
const arrayToCsv = data.array_field?.join(', ') || '';

Phase 7: Sub-Workflow Architecture

7.1 When to Extract Sub-Workflows

Signal	Action
Same logic in 3+ workflows	Extract to sub-workflow
Workflow > 30 nodes	Decompose into main + sub-workflows
Different error handling needed	Separate error domains
Team wants to reuse a process	Make it a callable sub-workflow
Need to test a section independently	Extract and test separately

7.2 Sub-Workflow Design Rules

sub_workflow_rules:
  naming: "[SUB] Description — Input/Output"
  interface:
    - "Define clear input schema (what data it expects)"
    - "Define clear output schema (what it returns)"
    - "Document side effects (external API calls, DB writes)"
  
  input_validation:
    - "First node: validate required fields exist"
    - "Return clear error if validation fails"
    
  output_contract:
    - "Always return consistent structure"
    - "Include success/failure status"
    - "Include execution metadata (duration, items processed)"
    
  example_output:
    success: true
    items_processed: 42
    errors: []
    duration_ms: 1234
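
The input validation and output contract rules above can be sketched as two helpers for the first and last Code nodes of a sub-workflow. Both function names are hypothetical, and the field names in the usage line are examples only.

```javascript
// First node: returns the names of required fields that are missing.
function validateInput(input, requiredFields) {
  return requiredFields.filter(
    field => input[field] === undefined || input[field] === null
  );
}

// Last node: builds the consistent return structure from example_output.
function buildResult(startedAtMs, itemsProcessed, errors = [], nowMs = Date.now()) {
  return {
    success: errors.length === 0,
    items_processed: itemsProcessed,
    errors,
    duration_ms: nowMs - startedAtMs,
  };
}

// Usage: a failed validation is reported through the same contract.
const missingFields = validateInput({ orderId: 1 }, ['orderId', 'email']);
console.log(buildResult(Date.now(), 0, missingFields.map(f => `Missing input: ${f}`)));
```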

7.3 Orchestrator Pattern

[PROCESS] Order Fulfillment — Orchestrator (v1.0)
  │
  ├── [SUB] Validate Order — Input Check
  │     └── Returns: { valid: true/false, errors: [] }
  │
  ├── [SUB] Check Inventory — Stock Verification  
  │     └── Returns: { inStock: true/false, items: [] }
  │
  ├── [SUB] Process Payment — Stripe Charge
  │     └── Returns: { charged: true/false, chargeId: "" }
  │
  ├── [SUB] Create Shipment — Shipping Label
  │     └── Returns: { trackingNumber: "", labelUrl: "" }
  │
  └── [SUB] Send Confirmations — Email + SMS
        └── Returns: { emailSent: true, smsSent: true }

Orchestrator handles:
  - Sequential execution order
  - Rollback on failure (reverse previous steps)
  - Status tracking (store state between steps)
  - Timeout management (overall SLA)
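
The rollback responsibility listed above is essentially a saga: run steps in order and, on failure, undo the completed ones in reverse. The sketch below uses synchronous stand-in functions to keep it short; in n8n each `run` would correspond to an Execute Workflow call to a `[SUB]` workflow. `runOrchestration` and the `{ name, run, undo }` step shape are assumptions of this example.

```javascript
// Runs steps in order. Each step is { name, run(context), undo(context) }.
// On failure, the undo functions of completed steps run in reverse order.
function runOrchestration(steps, context = {}) {
  const completed = [];
  for (const step of steps) {
    try {
      // Merge each step's return value into the shared state.
      Object.assign(context, step.run(context));
      completed.push(step);
    } catch (error) {
      for (const done of completed.reverse()) {
        if (done.undo) done.undo(context);
      }
      return { success: false, failedStep: step.name, error: error.message, context };
    }
  }
  return { success: true, context };
}
```

For the order fulfillment example, the payment step's `undo` would be a refund and the inventory step's `undo` would release the reserved stock.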

Phase 8: n8n Static Data & State Management

8.1 Static Data Patterns

// Global static data (persists across executions)
const staticData = $getWorkflowStaticData('global');

// Pattern: Last processed ID (for incremental sync)
const lastId = staticData.lastProcessedId || 0;
// ... process items where id > lastId ...
staticData.lastProcessedId = maxProcessedId;

// Pattern: Rate limit tracking
staticData.apiCalls = (staticData.apiCalls || 0) + 1;
staticData.windowStart = staticData.windowStart || Date.now();
if (Date.now() - staticData.windowStart > 3600000) {
  staticData.apiCalls = 1;
  staticData.windowStart = Date.now();
}

// Pattern: Deduplication cache
const cache = staticData.processedIds || {};
const newItems = items.filter(item => {
  if (cache[item.json.id]) return false;
  cache[item.json.id] = Date.now();
  return true;
});
// Prune cache entries older than 24h
for (const [id, ts] of Object.entries(cache)) {
  if (Date.now() - ts > 86400000) delete cache[id];
}
staticData.processedIds = cache;

8.2 External State (When Static Data Isn't Enough)

state_management:
  static_data:
    capacity: "~1MB per workflow"
    persistence: "Survives restarts; saved only by production (active-workflow) executions, not manual test runs"
    use_for: "Counters, last-processed IDs, small caches"
    dont_use_for: "Large datasets, shared state between workflows"
    
  database:
    use_for: "Shared state, large datasets, audit trails"
    options: ["Postgres", "SQLite", "Redis"]
    pattern: "Read state → Process → Write state (in same execution)"
    
  google_sheets:
    use_for: "Human-readable state, manual override capability"
    pattern: "Config sheet = feature flags, processing rules"
    
  redis:
    use_for: "High-speed counters, distributed locks, pub/sub"
    pattern: "Rate limiting, dedup across multiple workflows"

Phase 9: Security & Credentials

9.1 Credential Management Rules

credential_rules:
  DO:
    - "Use n8n Credential Store for ALL secrets"
    - "Use environment variables for config (URLs, feature flags)"
    - "Rotate API keys on schedule (quarterly minimum)"
    - "Use OAuth2 over API keys when available"
    - "Limit credential scope (least privilege)"
    - "Audit credential usage quarterly"
    
  NEVER:
    - "Hardcode secrets in Code nodes"
    - "Put API keys in webhook URLs"
    - "Log full request/response bodies (may contain secrets)"
    - "Share credentials between dev/staging/prod"
    - "Use personal API keys for production workflows"

9.2 Webhook Security Implementation

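// HMAC signature verification (GitHub-style X-Hub-Signature-256 header;
// Stripe and Shopify use different headers and signing schemes)
// Self-hosted n8n must allow the module: NODE_FUNCTION_ALLOW_BUILTIN=crypto
const crypto = require('crypto');

const webhook = $input.first().json; // Webhook node output: headers, body, ...
const signature = webhook.headers['x-hub-signature-256'] || '';
const secret = $env.WEBHOOK_SECRET;
// Caution: the HMAC must cover the exact raw bytes the sender signed.
// Re-serializing parsed JSON matches only if the formatting is identical;
// prefer the Webhook node's Raw Body option where possible.
const body = JSON.stringify(webhook.body);

const expected = 'sha256=' + crypto
  .createHmac('sha256', secret)
  .update(body)
  .digest('hex');

// Constant-time comparison avoids leaking how many characters matched
const received = Buffer.from(signature);
const computed = Buffer.from(expected);
if (received.length !== computed.length || !crypto.timingSafeEqual(received, computed)) {
  // Return 401 via Respond to Webhook node
  return [{ json: { error: 'Invalid signature', _reject: true } }];
}

return items;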

9.3 Data Privacy Checklist

privacy_checklist:
  pii_handling:
    - "Identify PII fields in every workflow (email, name, phone, IP)"
    - "Minimize PII: only pass fields actually needed"
    - "Mask PII in logs (email → j***@example.com)"
    - "Set execution data pruning (don't keep PII forever)"
    
  execution_data:
    - "Save execution data: Only on error (production)"
    - "Save execution data: Always (development only)"
    - "Prune executions older than 30 days"
    - "Don't store full response bodies from external APIs"
    
  compliance:
    - "GDPR: Can you delete a user's data from all workflow states?"
    - "Audit trail: Can you prove what data was processed and when?"
    - "Data residency: Are API calls going to correct region?"
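
The masking rule in the checklist (email → j***@example.com) can be sketched as a small helper applied before anything is written to a log. `maskEmail` and `maskForLog` are hypothetical names, and the default list of email fields is an assumption.

```javascript
// Masks the local part of an email, keeping the first character and the domain.
function maskEmail(email) {
  if (typeof email !== 'string') return email;
  const at = email.lastIndexOf('@');
  if (at < 1) return '***'; // not a recognizable address: hide it entirely
  return `${email[0]}***${email.slice(at)}`;
}

// Returns a shallow copy of `data` with the listed email fields masked.
function maskForLog(data, emailFields = ['email']) {
  const copy = { ...data };
  for (const field of emailFields) {
    if (field in copy) copy[field] = maskEmail(copy[field]);
  }
  return copy;
}
```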

Phase 10: Performance & Optimization

10.1 Performance Optimization Priority Stack

Priority	Technique	Impact
1	Batch API calls (bulk endpoints)	10-100x fewer API calls
2	Parallel execution (split + merge)	2-5x faster processing
3	Filter early (drop items before heavy processing)	Reduces compute
4	Cache repeated lookups (static data)	Fewer API calls
5	Minimize data passed between nodes	Reduces memory
6	Use sub-workflows for heavy sections	Better resource management
7	Schedule during off-peak hours	Reduces contention
8	Optimize Code node algorithms	Reduces CPU time

10.2 Batch Processing Template

batch_template:
  step_1: "Collect all items (trigger / query)"
  step_2: "Split In Batches (size based on API limit)"
  step_3: "Process batch (use bulk/batch API endpoint)"
  step_4: "Wait node (respect rate limit between batches)"
  step_5: "Aggregate results"
  step_6: "Report summary"
  
  sizing_guide:
    stripe_api: 100  # Stripe list limit
    hubspot_api: 100  # HubSpot batch limit
    postgres_insert: 1000  # Comfortable batch insert
    email_send: 50  # Avoid spam filters
    slack_api: 20  # Rate limit friendly
    openai_api: 1  # Usually per-request
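
When batching happens inside a Code node instead of the Split In Batches node, the same sizing guide applies. The sketch below shows the two calculations involved; `chunk` and `waitBetweenBatchesMs` are hypothetical helpers, and the wait calculation assumes one API request per batch.

```javascript
// Splits `items` into arrays of at most `size` elements.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Minimum wait between batches so `requestsPerMinute` is not exceeded,
// assuming one request per batch.
function waitBetweenBatchesMs(requestsPerMinute) {
  return Math.ceil(60000 / requestsPerMinute);
}
```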

10.3 Memory Optimization

// Anti-pattern: Passing full objects through entire workflow
// ❌ BAD
return items; // Each item has 50 fields, only need 3

// ✅ GOOD: Extract only needed fields early
return items.map(item => ({
  json: {
    id: item.json.id,
    email: item.json.email,
    status: item.json.status,
  }
}));

// Anti-pattern: Accumulating in memory
// ❌ BAD: Loading 100K records into Code node
// ✅ GOOD: Use database queries with LIMIT/OFFSET, process in batches

Phase 11: Testing & Debugging

11.1 Testing Methodology

testing_levels:
  unit_test:
    what: "Individual nodes with sample data"
    how: "Pin test data on trigger node, execute single node"
    when: "Building each node"
    
  integration_test:
    what: "Full workflow with test data"
    how: "Manual trigger with test payload, verify all outputs"
    when: "Before activating"
    
  smoke_test:
    what: "Quick check that workflow still works"
    how: "Trigger with minimal valid payload, check success"
    when: "After any change, weekly health check"
    
  load_test:
    what: "Performance under volume"
    how: "Send 100+ items through, measure time and errors"
    when: "Before scaling to production volume"

11.2 Debugging Checklist

debugging_steps:
  1_reproduce:
    - "Find the failed execution in execution list"
    - "Check which node failed (red highlight)"
    - "Read the error message carefully"
    
  2_inspect:
    - "Check input data to failed node (is it what you expected?)"
    - "Check node configuration (expressions resolving correctly?)"
    - "Check credentials (still valid? permissions?)"
    
  3_common_fixes:
    expression_error: "Wrap in try/catch or use ?? for null safety"
    timeout: "Increase timeout, check if API is actually up"
    auth_error: "Re-authenticate credential, check token expiry"
    rate_limit: "Add Wait node, reduce batch size"
    json_parse: "Check response is actually JSON (not HTML error page)"
    missing_field: "Data shape changed — update field mapping"
    
  4_isolate:
    - "Pin input data on the failing node"
    - "Execute just that node"
    - "If it works in isolation, problem is upstream data"

11.3 Monitoring Dashboard

monitoring:
  metrics_to_track:
    - name: "Execution success rate"
      target: ">99%"
      alert_threshold: "<95%"
      
    - name: "Average execution time"
      target: "Under SLA"
      alert_threshold: ">2x normal"
      
    - name: "Items processed per run"
      target: "Expected range"
      alert_threshold: "0 items (nothing processed) or >10x normal"
      
    - name: "Error frequency by type"
      target: "Decreasing trend"
      alert_threshold: "Same error >3 times in 24h"
      
    - name: "API quota usage"
      target: "<80% of limit"
      alert_threshold: ">90% of limit"
      
  health_check_workflow:
    schedule: "Every 30 minutes"
    checks:
      - "Can reach external APIs? (HEAD request)"
      - "Database connection alive?"
      - "Disk space for execution data?"
      - "Any workflows stuck in 'running' >1 hour?"
    alert_channel: "Slack #n8n-alerts"
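
Two of the thresholds above (success rate below 95%, same error more than 3 times) are simple enough to evaluate in a Code node inside the health check workflow. The following is a rough sketch: both function names are hypothetical, the `{ status }` shape of each execution is an assumption, and treating an empty window as an alert is a design choice of this example.

```javascript
// Evaluates one monitoring window against the success-rate alert threshold.
// `executions` is assumed to be an array of { status: 'success' | 'error' }.
function evaluateSuccessRate(executions, alertBelow = 0.95) {
  if (executions.length === 0) {
    return { rate: null, alert: true, reason: 'No executions in window' };
  }
  const succeeded = executions.filter(e => e.status === 'success').length;
  const rate = succeeded / executions.length;
  const alert = rate < alertBelow;
  return { rate, alert, reason: alert ? 'Success rate below threshold' : null };
}

// Returns the error messages that occurred more than `limit` times.
function repeatedErrors(errorMessages, limit = 3) {
  const counts = {};
  for (const message of errorMessages) {
    counts[message] = (counts[message] || 0) + 1;
  }
  return Object.keys(counts).filter(message => counts[message] > limit);
}
```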

Phase 12: Production Deployment & Maintenance

12.1 Deployment Checklist

pre_activation:
  workflow:
    - [ ] "Workflow description filled in (purpose, owner, version)"
    - [ ] "All nodes named descriptively (not 'HTTP Request 1')"
    - [ ] "Sticky notes explain complex sections"
    - [ ] "Error trigger workflow connected"
    - [ ] "Test data pins removed"
    - [ ] "No hardcoded secrets or URLs"
    - [ ] "Environment variables used for config"
    
  testing:
    - [ ] "Happy path tested with real-shape data"
    - [ ] "Error paths tested (bad data, API failure, timeout)"
    - [ ] "Edge cases tested (empty array, null fields, special chars)"
    - [ ] "Load tested at expected volume"
    
  operations:
    - [ ] "Execution data retention configured"
    - [ ] "Alert channel receiving error notifications"
    - [ ] "Runbook written for common failure scenarios"
    - [ ] "Owner documented (who to page at 3 AM)"

12.2 Workflow Versioning Strategy

versioning:
  format: "vMAJOR.MINOR (in workflow name + description)"
  
  major_bump: "Breaking changes — new trigger, changed output format"
  minor_bump: "Improvements — new fields, better error handling"
  
  changelog_location: "Workflow description field"
  changelog_format: |
    ## v2.1 (2024-03-15)
    - Added retry logic for Stripe API calls
    - Fixed timezone conversion for EU customers
    
    ## v2.0 (2024-02-01)
    - Migrated from REST to GraphQL API
    - Breaking: output format changed
    
  backup_strategy:
    - "Export workflow JSON before major changes"
    - "Store in git repo: workflows/[category]/[name].json"
    - "Tag with version: git tag workflow-name-v2.1"

12.3 Maintenance Schedule

maintenance:
  daily:
    - "Check error notifications channel"
    - "Review failed executions (>0 = investigate)"
    
  weekly:
    - "Review execution volume trends"
    - "Check API quota usage"
    - "Process dead letter queue items"
    
  monthly:
    - "Review and prune old executions"
    - "Audit credential usage"
    - "Update workflow documentation"
    - "Review performance (any slow workflows?)"
    
  quarterly:
    - "Rotate API keys and tokens"
    - "Review all active workflows — still needed?"
    - "Update n8n version (test in staging first)"
    - "Archive unused workflows"

Phase 13: Complete Workflow Templates

13.1 Template: Lead Capture → CRM → Notification

name: "[INGEST] Web Lead → HubSpot + Slack Alert (v1.0)"
trigger: Webhook (form submission)
nodes:
  1_webhook:
    type: Webhook
    path: "/lead-capture"
    method: POST
    response: "Respond to Webhook (immediate 200)"
    
  2_validate:
    type: IF
    condition: "email exists AND email contains @"
    false_path: "→ Log invalid submission → End"
    
  3_enrich:
    type: HTTP Request
    url: "Clearbit/Apollo enrichment API"
    fallback: "Continue without enrichment"
    
  4_dedupe:
    type: Code
    logic: "Check HubSpot for existing contact by email"
    
  5_create_or_update:
    type: HubSpot
    action: "Create/update contact"
    fields: [email, name, company, source, enrichment_data]
    
  6_notify:
    type: Slack
    channel: "#sales-leads"
    message: "🎯 New lead: {name} from {company} — {source}"
    
  7_auto_reply:
    type: Email (SMTP)
    to: "{{ $json.email }}"
    template: "Thanks for your interest, we'll be in touch within 24h"

13.2 Template: Scheduled Report Generator

name: "[EXPORT] Weekly Sales Report — Email (v1.0)"
trigger: Schedule (Monday 8 AM)
nodes:
  1_schedule:
    type: Schedule Trigger
    cron: "0 8 * * 1"
    
  2_query_data:
    type: Postgres
    query: |
      SELECT 
        date_trunc('day', created_at) as day,
        COUNT(*) as deals,
        SUM(amount) as revenue,
        AVG(amount) as avg_deal
      FROM deals 
      WHERE created_at >= NOW() - INTERVAL '7 days'
      GROUP BY 1 ORDER BY 1
      
  3_calculate_summary:
    type: Code
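    logic: "Calculate totals, WoW change, top deals; WoW and top deals need data the 7-day aggregate above does not return (prior week's rows, per-deal rows), so widen the query or add a second one"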
    
  4_format_report:
    type: Code
    logic: "Generate HTML email body with tables and charts links"
    
  5_send_email:
    type: Email (SMTP)
    to: "sales-team@company.com"
    subject: "📊 Weekly Sales Report: W{{ $now.weekNumber }}"
    html: "{{ $json.reportHtml }}"
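
The `3_calculate_summary` step could look roughly like the following. This is a standalone Python sketch under assumptions: rows carry the `deals` and `revenue` columns from the query above, and the previous week's rows are passed in as a separate list (how they are fetched is left open). The function name `summarize` is hypothetical.

```python
from typing import Optional


def summarize(current: list[dict], previous: list[dict]) -> dict:
    """Aggregate daily rows into weekly totals and a week-over-week change."""
    total_revenue = sum(row["revenue"] for row in current)
    total_deals = sum(row["deals"] for row in current)
    prev_revenue = sum(row["revenue"] for row in previous)

    # WoW is undefined when there is no prior-week revenue to compare against.
    wow_change_pct: Optional[float] = None
    if prev_revenue:
        wow_change_pct = round((total_revenue - prev_revenue) / prev_revenue * 100, 1)

    return {
        "total_revenue": total_revenue,
        "total_deals": total_deals,
        # Overall average per deal, not the mean of the daily averages.
        "avg_deal": round(total_revenue / total_deals, 2) if total_deals else 0.0,
        "wow_change_pct": wow_change_pct,
    }
```

Returning `None` for the change when the prior week is empty lets the report template print "n/a" instead of dividing by zero.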

13.3 Template: AI Support Ticket Classifier

name: "[AI] Support Ticket — Classify + Route (v1.0)"
trigger: Webhook (helpdesk new ticket)
nodes:
  1_webhook:
    type: Webhook
    
  2_classify:
    type: OpenAI Chat
    model: "gpt-4o-mini"
    system: |
      Classify this support ticket. Return JSON only:
      {
        "category": "bug|feature_request|billing|how_to|account|other",
        "priority": "P0|P1|P2|P3",
        "sentiment": "angry|frustrated|neutral|positive",
        "confidence": "number from 0 to 1 for the chosen category",
        "summary": "one sentence summary",
        "suggested_response": "draft response"
      }
    temperature: 0
    
  3_parse:
    type: Code
    logic: "JSON.parse response, validate required fields"
    
  4_route:
    type: Switch
    on: "{{ $json.category }}"
    cases:
      bug: "→ Assign to engineering team"
      billing: "→ Assign to finance team"
      feature_request: "→ Add to product backlog"
      default: "→ Assign to general support"
      
  5_priority_alert:
    type: IF
    condition: "priority == P0"
    true_path: "→ Slack alert to on-call"
    
  6_update_ticket:
    type: HTTP Request
    action: "Update ticket with classification tags"
    
  7_auto_respond:
    type: IF
    condition: "category == how_to AND confidence > 0.9"
    true_path: "→ Send suggested_response as reply"
    false_path: "→ Save draft for human review"
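
The `3_parse` and `4_route` steps can be sketched as follows. This is a standalone Python sketch rather than n8n Code node source; the function names and the team labels returned by `route` are hypothetical, and a `confidence` value is validated only if the model reply contains one.

```python
import json

CATEGORIES = {"bug", "feature_request", "billing", "how_to", "account", "other"}
PRIORITIES = {"P0", "P1", "P2", "P3"}
SENTIMENTS = {"angry", "frustrated", "neutral", "positive"}
REQUIRED = ("category", "priority", "sentiment", "summary", "suggested_response")


def parse_classification(raw: str) -> dict:
    """Parse the model reply; raise ValueError if it does not match the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    allowed = {"category": CATEGORIES, "priority": PRIORITIES, "sentiment": SENTIMENTS}
    for field, values in allowed.items():
        if not isinstance(data[field], str) or data[field] not in values:
            raise ValueError(f"unexpected {field}: {data[field]!r}")

    # confidence is optional here; when present it must be a number in [0, 1].
    confidence = data.get("confidence")
    if confidence is not None:
        if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
            raise ValueError(f"confidence out of range: {confidence!r}")
    return data


def route(category: str) -> str:
    """Map a category to a (hypothetical) team label, with a default branch."""
    teams = {"bug": "engineering", "billing": "finance", "feature_request": "product_backlog"}
    return teams.get(category, "general_support")
```

Failing loudly on a malformed reply sends the ticket down the workflow's error path instead of routing it on a guessed category.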

13.4 Template: Multi-System Data Sync

name: "[SYNC] Stripe → Postgres → HubSpot — Payments (v1.0)"
trigger: Webhook (Stripe payment_intent.succeeded)
nodes:
  1_webhook:
    type: Webhook
    security: "HMAC signature verification"
    
  2_verify_signature:
    type: Code
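    logic: "Verify the Stripe-Signature header: HMAC-SHA256 of 'timestamp.raw_body' with the endpoint secret; reject on mismatch or stale timestamp"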
    
  3_extract_payment:
    type: Code
    logic: "Extract customer, amount, metadata from Stripe event"
    
  4_upsert_db:
    type: Postgres
    action: "INSERT ON CONFLICT UPDATE"
    table: "payments"
    
  5_update_crm:
    type: HubSpot
    action: "Update deal stage to 'Closed Won'"
    
  6_notify_team:
    type: Slack
    message: "💰 Payment received: ${{ $json.amount }} from {{ $json.customer }}"
    
  7_send_receipt:
    type: Email (SMTP)
    to: "{{ $json.customer_email }}"
    template: "Payment confirmation"
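
The `2_verify_signature` step can be sketched as below. This is a standalone Python sketch of Stripe's documented scheme (a `Stripe-Signature` header with `t=` and `v1=` parts, and HMAC-SHA256 over `timestamp.raw_body`). The function name, the 300-second tolerance default, and the `now` parameter (there so the check can be tested) are choices made here, and the raw, unparsed request body is assumed to be available.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(
    raw_body: bytes,
    header: str,
    secret: str,
    tolerance: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Return True only if the header carries a fresh, matching v1 signature."""
    parts = [item.split("=", 1) for item in header.split(",") if "=" in item]
    timestamp = next((value for key, value in parts if key.strip() == "t"), None)
    candidates = [value.strip() for key, value in parts if key.strip() == "v1"]
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # Reject stale events to limit replay of captured requests.
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance:
        return False

    signed_payload = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison against every v1 signature in the header.
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)
```

The signature covers the exact bytes Stripe sent, so verifying against a re-serialized JSON body will usually fail; the check has to run on the raw payload.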

Phase 14: Advanced Patterns

14.1 Fan-Out / Fan-In (Parallel Processing)

pattern: "Split work across parallel paths, merge results"
use_case: "Enrich contacts from 3 APIs simultaneously"
implementation:
  1: "Trigger with batch of contacts"
  2: "Split into 3 parallel HTTP Request nodes"
  3: "Each calls different API (Clearbit, Apollo, LinkedIn)"
  4: "Merge node (Combine mode) joins results"
  5: "Code node merges enrichment data per contact"
  
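benefit: "Up to 3x faster than sequential API calls, but only when the branches truly run concurrently"
caveat: "All 3 branches must handle their own errors; branches inside a single n8n execution run one after another, so use sub-workflows or separate webhook-triggered workflows for real parallelism"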
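
The fan-out / fan-in shape, including the "each branch handles its own errors" caveat, can be illustrated outside n8n. This is a standalone Python sketch using a thread pool; the enricher callables stand in for the three API calls and are hypothetical, as are the function names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Enricher = Callable[[str], dict]


def safe_call(name: str, fn: Enricher, email: str) -> dict:
    """Run one branch; convert its failure into data instead of raising."""
    try:
        return {name: fn(email)}
    except Exception as exc:  # each branch owns its error handling
        return {name: {"error": str(exc)}}


def enrich_contact(email: str, enrichers: dict[str, Enricher]) -> dict:
    """Fan out to every enricher concurrently, then merge results per contact."""
    merged: dict = {"email": email}
    with ThreadPoolExecutor(max_workers=max(1, len(enrichers))) as pool:
        futures = [pool.submit(safe_call, name, fn, email) for name, fn in enrichers.items()]
        for future in futures:  # fan-in: wait for every branch
            merged.update(future.result())
    return merged
```

Because a failing branch returns an `error` entry rather than raising, one slow or broken API does not discard the data the other two returned.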

14.2 Event-Driven Architecture

pattern: "Workflows trigger other workflows via internal webhooks"
implementation:
  producer: |
    [PROCESS] Order Created
    → Process order
    → HTTP Request to internal webhook: /event/order-created
    
  consumers:
    - "[NOTIFY] Order Confirmation → Email"
    - "[SYNC] Order → Inventory Update"  
    - "[SYNC] Order → Accounting System"
    - "[AI] Order → Fraud Detection"
    
benefit: "Loose coupling — add new consumers without changing producer"
caveat: "Need to handle consumer failures independently"
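
The producer/consumer decoupling and the independent-failure caveat can be sketched in a few lines. This is a standalone Python sketch in which an in-process `EventBus` class (a hypothetical name) stands in for the internal webhook calls; in n8n each consumer would be its own workflow.

```python
from typing import Callable

Consumer = Callable[[dict], None]


class EventBus:
    """Minimal publish/subscribe registry keyed by event name."""

    def __init__(self) -> None:
        self._consumers: dict[str, list[tuple[str, Consumer]]] = {}

    def subscribe(self, event: str, name: str, consumer: Consumer) -> None:
        """Register a consumer; the producer never needs to know about it."""
        self._consumers.setdefault(event, []).append((name, consumer))

    def publish(self, event: str, payload: dict) -> dict[str, str]:
        """Deliver to every consumer; one failure must not block the others."""
        results: dict[str, str] = {}
        for name, consumer in self._consumers.get(event, []):
            try:
                consumer(payload)
                results[name] = "ok"
            except Exception as exc:
                results[name] = f"failed: {exc}"
        return results
```

The per-consumer result map is what a producer could log or alert on, so a failed inventory sync is visible without stopping the confirmation email.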

14.3 Feature Flag Pattern

pattern: "Control workflow behavior without editing"
implementation:
  config_source: "Google Sheet or database table"
  columns: [feature_name, enabled, percentage, notes]
  
  in_workflow:
    1: "Read config at start of workflow"
    2: "IF node checks feature flag"
    3: "true → new behavior, false → old behavior"
    
  examples:
    - feature: "use_gpt4o_mini"
      check: "Route to cheaper model when enabled"
    - feature: "skip_enrichment"
      check: "Bypass API calls during outage"
    - feature: "double_check_mode"
      check: "Add human approval step"
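
The flag check in step 2 might be implemented along these lines. This is a standalone Python sketch under an assumption: the `percentage` column is read as a rollout percentage, which the config above does not spell out. The function names and the use of a stable hash to bucket items are choices made here for illustration.

```python
import hashlib


def bucket(feature: str, key: str) -> int:
    """Map (feature, key) to a stable bucket in 0..99."""
    digest = hashlib.sha256(f"{feature}:{key}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flags: dict[str, dict], feature: str, key: str) -> bool:
    """True if the flag is on and this key falls inside the rollout percentage."""
    flag = flags.get(feature)
    if not flag or not flag.get("enabled"):
        return False
    # Assumption: a missing percentage means the flag applies to everything.
    return bucket(feature, key) < flag.get("percentage", 100)
```

Hashing the item key (for example a customer id) keeps the decision stable across runs, so the same customer does not flip between old and new behavior.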

14.4 Queue Pattern (High Volume)

pattern: "Buffer incoming items, process at controlled rate"
use_case: "1000 webhook events/minute, API limit 10/minute"
implementation:
  ingestion_workflow:
    1: "Webhook receives event"
    2: "Write to queue (database table: status=pending)"
    3: "Return 200 immediately"
    
  processing_workflow:
    1: "Schedule trigger (every minute)"
    2: "Query: SELECT * FROM queue WHERE status='pending' ORDER BY id LIMIT 10"
    3: "Process batch"
    4: "UPDATE queue SET status='completed' for each processed row"
    5: "On error: UPDATE queue SET status='failed', retry_count = retry_count + 1"
    
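benefit: "Events are stored before processing, so bursts are absorbed and nothing is dropped; the backlog only drains if average inflow stays below the processing rate"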

Phase 15: n8n Instance Management

15.1 Environment Strategy

environments:
  development:
    purpose: "Building and testing new workflows"
    data: "Test/mock data only"
    execution_saving: "All executions"
    
  staging:
    purpose: "Pre-production validation"
    data: "Anonymized production-like data"
    execution_saving: "All executions"
    
  production:
    purpose: "Live workflows"
    data: "Real data"
    execution_saving: "Errors only (save disk)"
    
  promotion_process:
    1: "Build in dev"
    2: "Export workflow JSON"
    3: "Import to staging, test with realistic data"
    4: "Export again (staging may have fixes)"
    5: "Import to production"
    6: "Activate and monitor first 24h"

15.2 n8n Performance Tuning

tuning:
  execution_mode: "queue"  # Set via EXECUTIONS_MODE=queue; for high volume (requires Redis)
  
  environment_variables:
    EXECUTIONS_DATA_SAVE_ON_ERROR: "all"
    EXECUTIONS_DATA_SAVE_ON_SUCCESS: "none"  # Save disk in production
    EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS: "true"
    EXECUTIONS_DATA_MAX_AGE: 720  # Hours (30 days)
    EXECUTIONS_DATA_PRUNE: "true"
    GENERIC_TIMEZONE: "UTC"  # Always UTC internally
    N8N_CONCURRENCY_PRODUCTION_LIMIT: 20  # Parallel executions
    
  scaling:
    vertical: "More CPU/RAM for the n8n instance"
    horizontal: "Queue mode + multiple workers"
    webhook_scaling: "Separate webhook processor from main"

Scoring Rubric: Workflow Quality Assessment

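Rate any n8n workflow 0-100 across 8 dimensions. Score each dimension 0-10 using the anchors below, multiply by its weight, and sum; the weights total 100%, so the weighted average times 10 gives the 0-100 score: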

Dimension	Weight	0 (Poor)	5 (Adequate)	10 (Excellent)
Reliability	20%	No error handling	Basic error trigger	Full retry + DLQ + alerts
Security	15%	Hardcoded secrets	Credential store	HMAC + validation + audit
Performance	15%	Sequential, no batching	Some batching	Optimized + cached + parallel
Maintainability	15%	No names, no docs	Named nodes	Full docs + versioned + sticky notes
Data Quality	10%	No validation	Basic checks	Schema validation + dedup + transform
Observability	10%	No monitoring	Error alerts	Metrics + logging + health checks
Scalability	10%	Breaks at 100 items	Handles 1K	Batched + queued + horizontal
Reusability	5%	Monolithic	Some sub-workflows	Modular + documented interfaces

Score:

0-30: Prototype — not production ready
31-60: Functional — works but fragile
61-80: Production — solid with room to improve
81-100: Enterprise — resilient, observable, scalable
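
The rubric arithmetic can be checked with a short calculation. This is a standalone Python sketch that assumes the overall score is the weighted sum of 0-10 ratings scaled to 0-100; the dictionary keys and function names are hypothetical.

```python
WEIGHTS = {
    "reliability": 20,
    "security": 15,
    "performance": 15,
    "maintainability": 15,
    "data_quality": 10,
    "observability": 10,
    "scalability": 10,
    "reusability": 5,
}


def workflow_score(ratings: dict[str, float]) -> float:
    """Combine 0-10 ratings into a 0-100 score using the rubric weights."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= ratings[name] <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    # Weights sum to 100, so dividing by 10 maps the result onto 0-100.
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS) / 10


def band(score: float) -> str:
    """Translate a 0-100 score into the rubric's maturity band."""
    if score <= 30:
        return "Prototype"
    if score <= 60:
        return "Functional"
    if score <= 80:
        return "Production"
    return "Enterprise"
```

For example, a workflow rated 5 on every dimension scores 50 and lands in the Functional band.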

10 Commandments of n8n Workflow Engineering

1. Every production workflow has an error handler, no exceptions
2. Never hardcode secrets: credential store or env vars only
3. Name every node: "HTTP Request 4" is tech debt
4. Filter early, transform late: drop bad data before heavy processing
5. Batch everything: one API call for 100 items beats 100 calls for 1
6. Test with real-shaped data: mock data hides real bugs
7. Version your workflows, in the name and description
8. Document the "why": sticky notes explain decisions, not obvious steps
9. Monitor actively: don't discover failures from angry users
10. Keep it simple: if you need a diagram to explain it, decompose it

Natural Language Commands

When a user asks you to help with n8n, interpret these commands:

Command	Action
"Build a workflow for [task]"	Design complete workflow using templates above
"Review this workflow"	Score against rubric, suggest improvements
"Debug [workflow/error]"	Follow debugging checklist
"Optimize [workflow]"	Apply performance optimization stack
"Add error handling to [workflow]"	Implement error trigger + retry + alert pattern
"Create a sub-workflow for [logic]"	Extract with clear interface
"Set up monitoring"	Implement health check + alert workflow
"Migrate workflow to production"	Follow deployment checklist
"Design integration for [A] → [B]"	Select pattern from integration library
"Add AI to [workflow]"	Implement AI pipeline pattern
"Handle rate limits for [API]"	Implement batching + wait + circuit breaker
"Audit my n8n setup"	Run quick health check, score, prioritize fixes