using-graph-databases

// Graph database implementation for relationship-heavy data models. Use when building social networks, recommendation engines, knowledge graphs, or fraud detection. Covers Neo4j (primary), ArangoDB, Amazon Neptune, Cypher query patterns, and graph data modeling.

$ git log --oneline --stat

stars:302

forks:57

updated:December 11, 2025

SKILL.mdreadonly

SKILL.md Frontmatter

nameusing-graph-databases

descriptionGraph database implementation for relationship-heavy data models. Use when building social networks, recommendation engines, knowledge graphs, or fraud detection. Covers Neo4j (primary), ArangoDB, Amazon Neptune, Cypher query patterns, and graph data modeling.

Graph Databases

Purpose

This skill guides selection and implementation of graph databases for applications where relationships between entities are first-class citizens. Unlike relational databases that model relationships through foreign keys and joins, graph databases natively represent connections as properties, enabling efficient traversal-heavy queries.

When to Use This Skill

Use graph databases when:

Deep relationship traversals (4+ hops): "Friends of friends of friends"
Variable/evolving relationships: Schema changes don't break existing queries
Path finding: Shortest route, network analysis, dependency chains
Pattern matching: Fraud detection, recommendation engines, access control

Do NOT use graph databases when:

Fixed schema with shallow joins (2-3 tables) → Use PostgreSQL
Primarily aggregations/analytics → Use columnar databases
Key-value lookups only → Use Redis/DynamoDB

Quick Decision Framework

DATA CHARACTERISTICS?
├── Fixed schema, shallow joins (≤3 hops)
│   └─ PostgreSQL (relational)
│
├── Already on PostgreSQL + simple graphs
│   └─ Apache AGE (PostgreSQL extension)
│
├── Deep traversals (4+ hops) + general purpose
│   └─ Neo4j (battle-tested, largest ecosystem)
│
├── Multi-model (documents + graph)
│   └─ ArangoDB
│
├── AWS-native, serverless
│   └─ Amazon Neptune
│
└── Real-time streaming, in-memory
    └─ Memgraph

Core Concepts

Property Graph Model

Graph databases store data as:

Nodes (vertices): Entities with labels and properties
Relationships (edges): Typed connections with properties
Properties: Key-value pairs on nodes and relationships

(Person {name: "Alice", age: 28})-[:FRIEND {since: "2020-01-15"}]->(Person {name: "Bob"})

Query Languages

Language	Databases	Readability	Best For
Cypher	Neo4j, Memgraph, AGE	⭐⭐⭐⭐⭐ SQL-like	General purpose
Gremlin	Neptune, JanusGraph	⭐⭐⭐ Functional	Cross-database
AQL	ArangoDB	⭐⭐⭐⭐ SQL-like	Multi-model
SPARQL	Neptune, RDF stores	⭐⭐⭐ W3C standard	Semantic web

Common Cypher Patterns

Reference references/cypher-patterns.md for comprehensive examples.

Pattern 1: Basic Matching

// Find all users at a company
MATCH (u:User)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'})
RETURN u.name, u.title

Pattern 2: Variable-Length Paths

// Find friends up to 3 degrees away
MATCH (u:User {name: 'Alice'})-[:FRIEND*1..3]->(friend)
WHERE u <> friend
RETURN DISTINCT friend.name
LIMIT 100

Pattern 3: Shortest Path

// Find shortest connection between two users
MATCH path = shortestPath(
  (a:User {name: 'Alice'})-[*]-(b:User {name: 'Bob'})
)
RETURN path, length(path) AS distance

Pattern 4: Recommendations

// Collaborative filtering: Products liked by similar users
MATCH (u:User {id: $userId})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(similar)
MATCH (similar)-[:PURCHASED]->(rec:Product)
WHERE NOT exists((u)-[:PURCHASED]->(rec))
RETURN rec.name, count(*) AS score
ORDER BY score DESC
LIMIT 10

Pattern 5: Fraud Detection

// Detect circular money flows
MATCH path = (a:Account)-[:SENT*3..6]->(a)
WHERE all(r IN relationships(path) WHERE r.amount > 1000)
RETURN path, [r IN relationships(path) | r.amount] AS amounts

Database Selection Guide

Neo4j (Primary Recommendation)

Use for: General-purpose graph applications

Strengths:

Most mature (2007), largest community (2M+ developers)
65+ graph algorithms (GDS library): PageRank, Louvain, Dijkstra
Best tooling: Neo4j Browser, Bloom visualization
Comprehensive Cypher support

Installation:

# Python driver
pip install neo4j

# TypeScript driver
npm install neo4j-driver

# Rust driver
cargo add neo4rs

Reference: references/neo4j.md

ArangoDB

Use for: Multi-model applications (documents + graph)

Strengths:

Store documents AND graph in one database
AQL combines document and graph queries
Schema flexibility with relationships

Reference: references/arangodb.md

Apache AGE

Use for: Adding graph capabilities to existing PostgreSQL

Strengths:

Extend PostgreSQL with graph queries
No new infrastructure needed
Query both relational and graph data

Reference: Implementation details in examples/

Amazon Neptune

Use for: AWS-native, serverless deployments

Strengths:

Fully managed, auto-scaling
Supports Gremlin AND SPARQL
AWS ecosystem integration

Graph Data Modeling Patterns

Reference references/graph-modeling.md for comprehensive patterns.

Best Practice 1: Relationships as First-Class Citizens

Anti-pattern (storing relationships in node properties):

// BAD
(:Person {name: 'Alice', friend_ids: ['b123', 'c456']})

Pattern (explicit relationships):

// GOOD
(:Person {name: 'Alice'})-[:FRIEND]->(:Person {id: 'b123'})
(:Person {name: 'Alice'})-[:FRIEND]->(:Person {id: 'c456'})

Best Practice 2: Relationship Properties for Metadata

// Track interaction details on relationships
(:Person)-[:FRIEND {
  since: '2020-01-15',
  strength: 0.85,
  last_interaction: datetime()
}]->(:Person)

Best Practice 3: Bounded Traversals for Performance

// SLOW: Unbounded traversal
MATCH (a)-[:FRIEND*]->(distant)
RETURN distant

// FAST: Bounded depth with index
MATCH (a)-[:FRIEND*1..4]->(distant)
WHERE distant.active = true
RETURN distant
LIMIT 100

Best Practice 4: Avoid Supernodes

Problem: Nodes with thousands of relationships slow traversals.

Solution: Intermediate aggregation nodes

// Instead of: (:User)-[:POSTED]->(:Post) [1M relationships]

// Use time partitioning:
(:User)-[:POSTED_IN]->(:Year {year: 2025})
       -[:HAS_MONTH]->(:Month {month: 12})
       -[:HAS_POST]->(:Post)

Use Case Examples

Social Network

Schema and implementation in examples/social-graph/

Key features:

Friend recommendations (friends-of-friends)
Mutual connections
News feed generation
Influence metrics

Knowledge Graph for AI/RAG

Integration example in examples/knowledge-graph/

Key features:

Hybrid vector + graph search
Entity relationship mapping
Context expansion for LLM prompts
Semantic relationship traversal

Integration with Vector Databases:

# Step 1: Vector search in Qdrant/pgvector
vector_results = qdrant.search(collection="concepts", query_vector=embedding)

# Step 2: Expand with graph relationships
concept_ids = [r.id for r in vector_results]
graph_context = neo4j.run("""
  MATCH (c:Concept) WHERE c.id IN $ids
  MATCH (c)-[:RELATED_TO|IS_A*1..2]-(related)
  RETURN c, related, relationships(path)
""", ids=concept_ids)

Recommendation Engine

Examples in examples/social-graph/

Strategies:

Collaborative filtering: "Users who bought X also bought Y"
Content-based: "Products similar to what you like"
Session-based: "Recently viewed items"

Fraud Detection

Pattern detection in examples/

Detection patterns:

Circular money flows
Shared devices across accounts
Rapid transaction chains
Connection pattern anomalies

Performance Optimization

Reference references/cypher-patterns.md for detailed optimization.

Indexing

// Single-property index
CREATE INDEX user_email FOR (u:User) ON (u.email)

// Composite index (Neo4j 5.x+)
CREATE INDEX user_name_location FOR (u:User) ON (u.name, u.location)

// Full-text search
CREATE FULLTEXT INDEX product_search FOR (p:Product) ON EACH [p.name, p.description]

Caching Expensive Aggregations

// Materialize friend count as property
MATCH (u:User)-[:FRIEND]->(f)
WITH u, count(f) AS friendCount
SET u.friend_count = friendCount

// Query becomes instant
MATCH (u:User) WHERE u.friend_count > 100
RETURN u.name, u.friend_count

Scaling Strategies

Scale	Strategy	Implementation
Vertical	Add RAM/CPU	In-memory caching, larger instances
Horizontal (Read)	Read replicas	Neo4j Cluster, ArangoDB Cluster
Horizontal (Write)	Sharding	ArangoDB SmartGraphs, JanusGraph
Caching	App-level cache	Redis for hot paths

Language Integration

Python (Neo4j)

Complete example in examples/social-graph/python-neo4j/

from neo4j import GraphDatabase

class GraphDB:
    def __init__(self, uri: str, user: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def find_friends_of_friends(self, user_id: str, max_depth: int = 2):
        query = """
        MATCH (u:User {id: $userId})-[:FRIEND*1..$maxDepth]->(fof)
        WHERE u <> fof
        RETURN DISTINCT fof.id, fof.name
        LIMIT 100
        """
        with self.driver.session() as session:
            result = session.run(query, userId=user_id, maxDepth=max_depth)
            return [dict(record) for record in result]

# Usage
db = GraphDB("bolt://localhost:7687", "neo4j", "password")
friends = db.find_friends_of_friends("u123", max_depth=3)

TypeScript (Neo4j)

Complete example in examples/social-graph/typescript-neo4j/

import neo4j, { Driver } from 'neo4j-driver'

class Neo4jService {
  private driver: Driver

  constructor(uri: string, username: string, password: string) {
    this.driver = neo4j.driver(uri, neo4j.auth.basic(username, password))
  }

  async findFriendsOfFriends(userId: string, maxDepth: number = 2) {
    const session = this.driver.session()
    try {
      const result = await session.run(
        `MATCH (u:User {id: $userId})-[:FRIEND*1..$maxDepth]->(fof)
         WHERE u <> fof
         RETURN DISTINCT fof.id, fof.name
         LIMIT 100`,
        { userId, maxDepth }
      )
      return result.records.map(r => r.toObject())
    } finally {
      await session.close()
    }
  }
}

Go (ArangoDB)

import (
    "github.com/arangodb/go-driver"
    "github.com/arangodb/go-driver/http"
)

func findFriendsOfFriends(db driver.Database, userId string, maxDepth int) ([]User, error) {
    query := `
        FOR vertex, edge, path IN 1..@maxDepth OUTBOUND @startVertex GRAPH 'socialGraph'
            FILTER vertex._id != @startVertex
            RETURN DISTINCT vertex
            LIMIT 100
    `

    cursor, err := db.Query(ctx, query, map[string]interface{}{
        "startVertex": userId,
        "maxDepth": maxDepth,
    })

    // Handle results...
}

Schema Validation

Use scripts/validate_graph_schema.py to check for:

Unbounded traversals (missing depth limits)
Missing indexes on frequently queried properties
Supernodes (nodes with excessive relationships)
Relationship property consistency

Run validation:

python scripts/validate_graph_schema.py --database neo4j://localhost:7687

Integration with Other Skills

With databases-vector (Hybrid Search)

Combine vector similarity with graph context for AI/RAG applications. See examples/knowledge-graph/

With search-filter

Implement relationship-based queries: "Find all users within 3 degrees of connection"

With ai-chat

Use knowledge graphs to enrich LLM context with structured relationships.

With auth-security (ReBAC)

Implement relationship-based access control: "Can user X access resource Y through relation Z?"

Common Schema Patterns

Star Schema (Hub and Spokes)

(:User)-[:PURCHASED]->(:Product)
(:User)-[:VIEWED]->(:Product)
(:User)-[:RATED]->(:Product)

Hierarchical Schema (Trees)

(:CEO)-[:MANAGES]->(:VP)-[:MANAGES]->(:Director)

Temporal Schema (Event Sequences)

(:Event {timestamp})-[:NEXT]->(:Event {timestamp})

Getting Started

Choose database: Use decision framework above
Design schema: Reference references/graph-modeling.md
Implement queries: Use patterns from references/cypher-patterns.md
Validate: Run scripts/validate_graph_schema.py
Optimize: Add indexes, bound traversals, cache aggregations

using-graph-databases

Graph Databases

Purpose

When to Use This Skill

Quick Decision Framework

Core Concepts

Property Graph Model

Query Languages

Common Cypher Patterns

Pattern 1: Basic Matching

Pattern 2: Variable-Length Paths

Pattern 3: Shortest Path

Pattern 4: Recommendations

Pattern 5: Fraud Detection

Database Selection Guide

Neo4j (Primary Recommendation)

ArangoDB

Apache AGE

Amazon Neptune

Graph Data Modeling Patterns

Best Practice 1: Relationships as First-Class Citizens

Best Practice 2: Relationship Properties for Metadata

Best Practice 3: Bounded Traversals for Performance

Best Practice 4: Avoid Supernodes

Use Case Examples

Social Network

Knowledge Graph for AI/RAG

Recommendation Engine

Fraud Detection

Performance Optimization

Indexing

Caching Expensive Aggregations

Scaling Strategies

Language Integration

Python (Neo4j)

TypeScript (Neo4j)

Go (ArangoDB)

Schema Validation

Integration with Other Skills

With databases-vector (Hybrid Search)

With search-filter

With ai-chat

With auth-security (ReBAC)

Common Schema Patterns

Star Schema (Hub and Spokes)

Hierarchical Schema (Trees)

Temporal Schema (Event Sequences)

Getting Started

Further Reading