using-graph-databases
// Graph database implementation for relationship-heavy data models. Use when building social networks, recommendation engines, knowledge graphs, or fraud detection. Covers Neo4j (primary), ArangoDB, Amazon Neptune, Cypher query patterns, and graph data modeling.
Graph Databases
Purpose
This skill guides selection and implementation of graph databases for applications where relationships between entities are first-class citizens. Unlike relational databases, which model relationships through foreign keys and joins, graph databases store connections natively as relationships (edges), which makes traversal-heavy queries efficient.
When to Use This Skill
Use graph databases when:
- Deep relationship traversals (4+ hops): "Friends of friends of friends"
- Variable/evolving relationships: Schema changes don't break existing queries
- Path finding: Shortest route, network analysis, dependency chains
- Pattern matching: Fraud detection, recommendation engines, access control
Do NOT use graph databases when:
- Fixed schema with shallow joins (2-3 tables) → Use PostgreSQL
- Primarily aggregations/analytics → Use columnar databases
- Key-value lookups only → Use Redis/DynamoDB
Quick Decision Framework
DATA CHARACTERISTICS?
├── Fixed schema, shallow joins (≤3 hops)
│ └─ PostgreSQL (relational)
│
├── Already on PostgreSQL + simple graphs
│ └─ Apache AGE (PostgreSQL extension)
│
├── Deep traversals (4+ hops) + general purpose
│ └─ Neo4j (battle-tested, largest ecosystem)
│
├── Multi-model (documents + graph)
│ └─ ArangoDB
│
├── AWS-native, serverless
│ └─ Amazon Neptune
│
└── Real-time streaming, in-memory
  └─ Memgraph
Core Concepts
Property Graph Model
Graph databases store data as:
- Nodes (vertices): Entities with labels and properties
- Relationships (edges): Typed connections with properties
- Properties: Key-value pairs on nodes and relationships
(:Person {name: "Alice", age: 28})-[:FRIEND {since: "2020-01-15"}]->(:Person {name: "Bob"})
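To make the model concrete, here is a minimal in-memory sketch of the same Alice/Bob example in plain Python. The `Node` and `Relationship` classes are illustrative names only, not part of any graph database driver.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with labels and key-value properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection that carries its own properties."""
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


alice = Node({"Person"}, {"name": "Alice", "age": 28})
bob = Node({"Person"}, {"name": "Bob"})
friendship = Relationship("FRIEND", alice, bob, {"since": "2020-01-15"})

print(friendship.type, friendship.start.properties["name"], "->",
      friendship.end.properties["name"])
```

The point of the sketch is that the relationship is its own record with a type, a direction, and properties, rather than a foreign key stored on either node.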
Query Languages
| Language | Databases | Readability | Best For |
|---|---|---|---|
| Cypher | Neo4j, Memgraph, AGE | ⭐⭐⭐⭐⭐ SQL-like | General purpose |
| Gremlin | Neptune, JanusGraph | ⭐⭐⭐ Functional | Cross-database |
| AQL | ArangoDB | ⭐⭐⭐⭐ SQL-like | Multi-model |
| SPARQL | Neptune, RDF stores | ⭐⭐⭐ W3C standard | Semantic web |
Common Cypher Patterns
Reference references/cypher-patterns.md for comprehensive examples.
Pattern 1: Basic Matching
// Find all users at a company
MATCH (u:User)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'})
RETURN u.name, u.title
Pattern 2: Variable-Length Paths
// Find friends up to 3 degrees away
MATCH (u:User {name: 'Alice'})-[:FRIEND*1..3]->(friend)
WHERE u <> friend
RETURN DISTINCT friend.name
LIMIT 100
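As a rough illustration of what the bounded pattern returns, the sketch below runs a depth-limited breadth-first search over an in-memory adjacency map. It is plain Python rather than driver code, and `within_hops` and the sample data are hypothetical.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return nodes reachable from start in 1..max_hops outgoing hops."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # depth bound reached: do not expand this node
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found


friends = {
    "Alice": ["Bob"],
    "Bob": ["Carol", "Alice"],
    "Carol": ["Dave"],
    "Dave": ["Eve"],
}
print(sorted(within_hops(friends, "Alice", 3)))  # Eve is 4 hops away, so excluded
```

Marking the start node as seen plays the role of `WHERE u <> friend`, and the set result mirrors `DISTINCT`.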
Pattern 3: Shortest Path
// Find shortest connection between two users
MATCH path = shortestPath(
(a:User {name: 'Alice'})-[*..10]-(b:User {name: 'Bob'})
)
RETURN path, length(path) AS distance
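Conceptually, an unweighted shortest-path search is a breadth-first search that remembers how each node was reached. The sketch below shows that idea in plain Python under the assumption of a small in-memory edge list; it is not how Neo4j implements `shortestPath` internally, and `shortest_path` is a hypothetical helper.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Return one shortest node path from source to target, or None."""
    # The Cypher pattern has no arrow, so every edge is treated as undirected.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    previous = {source: None}  # node -> node it was first reached from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


edges = [("Alice", "Carol"), ("Carol", "Dave"), ("Bob", "Dave"),
         ("Alice", "Eve"), ("Eve", "Frank"), ("Frank", "Grace"), ("Grace", "Bob")]
path = shortest_path(edges, "Alice", "Bob")
print(path, "distance:", len(path) - 1)
```

The distance printed here corresponds to `length(path)` in the Cypher query: the number of relationships, not the number of nodes.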
Pattern 4: Recommendations
// Collaborative filtering: Products liked by similar users
MATCH (u:User {id: $userId})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(similar)
MATCH (similar)-[:PURCHASED]->(rec:Product)
WHERE NOT (u)-[:PURCHASED]->(rec)
RETURN rec.name, count(*) AS score
ORDER BY score DESC
LIMIT 10
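The scoring in this query can be hard to read from the pattern alone. The sketch below reproduces it over an in-memory map of user to purchased products; `recommend` and the sample data are hypothetical and stand in for the graph.

```python
from collections import Counter


def recommend(purchases, user_id, limit=10):
    """Score products bought by users who share a purchase with user_id."""
    owned = purchases.get(user_id, set())
    scores = Counter()
    for product in owned:
        for other, other_products in purchases.items():
            if other == user_id or product not in other_products:
                continue
            # One point per (shared product, similar user, candidate) path,
            # which is what count(*) tallies in the Cypher query.
            for candidate in other_products - owned:
                scores[candidate] += 1
    return scores.most_common(limit)


purchases = {
    "u1": {"laptop", "mouse"},
    "u2": {"laptop", "mouse", "dock"},
    "u3": {"laptop", "keyboard"},
    "u4": {"desk"},
}
print(recommend(purchases, "u1"))
```

Note that a similar user who shares two purchases contributes two points per candidate, so heavy overlap is weighted more strongly.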
Pattern 5: Fraud Detection
// Detect circular money flows
MATCH path = (a:Account)-[:SENT*3..6]->(a)
WHERE all(r IN relationships(path) WHERE r.amount > 1000)
RETURN path, [r IN relationships(path) | r.amount] AS amounts
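For intuition, the same check can be sketched over a list of `(sender, receiver, amount)` tuples. This is an illustrative in-memory version, assuming the default bounds from the query above; `circular_flows` is a hypothetical name, and like the Cypher pattern it reports each cycle once per starting account.

```python
def circular_flows(transfers, min_len=3, max_len=6, min_amount=1000):
    """Find cycles of min_len..max_len transfers, each above min_amount."""
    outgoing = {}
    for index, (sender, receiver, amount) in enumerate(transfers):
        if amount > min_amount:  # mirrors all(... r.amount > 1000)
            outgoing.setdefault(sender, []).append((index, receiver))

    cycles = []

    def walk(start, node, used):
        for index, receiver in outgoing.get(node, ()):
            if index in used:
                continue  # a transfer may appear only once per path
            path = used + [index]
            if receiver == start and len(path) >= min_len:
                cycles.append([transfers[i] for i in path])
            if len(path) < max_len:  # upper bound keeps the search finite
                walk(start, receiver, path)

    for account in list(outgoing):
        walk(account, account, [])
    return cycles


transfers = [("A", "B", 5000), ("B", "C", 4800), ("C", "A", 4700), ("C", "D", 200)]
for cycle in circular_flows(transfers):
    print([amount for _, _, amount in cycle])
```

Filtering small transfers before the walk is a design choice in this sketch: it prunes the search early instead of checking amounts after a full path is found.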
Database Selection Guide
Neo4j (Primary Recommendation)
Use for: General-purpose graph applications
Strengths:
- Most mature (2007), largest community (2M+ developers)
- 65+ graph algorithms (GDS library): PageRank, Louvain, Dijkstra
- Best tooling: Neo4j Browser, Bloom visualization
- Comprehensive Cypher support
Installation:
# Python driver
pip install neo4j
# TypeScript driver
npm install neo4j-driver
# Rust driver
cargo add neo4rs
Reference: references/neo4j.md
ArangoDB
Use for: Multi-model applications (documents + graph)
Strengths:
- Store documents AND graph in one database
- AQL combines document and graph queries
- Schema flexibility with relationships
Reference: references/arangodb.md
Apache AGE
Use for: Adding graph capabilities to existing PostgreSQL
Strengths:
- Extend PostgreSQL with graph queries
- No new infrastructure needed
- Query both relational and graph data
Reference: Implementation details in examples/
Amazon Neptune
Use for: AWS-native, serverless deployments
Strengths:
- Fully managed, auto-scaling
- Supports Gremlin, SPARQL, and openCypher
- AWS ecosystem integration
Graph Data Modeling Patterns
Reference references/graph-modeling.md for comprehensive patterns.
Best Practice 1: Relationships as First-Class Citizens
Anti-pattern (storing relationships in node properties):
// BAD
(:Person {name: 'Alice', friend_ids: ['b123', 'c456']})
Pattern (explicit relationships):
// GOOD
(:Person {name: 'Alice'})-[:FRIEND]->(:Person {id: 'b123'})
(:Person {name: 'Alice'})-[:FRIEND]->(:Person {id: 'c456'})
Best Practice 2: Relationship Properties for Metadata
// Track interaction details on relationships
(:Person)-[:FRIEND {
since: '2020-01-15',
strength: 0.85,
last_interaction: datetime()
}]->(:Person)
Best Practice 3: Bounded Traversals for Performance
// SLOW: Unbounded traversal
MATCH (a)-[:FRIEND*]->(distant)
RETURN distant
// FAST: Bounded depth, anchored on an indexed start node
MATCH (a:User {id: $userId})-[:FRIEND*1..4]->(distant)
WHERE distant.active = true
RETURN distant
LIMIT 100
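Unbounded patterns are easy to miss in review. As a rough aid, the sketch below flags variable-length relationships with no upper bound in a Cypher string. It is a regex heuristic, not a Cypher parser (a `*` inside a property string would confuse it), and `unbounded_traversals` is a hypothetical helper.

```python
import re

REL_PATTERN = re.compile(r"-\[([^\]]*)\]")          # -[ ... ] relationship body
VAR_LENGTH = re.compile(r"\*\s*(\d*)\s*(\.\.)?\s*(\d*)")  # *, *2, *1..4, *2.., *..5


def unbounded_traversals(cypher):
    """Return relationship patterns whose variable length has no upper bound."""
    flagged = []
    for match in REL_PATTERN.finditer(cypher):
        length = VAR_LENGTH.search(match.group(1))
        if length is None:
            continue  # ordinary single-hop relationship
        low, dots, high = length.groups()
        exact = bool(low) and not dots  # e.g. *2 means exactly two hops
        if not high and not exact:
            flagged.append(match.group(0))
    return flagged


print(unbounded_traversals("MATCH (a)-[:FRIEND*]->(d) RETURN d"))
print(unbounded_traversals("MATCH (a)-[:FRIEND*1..4]->(d) RETURN d"))
```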
Best Practice 4: Avoid Supernodes
Problem: Nodes with thousands of relationships slow traversals.
Solution: Intermediate aggregation nodes
// Instead of: (:User)-[:POSTED]->(:Post) [1M relationships]
// Use time partitioning:
(:User)-[:POSTED_IN]->(:Year {year: 2025})
-[:HAS_MONTH]->(:Month {month: 12})
-[:HAS_POST]->(:Post)
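Before restructuring, it helps to know which nodes are actually supernodes. A minimal sketch, assuming relationships have been exported as `(start, type, end)` tuples; the `find_supernodes` name and the default threshold of 1000 are illustrative assumptions, not fixed rules.

```python
from collections import Counter


def find_supernodes(relationships, threshold=1000):
    """Return {node: degree} for nodes whose degree exceeds threshold."""
    degree = Counter()
    for start, _type, end in relationships:
        degree[start] += 1  # outgoing side
        degree[end] += 1    # incoming side
    return {node: count for node, count in degree.items() if count > threshold}


rels = [("celebrity", "FOLLOWED_BY", f"fan{i}") for i in range(5)]
rels.append(("fan0", "FRIEND", "fan1"))
print(find_supernodes(rels, threshold=3))
```

Counting per relationship type instead of overall degree is a natural refinement when only one type is dense.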
Use Case Examples
Social Network
Schema and implementation in examples/social-graph/
Key features:
- Friend recommendations (friends-of-friends)
- Mutual connections
- News feed generation
- Influence metrics
Knowledge Graph for AI/RAG
Integration example in examples/knowledge-graph/
Key features:
- Hybrid vector + graph search
- Entity relationship mapping
- Context expansion for LLM prompts
- Semantic relationship traversal
Integration with Vector Databases:
# Assumes `qdrant` is a connected Qdrant client and `session` is an open Neo4j session
# Step 1: Vector search in Qdrant/pgvector
vector_results = qdrant.search(collection_name="concepts", query_vector=embedding)
# Step 2: Expand with graph relationships
concept_ids = [r.id for r in vector_results]
graph_context = session.run("""
    MATCH (c:Concept) WHERE c.id IN $ids
    MATCH path = (c)-[:RELATED_TO|IS_A*1..2]-(related)
    RETURN c, related, relationships(path) AS rels
""", ids=concept_ids).data()
Recommendation Engine
Examples in examples/social-graph/
Strategies:
- Collaborative filtering: "Users who bought X also bought Y"
- Content-based: "Products similar to what you like"
- Session-based: "Recently viewed items"
Fraud Detection
Pattern detection in examples/
Detection patterns:
- Circular money flows
- Shared devices across accounts
- Rapid transaction chains
- Connection pattern anomalies
Performance Optimization
Reference references/cypher-patterns.md for detailed optimization.
Indexing
// Single-property index
CREATE INDEX user_email FOR (u:User) ON (u.email)
// Composite index (multiple properties)
CREATE INDEX user_name_location FOR (u:User) ON (u.name, u.location)
// Full-text search
CREATE FULLTEXT INDEX product_search FOR (p:Product) ON EACH [p.name, p.description]
Caching Expensive Aggregations
// Materialize friend count as property
MATCH (u:User)-[:FRIEND]->(f)
WITH u, count(f) AS friendCount
SET u.friend_count = friendCount
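// Reads now use the stored count instead of re-counting relationships
// (re-run the update when friendships change, or the value goes stale)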
MATCH (u:User) WHERE u.friend_count > 100
RETURN u.name, u.friend_count
Scaling Strategies
| Scale | Strategy | Implementation |
|---|---|---|
| Vertical | Add RAM/CPU | In-memory caching, larger instances |
| Horizontal (Read) | Read replicas | Neo4j Cluster, ArangoDB Cluster |
| Horizontal (Write) | Sharding | ArangoDB SmartGraphs, JanusGraph |
| Caching | App-level cache | Redis for hot paths |
Language Integration
Python (Neo4j)
Complete example in examples/social-graph/python-neo4j/
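from neo4j import GraphDatabase

class GraphDB:
    def __init__(self, uri: str, user: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def find_friends_of_friends(self, user_id: str, max_depth: int = 2):
        # Variable-length bounds are written as literals in Cypher, not parameters,
        # so validate the depth and inline it. The cap of 5 is an arbitrary safety limit.
        if not isinstance(max_depth, int) or not 1 <= max_depth <= 5:
            raise ValueError("max_depth must be an integer between 1 and 5")
        query = f"""
            MATCH (u:User {{id: $userId}})-[:FRIEND*1..{max_depth}]->(fof)
            WHERE u <> fof
            RETURN DISTINCT fof.id AS id, fof.name AS name
            LIMIT 100
        """
        with self.driver.session() as session:
            result = session.run(query, userId=user_id)
            return [dict(record) for record in result]

# Usage
db = GraphDB("bolt://localhost:7687", "neo4j", "password")
friends = db.find_friends_of_friends("u123", max_depth=3)
db.close()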
TypeScript (Neo4j)
Complete example in examples/social-graph/typescript-neo4j/
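import neo4j, { Driver } from 'neo4j-driver'

class Neo4jService {
  private driver: Driver

  constructor(uri: string, username: string, password: string) {
    this.driver = neo4j.driver(uri, neo4j.auth.basic(username, password))
  }

  async findFriendsOfFriends(userId: string, maxDepth: number = 2) {
    // Variable-length bounds are written as literals in Cypher, not parameters,
    // so validate the depth and inline it. The cap of 5 is an arbitrary safety limit.
    if (!Number.isInteger(maxDepth) || maxDepth < 1 || maxDepth > 5) {
      throw new RangeError('maxDepth must be an integer between 1 and 5')
    }
    const session = this.driver.session()
    try {
      const result = await session.run(
        `MATCH (u:User {id: $userId})-[:FRIEND*1..${maxDepth}]->(fof)
         WHERE u <> fof
         RETURN DISTINCT fof.id AS id, fof.name AS name
         LIMIT 100`,
        { userId }
      )
      return result.records.map(r => r.toObject())
    } finally {
      await session.close()
    }
  }
}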
Go (ArangoDB)
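import (
	"context"

	"github.com/arangodb/go-driver"
)

// User is a hypothetical placeholder; replace it with your document struct.
type User map[string]interface{}

func findFriendsOfFriends(ctx context.Context, db driver.Database, userId string, maxDepth int) ([]User, error) {
	// In AQL, LIMIT must come before RETURN. uniqueVertices "global" with
	// breadth-first order visits each vertex once (older releases used bfs: true).
	query := `
		FOR vertex IN 1..@maxDepth OUTBOUND @startVertex GRAPH 'socialGraph'
			OPTIONS { order: "bfs", uniqueVertices: "global" }
			FILTER vertex._id != @startVertex
			LIMIT 100
			RETURN vertex
	`
	cursor, err := db.Query(ctx, query, map[string]interface{}{
		"startVertex": userId, // full document ID, e.g. "users/u123"
		"maxDepth":    maxDepth,
	})
	if err != nil {
		return nil, err
	}
	defer cursor.Close()
	var users []User
	for cursor.HasMore() {
		var u User
		if _, err := cursor.ReadDocument(ctx, &u); err != nil {
			return nil, err
		}
		users = append(users, u)
	}
	return users, nil
}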
Schema Validation
Use scripts/validate_graph_schema.py to check for:
- Unbounded traversals (missing depth limits)
- Missing indexes on frequently queried properties
- Supernodes (nodes with excessive relationships)
- Relationship property consistency
Run validation:
python scripts/validate_graph_schema.py --database neo4j://localhost:7687
Integration with Other Skills
With databases-vector (Hybrid Search)
Combine vector similarity with graph context for AI/RAG applications.
See examples/knowledge-graph/
With search-filter
Implement relationship-based queries: "Find all users within 3 degrees of connection"
With ai-chat
Use knowledge graphs to enrich LLM context with structured relationships.
With auth-security (ReBAC)
Implement relationship-based access control: "Can user X access resource Y through relation Z?"
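One way to read that question is as a bounded traversal: follow group membership outward from the user and check whether any subject along the way holds the requested relation on the resource. The sketch below works over in-memory `(subject, relation, object)` tuples; the `MEMBER_OF` and `VIEWER` names, the `can_access` helper, and the default depth of 4 are all assumptions for illustration.

```python
def can_access(tuples, user, relation, resource, max_depth=4):
    """Check whether user holds `relation` on resource, directly or via groups."""
    member_of = {}
    grants = set()
    for subject, rel, obj in tuples:
        if rel == "MEMBER_OF":
            member_of.setdefault(subject, set()).add(obj)
        else:
            grants.add((subject, rel, obj))

    frontier, seen = {user}, {user}
    for _ in range(max_depth + 1):  # depth 0 is the user itself
        if any((subject, relation, resource) in grants for subject in frontier):
            return True
        # Move one membership hop outward, skipping groups already visited.
        frontier = {g for s in frontier for g in member_of.get(s, ())} - seen
        if not frontier:
            return False
        seen |= frontier
    return False


acl = [
    ("alice", "MEMBER_OF", "eng"),
    ("eng", "MEMBER_OF", "staff"),
    ("staff", "VIEWER", "handbook"),
    ("bob", "EDITOR", "handbook"),
]
print(can_access(acl, "alice", "VIEWER", "handbook"))
print(can_access(acl, "bob", "VIEWER", "handbook"))
```

In a graph database the same check is typically a single bounded variable-length match from the user to the resource.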
Common Schema Patterns
Star Schema (Hub and Spokes)
(:User)-[:PURCHASED]->(:Product)
(:User)-[:VIEWED]->(:Product)
(:User)-[:RATED]->(:Product)
Hierarchical Schema (Trees)
(:CEO)-[:MANAGES]->(:VP)-[:MANAGES]->(:Director)
Temporal Schema (Event Sequences)
(:Event {timestamp})-[:NEXT]->(:Event {timestamp})
Getting Started
- Choose database: Use decision framework above
- Design schema: Reference references/graph-modeling.md
- Implement queries: Use patterns from references/cypher-patterns.md
- Validate: Run scripts/validate_graph_schema.py
- Optimize: Add indexes, bound traversals, cache aggregations
Further Reading
- references/neo4j.md - Neo4j setup, drivers, GDS algorithms
- references/arangodb.md - ArangoDB multi-model patterns
- references/cypher-patterns.md - Comprehensive Cypher query library
- references/graph-modeling.md - Data modeling best practices
- examples/social-graph/ - Complete social network implementation
- examples/knowledge-graph/ - Hybrid vector + graph for AI/RAG