
ClickHouse

Query, optimize, and administer ClickHouse OLAP databases with schema design, performance tuning, and data ingestion patterns.

stars: 1,933 · forks: 367 · updated: March 4, 2026

SKILL.md Frontmatter

name: ClickHouse
slug: clickhouse
version: 1.0.1
homepage: https://clawic.com/skills/clickhouse
description: Query, optimize, and administer ClickHouse OLAP databases with schema design, performance tuning, and data ingestion patterns.

ClickHouse 🏠

Real-time analytics on billions of rows. Sub-second queries without secondary indexes.

Setup

On first use, read setup.md for connection configuration.

When to Use

User needs OLAP analytics, log analysis, time-series data, or real-time dashboards. Agent handles schema design, query optimization, data ingestion, and cluster administration.

Architecture

Memory lives in ~/clickhouse/. See memory-template.md for structure.

~/clickhouse/
├── memory.md        # Connection profiles + query patterns
├── schemas/         # Table definitions per database
└── queries/         # Saved analytical queries

Quick Reference

| Topic | File |
| --- | --- |
| Setup & connection | setup.md |
| Memory template | memory-template.md |
| Query patterns | queries.md |
| Performance tuning | performance.md |
| Data ingestion | ingestion.md |

Core Rules

1. Always Specify Engine

Every table needs an explicit engine. Default to MergeTree family:

-- Time-series / logs
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    data String
) ENGINE = MergeTree()
ORDER BY (timestamp, event_type);

-- Aggregated metrics
CREATE TABLE daily_stats (
    date Date,
    metric String,
    value AggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY (date, metric);
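
Writing to and reading from an AggregatingMergeTree needs the -State and -Merge combinators; a minimal sketch against the daily_stats table above (the events source query is illustrative):

```sql
-- Write partial aggregation states with sumState
INSERT INTO daily_stats
SELECT
    toDate(timestamp) AS date,
    event_type AS metric,
    sumState(toUInt64(1)) AS value
FROM events
GROUP BY date, metric;

-- Read them back with sumMerge
SELECT date, metric, sumMerge(value) AS total
FROM daily_stats
GROUP BY date, metric;
```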

2. ORDER BY is Your Index

ClickHouse has no traditional B-tree indexes. The ORDER BY clause defines the sparse primary index and the on-disk sort order:

  • Put the columns you filter on most often first
  • Place range columns (dates, timestamps) after exact-match filter columns
  • Match your most common WHERE patterns
-- Good: filters by user_id, then date range
ORDER BY (user_id, date, event_type)

-- Bad: date first when you filter by user_id
ORDER BY (date, user_id, event_type)
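
To verify that a query actually uses the primary key, EXPLAIN with indexes = 1 shows granule pruning. A sketch, assuming a table ordered by (user_id, date, event_type) as in the "good" example:

```sql
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE user_id = 42
  AND date >= today() - 7;
```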

3. Use Appropriate Data Types

| Use Case | Type | Why |
| --- | --- | --- |
| Timestamps | DateTime or DateTime64 | Native time functions |
| Low-cardinality strings | LowCardinality(String) | 10x compression |
| Enums with few values | Enum8 or Enum16 | Smallest footprint |
| Nullable only if needed | Nullable(T) | Adds overhead |
| IPs | IPv4 or IPv6 | 4 bytes vs 16+ |
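
Putting the table above together, a hypothetical request log using these types (table and column names are illustrative):

```sql
CREATE TABLE requests (
    ts DateTime64(3),                                  -- millisecond precision
    status LowCardinality(String),                     -- few distinct values
    method Enum8('GET' = 1, 'POST' = 2, 'PUT' = 3),    -- fixed value set
    client_ip IPv4,                                    -- 4 bytes per row
    error Nullable(String)                             -- NULL only where meaningful
) ENGINE = MergeTree()
ORDER BY (status, ts);
```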

4. Batch Inserts

Never insert row-by-row. ClickHouse is optimized for batch writes:

# Good: batch insert
clickhouse-client --query="INSERT INTO events FORMAT JSONEachRow" < batch.json

# Bad: individual inserts in a loop
for row in data:
    INSERT INTO events VALUES (...)

Minimum batch: 1,000 rows. Optimal: 10,000-100,000 rows.
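
The batching rule can be sketched in Python. The helper below is illustrative (the name and the 10,000-row default are not from any ClickHouse client library); each yielded chunk would be sent as a single INSERT:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(rows: Iterable[dict], size: int = 10_000) -> Iterator[List[dict]]:
    """Yield lists of up to `size` rows, so each INSERT creates one part."""
    it = iter(rows)
    # Pull up to `size` rows at a time until the source is exhausted
    while chunk := list(islice(it, size)):
        yield chunk
```

Each chunk maps to one `INSERT INTO events FORMAT JSONEachRow` request, keeping part creation bounded regardless of how many rows arrive.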

5. Use FINAL on Deduplicating Engines

Queries on ReplacingMergeTree/CollapsingMergeTree need FINAL for accuracy:

-- May return duplicates/old versions
SELECT * FROM users WHERE id = 123;

-- Guaranteed latest version
SELECT * FROM users FINAL WHERE id = 123;

FINAL has performance cost. For dashboards, consider materialized views.
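
A common cheaper alternative for ReplacingMergeTree is GROUP BY with argMax, assuming the table carries a version column (here hypothetically `updated_at`):

```sql
-- Latest row per id without FINAL
SELECT
    id,
    argMax(name, updated_at) AS name,
    max(updated_at) AS updated_at
FROM users
GROUP BY id;
```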

6. Materialized Views for Speed

Pre-aggregate expensive computations:

CREATE MATERIALIZED VIEW hourly_events
ENGINE = SummingMergeTree()
ORDER BY (hour, event_type)
AS SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    count() AS events
FROM events
GROUP BY hour, event_type;
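
SummingMergeTree merges rows in the background, not at insert time, so reads should still aggregate:

```sql
-- Rows for the same (hour, event_type) may not be merged yet
SELECT hour, event_type, sum(events) AS events
FROM hourly_events
GROUP BY hour, event_type
ORDER BY hour;
```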

7. Check System Tables First

Before debugging, check system tables:

-- Running queries
SELECT * FROM system.processes;

-- Recent query performance
SELECT query, elapsed, read_rows, memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 10;

-- Table sizes
SELECT database, table, formatReadableSize(total_bytes) as size
FROM system.tables
ORDER BY total_bytes DESC;

Common Traps

  • String instead of LowCardinality → 10x larger storage for status/type columns
  • Wrong ORDER BY → Full table scans instead of index lookups
  • Row-by-row inserts → Massive part fragmentation, slow writes
  • Missing TTL → Unbounded table growth, disk full
  • SELECT * → Reads all columns, kills columnar advantage
  • Nullable everywhere → Overhead + NULL handling complexity
  • Forgetting FINAL → Stale/duplicate data in merge tables

Performance Checklist

Before running expensive queries:

  1. Check EXPLAIN: EXPLAIN SELECT ... shows execution plan
  2. Sample first: SELECT ... FROM table SAMPLE 0.01 for a 1% sample (requires a SAMPLE BY clause in the table definition)
  3. Limit columns: Only SELECT what you need
  4. Use PREWHERE: Filters before reading all columns
  5. Check parts: SELECT count() FROM system.parts WHERE table='X'
-- PREWHERE optimization
SELECT user_id, event_type, data
FROM events
PREWHERE date = today()
WHERE event_type = 'click';

Cluster Administration

Adding TTL for Data Retention

-- Delete old data
ALTER TABLE events
MODIFY TTL timestamp + INTERVAL 90 DAY;

-- Move to cold storage
ALTER TABLE events
MODIFY TTL timestamp + INTERVAL 30 DAY TO VOLUME 'cold';
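
The two policies can be combined in one TTL expression (assumes a storage policy that defines a 'cold' volume):

```sql
-- Move to cold storage after 30 days, delete after 90
ALTER TABLE events
MODIFY TTL
    timestamp + INTERVAL 30 DAY TO VOLUME 'cold',
    timestamp + INTERVAL 90 DAY DELETE;
```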

Monitoring Disk Usage

SELECT
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) as disk_size,
    sum(rows) as total_rows,
    count() as parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;
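
A follow-up check for part fragmentation; ClickHouse throttles inserts when a partition accumulates too many active parts (parts_to_throw_insert defaults to 300), so partitions trending toward that need attention. The 100-part threshold below is an illustrative warning level:

```sql
-- Partitions approaching the insert-throttling threshold
SELECT database, table, partition, count() AS parts
FROM system.parts
WHERE active
GROUP BY database, table, partition
HAVING parts > 100
ORDER BY parts DESC;
```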

External Endpoints

| Endpoint | Data Sent | Purpose |
| --- | --- | --- |
| localhost:8123 | SQL queries | HTTP interface |
| localhost:9000 | SQL queries | Native TCP interface |

No external services contacted. All queries run against user-specified ClickHouse instances.

Security & Privacy

Data saved locally (with user consent):

  • Connection profiles (host, port, database) in ~/clickhouse/memory.md
  • Query patterns and schema documentation
  • Authentication method preferences (password vs certificate)

Important: If you provide database passwords, they are stored in plain text in ~/clickhouse/. Consider using environment variables or connection profiles managed by clickhouse-client instead.

This skill does NOT:

  • Connect to any ClickHouse without explicit user configuration
  • Send data to external services
  • Automatically collect or store credentials without asking

Related Skills

Install with clawhub install <slug> if user confirms:

  • sql — SQL query patterns
  • analytics — data analysis workflows
  • data-analysis — structured data exploration

Feedback

  • If useful: clawhub star clickhouse
  • Stay updated: clawhub sync