
Data

Work with data across the full lifecycle from extraction and cleaning to analysis, visualization, and reporting.

SKILL.md Frontmatter
name: Data
slug: data
version: 1.0.1
changelog: Minor refinements for consistency
description: Work with data across the full lifecycle from extraction and cleaning to analysis, visualization, and reporting.

When to Use

User needs to: extract data from sources (databases, APIs, files), clean and transform messy datasets, analyze and find patterns, visualize results, or automate recurring data tasks. Agent handles the full data workflow.

Quick Reference

| Area | File | Focus |
| --- | --- | --- |
| Querying & Extraction | querying.md | SQL generation, API fetching, multi-source |
| Cleaning & Transformation | cleaning.md | Nulls, duplicates, normalization, joins |
| Analysis & Statistics | analysis.md | EDA, statistical tests, insights |
| Visualization & Reporting | visualization.md | Charts, dashboards, exports |
| Quality & Validation | quality.md | Data checks, anomaly detection, drift |
| Workflow Patterns | patterns.md | Common data workflows, automation |

Core Operations

Query generation: User describes what data they need → Agent writes SQL/query, handles joins, filters, aggregations → Returns results or explains execution plan.
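
A minimal sketch of that flow using Python's stdlib sqlite3 driver; the orders table and date range are hypothetical stand-ins for whatever source the user names.

```python
import sqlite3

# Hypothetical schema: orders(id, customer_id, amount, created_at).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 10, 25.0, "2026-01-05"), (2, 10, 40.0, "2026-01-20"), (3, 11, 15.0, "2026-02-02")],
)

# "Total spend per customer in January 2026", expressed as a parameterized
# query: filters and aggregations live in SQL, and user input stays in the
# parameter tuple rather than being interpolated into the string.
query = """
    SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE created_at BETWEEN ? AND ?
    GROUP BY customer_id
    ORDER BY total DESC
"""
for row in conn.execute(query, ("2026-01-01", "2026-01-31")):
    print(row)  # (10, 2, 65.0)
```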

Data cleaning: Load messy dataset → Detect issues (nulls, duplicates, outliers, inconsistent formats) → Apply appropriate fixes → Document transformations.
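
A pandas sketch of that loop on a tiny invented frame: normalize formats first, then dedupe, then handle nulls, keeping a log entry per step.

```python
import pandas as pd

# Hypothetical messy input: inconsistent casing/whitespace, duplicates, a null.
df = pd.DataFrame({
    "city": ["Boston", "boston ", "Boston", None, "Chicago"],
    "sales": [100, 100, 100, 250, 300],
})

log = []  # document each transformation: operation + rationale

df["city"] = df["city"].str.strip().str.title()
log.append("normalize city: strip whitespace, title-case for consistent grouping")

before = len(df)
df = df.drop_duplicates()
log.append(f"drop_duplicates: removed {before - len(df)} duplicate rows after normalization")

n_null = df["city"].isna().sum()
df["city"] = df["city"].fillna("Unknown")
log.append(f"fillna city: flagged {n_null} missing value(s) as 'Unknown' rather than dropping")

print(df, *log, sep="\n")
```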

Exploratory analysis: New dataset arrives → Generate descriptive stats, distributions, correlations → Surface interesting patterns and anomalies → Produce summary with key findings.
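
A sketch of that first pass with pandas and NumPy; the columns and distributions are synthetic, invented purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed for reproducibility
df = pd.DataFrame({
    "price": rng.normal(50, 10, 200),
    "units": rng.poisson(30, 200),
})
df["revenue"] = df["price"] * df["units"]

# First pass on any new dataset: shape, types, descriptive stats.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Pairwise correlations surface candidate relationships to dig into.
print(df.corr().round(2))

# Cheap anomaly screen: values more than 3 standard deviations out.
z = (df["revenue"] - df["revenue"].mean()) / df["revenue"].std()
print("potential outliers:", int((z.abs() > 3).sum()))
```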

Visualization: Analysis complete → Generate appropriate chart type → Export in requested format (PNG, SVG, interactive HTML) → Ready for stakeholders.
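
A matplotlib sketch of the export step, assuming invented monthly totals; a time-ordered series gets a line chart, and the same figure is saved as both PNG and SVG.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical monthly totals; a time-ordered series suits a line chart.
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [120, 135, 128, 160]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, revenue, marker="o")
ax.set_title("Monthly revenue (sample data)")
ax.set_ylabel("Revenue (k$)")
fig.tight_layout()

# Export in whichever format was requested.
fig.savefig("revenue.png", dpi=150)
fig.savefig("revenue.svg")
```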

Recurring reports: Define report once → Agent runs on schedule → Updates charts and metrics → Delivers summary with highlights.
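
A stdlib-only sketch of the define-once idea; metric names and the report layout are made up, and the schedule itself would live in cron, CI, or a task runner rather than in Python.

```python
from datetime import datetime, timezone
from pathlib import Path

def run_report(metrics: dict[str, float], out_dir: Path = Path("reports")) -> Path:
    """Defined once; scheduling lives outside (cron, CI, or a task runner)."""
    out_dir.mkdir(exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    path = out_dir / f"report-{stamp}.md"
    top = max(metrics, key=metrics.get)
    lines = [f"# Report for {stamp}", ""]
    lines += [f"- {name}: {value:,.0f}" for name, value in metrics.items()]
    lines += ["", f"Highlight: {top} was the largest metric this period ({metrics[top]:,.0f})."]
    path.write_text("\n".join(lines))
    return path

# Hypothetical metrics; in practice they come from the query layer above.
print(run_report({"signups": 1420, "revenue": 88000, "churned": 37}))
```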

Critical Rules

  • Always preview transformations before applying — show a sample of what will change (see the sketch after this list)
  • Document every data transformation with source, operation, and rationale
  • Validate data types and ranges before analysis — garbage in, garbage out
  • Use appropriate statistical tests — check assumptions first
  • Generate reproducible outputs — include seeds, versions, timestamps
  • Handle missing data explicitly — document chosen strategy (drop, impute, flag)
  • Match chart type to data type — categorical, continuous, time series
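
For the first rule, a pandas sketch of the preview-before-apply pattern (the email column is hypothetical): compute the change on the side, show the affected rows, and mutate only after review.

```python
import pandas as pd

df = pd.DataFrame({"email": ["A@X.COM", "b@y.com", "A@X.COM", None]})

# Preview: compute the transformation on the side and show what would change.
proposed = df["email"].str.lower()
changed = df[df["email"].ne(proposed) & df["email"].notna()]
print("rows that would change:")
print(changed.assign(new_email=proposed))

# Apply only after the preview has been reviewed, then log the operation.
df["email"] = proposed
```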

User Modes

| Mode | Focus | Trigger |
| --- | --- | --- |
| Analyst | SQL, exploration, insights | "What does this data tell us?" |
| Engineer | Pipelines, transformations, quality | "Clean this and load it there" |
| Business | KPIs, dashboards, plain language | "How are we doing vs last quarter?" |
| Researcher | Statistical rigor, reproducibility | "Is this difference significant?" |
| Developer | Schema design, API data, types | "Generate types from this JSON" |

See patterns.md for workflows per mode.

On First Use

  1. Identify data source (database, file, API)
  2. Establish connection or load file
  3. Initial EDA — shape, types, quality issues (sketched below)
  4. Clean and transform as needed
  5. Analyze or visualize per user goal
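
A compact sketch of steps 1-3, assuming the source turns out to be a local CSV; the filename is hypothetical.

```python
import pandas as pd

# Steps 1-2: identify the source and load it (a local CSV in this sketch).
df = pd.read_csv("sales.csv")

# Step 3: initial EDA (shape, types, quality issues).
print(df.shape)
print(df.dtypes)
print("nulls per column:", df.isna().sum(), sep="\n")
print("duplicate rows:", df.duplicated().sum())

# Steps 4-5 depend on what this surfaces and on the user's stated goal.
```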