
Text

Transform, format, and process text with patterns for writing, data cleaning, localization, citations, and copywriting.

Stars: 1,933 · Forks: 367 · Updated: March 4, 2026
SKILL.md Frontmatter
name: Text
description: Transform, format, and process text with patterns for writing, data cleaning, localization, citations, and copywriting.

Quick Reference

| Task | Load |
| --- | --- |
| Creative writing (voice, dialogue, POV) | writing.md |
| Data processing (CSV, regex, encoding) | data.md |
| Academic/citations (APA, MLA, Chicago) | academic.md |
| Marketing copy (headlines, CTA, email) | copy.md |
| Translation/localization | localization.md |

Universal Text Rules

Encoding

  • Always verify encoding first: file -bi document.txt
  • Normalize line endings: tr -d '\r'
  • Remove BOM if present: sed -i '1s/^\xEF\xBB\xBF//'
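The three steps above chain into a single cleanup pipeline (a sketch; `input.txt` and `clean.txt` are placeholder filenames, GNU sed assumed for the `\x` escapes):

```shell
# Strip a leading BOM, drop carriage returns, write a clean copy
# (filenames are placeholders; GNU sed assumed)
sed '1s/^\xEF\xBB\xBF//' input.txt | tr -d '\r' > clean.txt
```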

Whitespace

  • Collapse multiple spaces: sed 's/[[:space:]]\+/ /g'
  • Trim leading/trailing: sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
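Both sed expressions combine into one pass; a quick sanity check (GNU sed assumed, since `\+` is a GNU extension):

```shell
# Collapse internal runs first, then trim the single space left at each end
echo "  hello    world  " | sed 's/[[:space:]]\+/ /g; s/^ //; s/ $//'
# → hello world
```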

Common Traps

  • Smart quotes (“ ”) break parsers → normalize to "
  • Em/en dashes (— –) break ASCII → normalize to -
  • Zero-width chars invisible but break comparisons → strip them
  • String length ≠ byte length in UTF-8 ("café" = 4 chars, 5 bytes)
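The last trap is easy to verify on the command line: `wc -c` counts bytes while `wc -m` counts characters (a UTF-8 locale is assumed for `wc -m`):

```shell
printf 'café' | wc -c   # 5 bytes (é is 2 bytes in UTF-8)
printf 'café' | wc -m   # 4 characters in a UTF-8 locale
```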

Format Detection

# Detect encoding
file -I document.txt

# Detect line endings
cat -A document.txt | head -1
# ^M at end = Windows (CRLF)
# No ^M = Unix (LF)

# Detect delimiter (CSV/TSV)
head -1 file | tr -cd ',;\t|' | wc -c
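The raw count above only says that delimiter characters exist; to see which candidate dominates, count each one separately (a sketch; `file.csv` is a placeholder filename):

```shell
# awk's gsub() returns the number of replacements, i.e. the delimiter count
head -1 file.csv | awk '{
  print "comma",     gsub(/,/, ",")
  print "semicolon", gsub(/;/, ";")
  print "tab",       gsub(/\t/, "\t")
  print "pipe",      gsub(/\|/, "|")
}'
```

The candidate with the highest count is usually the delimiter.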

Quick Transformations

| Task | Command |
| --- | --- |
| Lowercase | tr '[:upper:]' '[:lower:]' |
| Remove punctuation | tr -d '[:punct:]' |
| Count words | wc -w |
| Count unique lines | sort -u \| wc -l |
| Find duplicates | sort \| uniq -d |
| Extract emails | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' |
| Extract URLs | grep -oE 'https?://[^[:space:]<>"{}\|]+' |

Before Processing Checklist

  • Encoding verified (UTF-8?)
  • Line endings normalized
  • Delimiter identified (for structured text)
  • Target format/style defined
  • Edge cases considered (empty, Unicode, special chars)
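The first three checklist items can be scripted as a preflight pass over a file (a sketch; POSIX sh assumed, and the checks are heuristics, not guarantees):

```shell
#!/bin/sh
# Preflight: report encoding, CRLF line endings, and a leading BOM
f="$1"
file -bi "$f"
grep -q "$(printf '\r')" "$f" && echo "warning: CRLF line endings"
head -c 3 "$f" | od -An -tx1 | grep -q 'ef bb bf' && echo "warning: BOM present"
exit 0
```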