acestep-songwriting

Music songwriting guide for ACE-Step. Provides professional knowledge on writing captions, lyrics, choosing BPM/key/duration, and structuring songs. Use this skill when users want to create, write, or plan a song before generating it with ACE-Step.

updated:March 4, 2026
SKILL.md Frontmatter
name: acestep-songwriting
description: Music songwriting guide for ACE-Step. Provides professional knowledge on writing captions, lyrics, choosing BPM/key/duration, and structuring songs. Use this skill when users want to create, write, or plan a song before generating it with ACE-Step.
allowed-tools: Read

ACE-Step Songwriting Guide

Professional music creation knowledge for writing captions, lyrics, and choosing music parameters for ACE-Step.

Output Format

After using this guide, produce three things for the acestep skill:

  1. Caption (-c): Style/genre/instruments/emotion description
  2. Lyrics (-l): Complete structured lyrics with tags
  3. Parameters: --duration, --bpm, --key, --time-signature, --language
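
As a rough illustration, the three outputs can be collected into an argument list using the flags named above. This is only a sketch: `build_args` is a hypothetical helper, it assumes the flags are passed as ordinary command-line arguments, and it deliberately does not name or invoke any actual command.

```python
from typing import List, Optional


def build_args(
    caption: str,
    lyrics: str,
    duration: Optional[int] = None,
    bpm: Optional[int] = None,
    key: Optional[str] = None,
    time_signature: Optional[str] = None,
    language: Optional[str] = None,
) -> List[str]:
    """Collect caption, lyrics, and explicitly set parameters (hypothetical helper)."""
    args = ["-c", caption, "-l", lyrics]
    optional = {
        "--duration": duration,
        "--bpm": bpm,
        "--key": key,
        "--time-signature": time_signature,
        "--language": language,
    }
    # Parameters left as None are omitted so they can be inferred automatically.
    for flag, value in optional.items():
        if value is not None:
            args.extend([flag, str(value)])
    return args
```

Leaving a parameter unset keeps it out of the list entirely, which matches the advice to set metadata manually only when there is a clear requirement.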

Caption: The Most Important Input

Caption is the most important factor affecting generated music.

It supports multiple formats: simple style words, comma-separated tags, and complex natural language descriptions.

Common Dimensions

| Dimension | Examples |
|---|---|
| Style/Genre | pop, rock, jazz, electronic, hip-hop, R&B, folk, classical, lo-fi, synthwave |
| Emotion/Atmosphere | melancholic, uplifting, energetic, dreamy, dark, nostalgic, euphoric, intimate |
| Instruments | acoustic guitar, piano, synth pads, 808 drums, strings, brass, electric bass |
| Timbre/Texture | warm, bright, crisp, muddy, airy, punchy, lush, raw, polished |
| Era Reference | 80s synth-pop, 90s grunge, 2010s EDM, vintage soul, modern trap |
| Production Style | lo-fi, high-fidelity, live recording, studio-polished, bedroom pop |
| Vocal Characteristics | female vocal, male vocal, breathy, powerful, falsetto, raspy, choir |
| Speed/Rhythm | slow tempo, mid-tempo, fast-paced, groovy, driving, laid-back |
| Structure Hints | building intro, catchy chorus, dramatic bridge, fade-out ending |

Caption Writing Principles

  1. Specific beats vague — "sad piano ballad with female breathy vocal" > "a sad song"
  2. Combine multiple dimensions — style+emotion+instruments+timbre anchors direction precisely
  3. Use references well — "in the style of 80s synthwave" conveys complex aesthetic quickly
  4. Texture words are useful — warm, crisp, airy, punchy influence mixing and timbre
  5. Don't pursue perfection — Caption is a starting point, iterate based on results
  6. Granularity determines freedom — Less detail = more model creativity; more detail = more control
  7. Avoid conflicting words — "classical strings" + "hardcore metal" degrades output
    • Fix: Repetition reinforcement — Repeat the elements you want more
    • Fix: Conflict to evolution — "Start with soft strings, middle becomes metal rock, end turns to hip-hop"
  8. Don't put BPM/key/tempo in Caption — Use dedicated parameters instead
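
Principles 2 and 8 can be sketched in code: join several dimensions into one comma-separated Caption, then flag numeric BPM or key names that belong in the dedicated parameters. The helper names and the two regular expressions below are assumptions made for illustration; they are a heuristic, not part of ACE-Step.

```python
import re
from typing import List

# Hypothetical patterns for metadata that should go in parameters instead.
METADATA_PATTERNS = [
    re.compile(r"\b\d{2,3}\s*bpm\b", re.IGNORECASE),          # e.g. "120 BPM"
    re.compile(r"\b[A-G][#b]?\s+(?:[Mm]ajor|[Mm]inor)\b"),    # e.g. "C minor"
]


def build_caption(**dimensions: str) -> str:
    """Join the non-empty dimension values into a comma-separated caption."""
    return ", ".join(v.strip() for v in dimensions.values() if v and v.strip())


def metadata_leaks(caption: str) -> List[str]:
    """Return substrings that look like BPM or key settings."""
    found = []
    for pattern in METADATA_PATTERNS:
        found.extend(m.group(0) for m in pattern.finditer(caption))
    return found


caption = build_caption(
    style="piano ballad", emotion="sad", vocal="female breathy vocal"
)
print(caption)
print(metadata_leaks("synthwave, 120 BPM, C minor"))
```

The key pattern is case-sensitive on the note letter to reduce false positives, but it can still misfire on phrases such as "A major" at the start of a sentence, so treat its output as a prompt for review.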

Lyrics: The Temporal Script

Lyrics controls how music unfolds over time. It carries:

  • Lyric text itself
  • Structure tags ([Verse], [Chorus], [Bridge]...)
  • Vocal style hints ([raspy vocal], [whispered]...)
  • Instrumental sections ([guitar solo], [drum break]...)
  • Energy changes ([building energy], [explosive drop]...)

Structure Tags

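| Category | Tag | Description |
|---|---|---|
| Basic Structure | [Intro] | Opening, establish atmosphere |
| Basic Structure | [Verse] / [Verse 1] | Verse, narrative progression |
| Basic Structure | [Pre-Chorus] | Pre-chorus, build energy |
| Basic Structure | [Chorus] | Chorus, emotional climax |
| Basic Structure | [Bridge] | Bridge, transition or elevation |
| Basic Structure | [Outro] | Ending, conclusion |
| Dynamic Sections | [Build] | Energy gradually rising |
| Dynamic Sections | [Drop] | Electronic music energy release |
| Dynamic Sections | [Breakdown] | Reduced instrumentation, space |
| Instrumental | [Instrumental] | Pure instrumental, no vocals |
| Instrumental | [Guitar Solo] | Guitar solo |
| Instrumental | [Piano Interlude] | Piano interlude |
| Special | [Fade Out] | Fade out ending |
| Special | [Silence] | Silence |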
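
To check a draft before generation, it can help to split the lyrics into sections at each tag line. The following is a minimal sketch, assuming each tag sits alone on its own line in square brackets; `split_sections` and the sample lyric text are made up for illustration.

```python
import re
from typing import List, Tuple

# A tag line is a single bracketed label such as [Verse 1] or [Chorus].
TAG_LINE = re.compile(r"^\[([^\[\]]+)\]$")


def split_sections(lyrics: str) -> List[Tuple[str, List[str]]]:
    """Group lyric lines under the tag that precedes them."""
    sections: List[Tuple[str, List[str]]] = []
    for raw in lyrics.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate sections
        match = TAG_LINE.match(line)
        if match:
            sections.append((match.group(1), []))
        elif sections:
            sections[-1][1].append(line)
        else:
            # Text before any tag is kept under a placeholder label.
            sections.append(("untagged", [line]))
    return sections


draft = """[Intro]

[Verse 1]
first line of the verse
second line of the verse

[Chorus]
first line of the chorus
"""
for tag, lines in split_sections(draft):
    print(tag, len(lines))
```

Sections with zero lines, such as the `[Intro]` here, are instrumental by construction, which makes it easy to see where the song has no vocals.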

Combining Tags

Use - for finer control, but keep it concise:

✅ [Chorus - anthemic]
❌ [Chorus - anthemic - stacked harmonies - high energy - powerful - epic]

Put complex style descriptions in Caption, not in tags.
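
A small check can catch overloaded tags. This sketch counts the modifiers that follow the section name, splitting on " - " with surrounding spaces so that hyphenated names like `[Pre-Chorus]` stay intact. The limit of one modifier is an assumption drawn from the example above, not a documented rule.

```python
import re
from typing import List

TAG_RE = re.compile(r"^\[([^\[\]]+)\]$")


def tag_modifiers(tag: str) -> List[str]:
    """Return the modifiers after the section name in a combined tag."""
    match = TAG_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"not a bracketed tag: {tag!r}")
    parts = [part.strip() for part in match.group(1).split(" - ")]
    return parts[1:]


def is_concise(tag: str, max_modifiers: int = 1) -> bool:
    """True when the tag carries no more than max_modifiers modifiers."""
    return len(tag_modifiers(tag)) <= max_modifiers


print(is_concise("[Chorus - anthemic]"))
print(is_concise("[Chorus - anthemic - stacked harmonies - high energy - powerful - epic]"))
```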

Caption-Lyrics Consistency

Models are not good at resolving conflicts. Checklist:

  • Instruments in Caption ↔ Instrumental section tags in Lyrics
  • Emotion in Caption ↔ Energy tags in Lyrics
  • Vocal description in Caption ↔ Vocal control tags in Lyrics

Vocal Control Tags

| Tag | Effect |
|---|---|
| [raspy vocal] | Raspy, textured vocals |
| [whispered] | Whispered |
| [falsetto] | Falsetto |
| [powerful belting] | Powerful, high-pitched singing |
| [spoken word] | Rap/recitation |
| [harmonies] | Layered harmonies |
| [call and response] | Call and response |
| [ad-lib] | Improvised embellishments |

Energy and Emotion Tags

| Tag | Effect |
|---|---|
| [high energy] | High energy, passionate |
| [low energy] | Low energy, restrained |
| [building energy] | Increasing energy |
| [explosive] | Explosive energy |
| [melancholic] | Melancholic |
| [euphoric] | Euphoric |
| [dreamy] | Dreamy |
| [aggressive] | Aggressive |

Lyric Writing Tips

  1. 6-10 syllables per line: The model aligns syllables to beats; keep similar counts (±1-2) for lines in the same position
  2. Uppercase = stronger intensity: "WE ARE THE CHAMPIONS!" (shouting) vs. "walking through the streets" (normal)
  3. Parentheses = background vocals: "We rise together (together)"
  4. Extend vowels: "Feeeling so aliiive" (use cautiously; the effect is unstable)
  5. Clear section separation: Blank lines between sections
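
The 6-10 syllable guideline can be checked roughly in code. The sketch below counts groups of consecutive vowels as syllables, which is a crude English-only heuristic chosen for illustration: it over-counts words with a silent "e" (for example "rise"), so flagged lines should be read aloud before being rewritten. Both function names are hypothetical.

```python
import re
from typing import List, Tuple


def estimate_syllables(line: str) -> int:
    """Approximate syllables by counting vowel groups in each word."""
    words = re.findall(r"[a-zA-Z']+", line)
    return sum(
        max(1, len(re.findall(r"[aeiouy]+", word.lower()))) for word in words
    )


def lines_out_of_range(
    lines: List[str], low: int = 6, high: int = 10
) -> List[Tuple[str, int]]:
    """Return (line, estimate) pairs whose estimate falls outside low..high."""
    flagged = []
    for line in lines:
        count = estimate_syllables(line)
        if not low <= count <= high:
            flagged.append((line, count))
    return flagged


print(estimate_syllables("walking through the streets"))  # 5 by this heuristic
```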

Avoiding "AI-flavored" Lyrics

| Red Flag | Description |
|---|---|
| Adjective stacking | "neon skies, electric hearts, endless dreams": vague imagery filler |
| Rhyme chaos | Inconsistent patterns or forced rhymes breaking meaning |
| Blurred boundaries | Lyric content crosses structure tags |
| No breathing room | Lines too long to sing in one breath |
| Mixed metaphors | Water → fire → flying: listeners can't anchor |

Metaphor discipline: One core metaphor per song, explore its multiple aspects.


Music Metadata

Most of the time, let the LM (language model) auto-infer these values. Set them manually only when you have clear requirements.

| Parameter | Range / Type | Description |
|---|---|---|
| bpm | 30–300 | Slow 60–80, mid 90–120, fast 130–180 |
| keyscale | Key | e.g. C Major, Am. Common keys (C, G, D, Am, Em) most stable |
| timesignature | Time sig | 4/4 (most common), 3/4 (waltz), 6/8 (swing) |
| vocal_language | Language | Usually auto-detected from lyrics |
| duration | Seconds | See duration calculation below |

When to Set Manually

| Scenario | Set |
|---|---|
| Daily generation | Let LM auto-infer |
| Clear tempo requirement | bpm |
| Specific style (waltz) | timesignature=3/4 |
| Match other material | bpm + duration |
| Specific key color | keyscale |
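
The "set only what you need" approach can be sketched as a helper that keeps explicitly chosen values and drops the rest. The field names follow the parameter table and the 30–300 check reflects the stated bpm range; `build_metadata` itself is a hypothetical function, not an ACE-Step API.

```python
from typing import Any, Dict, Optional


def build_metadata(
    bpm: Optional[int] = None,
    keyscale: Optional[str] = None,
    timesignature: Optional[str] = None,
    vocal_language: Optional[str] = None,
    duration: Optional[int] = None,
) -> Dict[str, Any]:
    """Return only the fields that were set; the rest are left to auto-infer."""
    if bpm is not None and not 30 <= bpm <= 300:
        raise ValueError(f"bpm must be between 30 and 300, got {bpm}")
    candidates = {
        "bpm": bpm,
        "keyscale": keyscale,
        "timesignature": timesignature,
        "vocal_language": vocal_language,
        "duration": duration,
    }
    return {name: value for name, value in candidates.items() if value is not None}


# A waltz only needs its time signature pinned.
print(build_metadata(timesignature="3/4"))
# Matching other material pins tempo and length.
print(build_metadata(bpm=96, duration=180))
```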

Duration Calculation

Estimation Method

  • Intro/Outro: 5-10 seconds each
  • Instrumental sections: 5-15 seconds each
  • Typical structures:
    • 2 verses + 2 choruses: 120-150s minimum
    • 2 verses + 2 choruses + bridge: 180-240s minimum
    • Full song with intro/outro: 210-270s (3.5-4.5 min)

BPM and Duration Relationship

  • Slower BPM (60-80): Need MORE duration for same lyrics
  • Medium BPM (100-130): Standard duration
  • Faster BPM (150-180): Can fit more lyrics, but still need breathing room

Rule of thumb: When in doubt, estimate longer. A song too short feels rushed.
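
The relationship between BPM and duration can be made concrete with a back-of-the-envelope estimate. The sketch below assumes 4/4 time and that each lyric line occupies two bars; both are illustrative assumptions rather than ACE-Step behavior, and the intro/outro defaults take the upper end of the 5-10 second range given above. Results are rounded up, in line with the rule of thumb.

```python
import math


def estimate_duration(
    lyric_lines: int,
    bpm: float,
    bars_per_line: int = 2,      # assumption: one lyric line spans two bars
    beats_per_bar: int = 4,      # assumption: 4/4 time
    intro_s: float = 10,
    outro_s: float = 10,
    instrumental_s: float = 0,
) -> int:
    """Estimate total seconds for a song, rounding up."""
    seconds_per_bar = beats_per_bar * 60.0 / bpm
    vocal_s = lyric_lines * bars_per_line * seconds_per_bar
    return math.ceil(vocal_s + intro_s + outro_s + instrumental_s)


# 2 verses + 2 choruses of 8 lines each = 32 lines.
print(estimate_duration(32, bpm=120))  # 148
print(estimate_duration(32, bpm=70))   # 240
```

Under these assumptions the same 32 lines need about 148 seconds at 120 BPM but about 240 seconds at 70 BPM, which is why slower songs need a longer duration for the same lyrics.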


Note: Keep Lyrics tags consistent with the Caption. For example, tags built around piano, powerful, and whispered elements match a Caption such as "piano ballad, building to powerful chorus, intimate".