
Audio

Process, enhance, and convert audio files with noise removal, normalization, format conversion, transcription, and podcast workflows.

Updated: March 4, 2026
SKILL.md Frontmatter
name: Audio
slug: audio
version: 1.0.1
description: Process, enhance, and convert audio files with noise removal, normalization, format conversion, transcription, and podcast workflows.
changelog: Declare required binaries (ffmpeg, ffprobe), add requirements section with optional deps, add explicit scope

Requirements

Required:

  • ffmpeg / ffprobe — core audio processing

Optional (for advanced features):

  • sox — additional noise reduction
  • whisper — local transcription (or use a hosted transcription API)
  • demucs — stem separation
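You can confirm that the binaries listed above are installed by looking them up on PATH before running anything. Below is a minimal sketch that uses only the Python standard library (`shutil.which`). It assumes `whisper` and `demucs` were installed through their pip packages, which provide command-line entry points under those names. The helper names are hypothetical.

```python
import shutil

# Binary names from the Requirements list above.
REQUIRED = ("ffmpeg", "ffprobe")
OPTIONAL = ("sox", "whisper", "demucs")  # assumes pip-installed CLI entry points


def check_binaries() -> dict[str, bool]:
    """Map each binary name to whether it is found on PATH."""
    return {name: shutil.which(name) is not None for name in REQUIRED + OPTIONAL}


def missing_required(status: dict[str, bool]) -> list[str]:
    """Return the required binaries that are not available."""
    return [name for name in REQUIRED if not status.get(name, False)]


if __name__ == "__main__":
    status = check_binaries()
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'missing'}")
    for name in missing_required(status):
        print(f"required binary missing: {name}")
```

If anything in `REQUIRED` is missing, stop. A missing optional tool only disables the matching feature.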

Quick Reference

| Situation | Load |
|---|---|
| FFmpeg commands by task | commands.md |
| Loudness standards by platform | loudness.md |
| Podcast production workflow | podcast.md |
| Transcription workflow | transcription.md |

Core Capabilities

| Task | Method |
|---|---|
| Convert formats | FFmpeg (-acodec) |
| Remove noise | FFmpeg filters or SoX |
| Normalize loudness | ffmpeg-normalize or -af loudnorm |
| Transcribe | Whisper → text, SRT, VTT |
| Separate stems | Demucs (vocals, drums, bass, other) |

Execution Pattern

  1. Clarify goal — What format? What loudness? What platform?
  2. Analyze source — ffprobe for codec, sample rate, channels, duration
  3. Process — FFmpeg/SoX for transformation
  4. Verify — Check output plays, meets specs, sounds correct
  5. Deliver — Provide file to user
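Steps 2 and 4 can share one probe: read the source's stream properties before processing, then compare the output against the expected specs afterwards. Below is a minimal sketch, assuming ffprobe's JSON output (`-print_format json -show_format -show_streams`). In that output, `sample_rate` and `duration` are reported as strings. The helper names are hypothetical.

```python
import json
import subprocess


def probe(path: str) -> dict:
    """Run ffprobe and return its parsed JSON (requires ffprobe on PATH)."""
    cmd = ["ffprobe", "-v", "error", "-print_format", "json",
           "-show_format", "-show_streams", path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def summarize(info: dict) -> dict:
    """Codec, sample rate, channels, and duration of the first audio stream."""
    audio = next((s for s in info.get("streams", [])
                  if s.get("codec_type") == "audio"), None)
    if audio is None:
        raise ValueError("no audio stream found")
    return {
        "codec": audio.get("codec_name"),
        "sample_rate": int(audio["sample_rate"]),  # string in ffprobe JSON
        "channels": audio.get("channels"),
        "duration": float(info["format"]["duration"]),  # also a string
    }


def verify(summary: dict, *, codec=None, sample_rate=None, channels=None) -> list[str]:
    """Compare a summary with expected specs; an empty list means it matches."""
    expected = {"codec": codec, "sample_rate": sample_rate, "channels": channels}
    return [f"{key}: expected {want}, got {summary[key]}"
            for key, want in expected.items()
            if want is not None and summary[key] != want]
```

A typical check after conversion might be `verify(summarize(probe("out.mp3")), codec="mp3", sample_rate=44100)`. Listening to the result is still part of step 4.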

Common Requests → Actions

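| User says | Agent does |
|---|---|
| "Convert to MP3" | -acodec libmp3lame -q:a 2 |
| "Remove background noise" | Apply highpass/lowpass filters or a dedicated denoiser (e.g. FFmpeg's afftdn or SoX's noisered) |
| "Normalize for podcast" | -af loudnorm=I=-16:TP=-1.5:LRA=11 |
| "Transcribe this" | Whisper → SRT/VTT/TXT output |
| "Extract audio from video" | -vn -acodec copy (keeps the source codec), or re-encode if the target format needs a different codec |
| "Make it smaller" | Lower the bitrate: -b:a 128k or -b:a 96k |
| "Speed up 1.5x" | -af atempo=1.5 (changes tempo without changing pitch) |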
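The request table can be expressed as small helpers that build FFmpeg argument lists, which can then be run with `subprocess.run(cmd, check=True)`. This is a sketch under stated assumptions:

- All function names are hypothetical.
- The denoise cutoffs (80 Hz / 12 kHz) are illustrative values, not settings from this skill.
- Speed changes are split into chained `atempo` stages between 0.5 and 2.0, because older FFmpeg builds accept only that range per `atempo` instance.

```python
def ffmpeg_cmd(src: str, dst: str, *args: str) -> list[str]:
    """Build an ffmpeg command; -y overwrites dst without prompting."""
    return ["ffmpeg", "-y", "-i", src, *args, dst]


def to_mp3(src: str, dst: str) -> list[str]:
    return ffmpeg_cmd(src, dst, "-acodec", "libmp3lame", "-q:a", "2")


def denoise(src: str, dst: str, highpass_hz: int = 80, lowpass_hz: int = 12000) -> list[str]:
    # Cutoffs are assumptions; afftdn is FFmpeg's FFT-based denoiser.
    chain = f"highpass=f={highpass_hz},lowpass=f={lowpass_hz},afftdn"
    return ffmpeg_cmd(src, dst, "-af", chain)


def normalize_podcast(src: str, dst: str) -> list[str]:
    return ffmpeg_cmd(src, dst, "-af", "loudnorm=I=-16:TP=-1.5:LRA=11")


def extract_audio(src: str, dst: str, copy: bool = True) -> list[str]:
    # Stream copy keeps the source codec, so dst's container must support it;
    # without it, ffmpeg picks the container's default audio encoder.
    codec = ["-acodec", "copy"] if copy else []
    return ffmpeg_cmd(src, dst, "-vn", *codec)


def shrink(src: str, dst: str, bitrate: str = "128k") -> list[str]:
    return ffmpeg_cmd(src, dst, "-b:a", bitrate)


def atempo_chain(factor: float) -> str:
    """Split a speed factor into atempo stages each within 0.5 to 2.0."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    stages = []
    while factor > 2.0:
        stages.append(2.0)
        factor /= 2.0
    while factor < 0.5:
        stages.append(0.5)
        factor /= 0.5
    stages.append(factor)
    return ",".join(f"atempo={s:g}" for s in stages)


def speed_up(src: str, dst: str, factor: float = 1.5) -> list[str]:
    return ffmpeg_cmd(src, dst, "-af", atempo_chain(factor))
```

Building argument lists instead of shell strings avoids quoting problems with file names that contain spaces.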

Format Quick Reference

| Format | Use case | Quality |
|---|---|---|
| WAV | Master, editing | Lossless |
| FLAC | Archive, audiophile | Lossless compressed |
| MP3 | Universal sharing | Lossy, 128-320 kbps |
| AAC/M4A | Apple, podcasts | Lossy, efficient |
| OGG/Opus | WhatsApp, Discord | Lossy, very efficient |
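The format table maps naturally onto FFmpeg encoder choices. Below is a minimal sketch, assuming the output file's extension selects the format. The particular encoders (pcm_s16le, flac, libmp3lame, FFmpeg's native aac, libopus) and the AAC/Opus bitrates are illustrative assumptions, not settings defined by this skill. The sketch also covers the arithmetic behind "Make it smaller": at a constant bitrate, size ≈ bitrate × duration ÷ 8.

```python
# Encoder arguments per target extension; bitrates are assumptions.
ENCODERS = {
    "wav": ["-acodec", "pcm_s16le"],                  # uncompressed 16-bit PCM
    "flac": ["-acodec", "flac"],                      # lossless compressed
    "mp3": ["-acodec", "libmp3lame", "-q:a", "2"],    # VBR, roughly 190 kbps
    "m4a": ["-acodec", "aac", "-b:a", "128k"],        # FFmpeg's native AAC encoder
    "opus": ["-acodec", "libopus", "-b:a", "64k"],    # needs FFmpeg built with libopus
    "ogg": ["-acodec", "libopus", "-b:a", "64k"],     # Opus in an Ogg container
}


def convert_cmd(src: str, dst: str) -> list[str]:
    """Pick encoder arguments from dst's extension (case-insensitive)."""
    ext = dst.rsplit(".", 1)[-1].lower()
    if ext not in ENCODERS:
        raise ValueError(f"unsupported target format: {ext}")
    return ["ffmpeg", "-y", "-i", src, *ENCODERS[ext], dst]


def estimated_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Constant-bitrate estimate: kbps * seconds / 8 = kilobytes, / 1000 = MB."""
    return bitrate_kbps * duration_s / 8 / 1000
```

For example, one hour at 128 kbps comes to about 57.6 MB, and at 96 kbps to about 43.2 MB. VBR output will vary around these figures.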

Quality Defaults

  • Podcast: -16 LUFS (Spotify), -19 LUFS (Apple)
  • Music: -14 LUFS (Spotify), -16 LUFS (Apple Music)
  • MP3 quality: VBR -q:a 2 (~190 kbps) or CBR -b:a 192k
  • Sample rate: 44.1 kHz for music, 48 kHz for video sync
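A single loudnorm pass has to adjust gain without knowing the whole file, so a common pattern is two passes:

1. Measure with `print_format=json`.
2. Feed the measured values back through `measured_I`, `measured_TP`, `measured_LRA`, `measured_thresh`, `offset` and `linear=true`.

Below is a minimal sketch of that pattern, using the integrated-loudness targets from Quality Defaults above. Assumptions:

- The true-peak (-1.5) and LRA (11) values are borrowed from the podcast loudnorm command and reused for music presets.
- The preset names and helper functions are hypothetical.
- The output sample rate is set explicitly because loudnorm processes at 192 kHz internally.

```python
import json
import subprocess

# Integrated loudness targets (LUFS) from Quality Defaults; keys are hypothetical.
TARGETS = {
    "podcast_spotify": -16.0,
    "podcast_apple": -19.0,
    "music_spotify": -14.0,
    "music_apple": -16.0,
}
TRUE_PEAK = -1.5  # assumption: reused from the podcast loudnorm settings
LRA = 11.0        # assumption: same source


def first_pass_cmd(src: str, target_i: float) -> list[str]:
    """Measure only: decode through loudnorm and discard the audio."""
    filt = f"loudnorm=I={target_i}:TP={TRUE_PEAK}:LRA={LRA}:print_format=json"
    return ["ffmpeg", "-hide_banner", "-i", src, "-af", filt, "-f", "null", "-"]


def parse_measurement(stderr: str) -> dict:
    """loudnorm prints its JSON measurement as the last {...} block on stderr."""
    start, end = stderr.rfind("{"), stderr.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no loudnorm JSON found")
    return json.loads(stderr[start:end + 1])


def second_pass_cmd(src: str, dst: str, target_i: float, m: dict,
                    sample_rate: int = 44100) -> list[str]:
    """Apply normalization using the first pass's measured values."""
    filt = (
        f"loudnorm=I={target_i}:TP={TRUE_PEAK}:LRA={LRA}"
        f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
        f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}:linear=true"
    )
    # Pin the output rate, since loudnorm works at 192 kHz internally.
    return ["ffmpeg", "-y", "-i", src, "-af", filt, "-ar", str(sample_rate), dst]


def normalize(src: str, dst: str, preset: str = "podcast_spotify") -> None:
    """Run both passes (requires ffmpeg on PATH)."""
    target = TARGETS[preset]
    first = subprocess.run(first_pass_cmd(src, target),
                           capture_output=True, text=True, check=True)
    m = parse_measurement(first.stderr)
    subprocess.run(second_pass_cmd(src, dst, target, m), check=True)
```

Pass `sample_rate=48000` to `second_pass_cmd` when the audio will be synced to video.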

Scope

This skill:

  • Processes audio files the user explicitly provides
  • Runs FFmpeg commands at the user's request
  • Does NOT access cloud services without the user's knowledge
  • Does NOT store files persistently (the user manages their own files)