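---
name: audio-research
description: Audio retrieval, generation, and preprocessing for academic/creative research (YouTube OST extraction, dependency-free synthesis, FFmpeg trimming to clip length).
triggered_by: audio, music, BGM, soundtrack, 背景音, 配乐, 生成音频, sound effect, WAV, MP3
category: creative
---

# Audio Research Workflow

## When to use

- Downloading game/film OST excerpts from official YouTube channels
- Academic research that needs audio samples of a specific length (e.g. 40 seconds)
- Generating white noise, ocean waves, or other ambient sound without external dependencies (when numpy/scipy are not installed)
- Converting m4a/YouTube audio into an MP3 of a specified length

## Toolchain

1. **yt-dlp**: audio extraction from YouTube / YouTube Music
2. **FFmpeg**: format conversion, duration trimming, MP3 encoding
3. **Python stdlib**: dependency-free audio synthesis (`wave`, `struct`, `math`, `random`)

---

## Full workflow: YouTube OST → 40-second MP3

### Step 1: Download the official audio (m4a / bestaudio)

```bash
mkdir -p ~/Music/<project-name>
cd ~/Music/<project-name>
yt-dlp -f "bestaudio[ext=m4a]" -o "%(title)s.%(ext)s" --max-filesize 50M \
  "https://www.youtube.com/watch?v=" \
  "https://www.youtube.com/watch?v="
```

- Prefer videos from official channels (the game's official channel, the official Taobao Music channel, etc.)
- Prefer the `bestaudio[ext=m4a]` format (good quality, moderate file size)
- `--max-filesize 50M` avoids downloading oversized files

### Step 2: Check durations

```bash
for f in *.m4a; do
  echo "$f: $(ffprobe -v quiet -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$f")s"
done
```

### Step 3: Trim and convert to MP3

**Full-length track: take the first 40 seconds**

```bash
ffmpeg -y -i "input.m4a" -t 40 -codec:a libmp3lame -qscale:a 2 "40s_output.mp3"
```

**Short track (about a minute long): take the middle 40 seconds**

```bash
ffmpeg -y -i "input.m4a" -ss 12 -t 40 -codec:a libmp3lame -qscale:a 2 "40s_output.mp3"
```

**Key parameters:**

- `-t 40`: keep 40 seconds of audio
- `-ss 12`: start at 12 s. For a 64-second track, this centers the clip, since (64 - 40) / 2 = 12.
- `-codec:a libmp3lame`: MP3 encoder
- `-qscale:a 2`: high-quality MP3 (VBR, ~190 kbps)

### Step 4: Send the audio via Telegram

```
send_message(target="telegram:", message="MEDIA:/absolute/path/to/file.mp3")
```

- Put `MEDIA:/absolute/path` directly in the message so Telegram delivers the file as a voice/audio attachment.
- Do not put the file path in an ordinary text message. It becomes a clickable link rather than an attachment.
- To send several files, use one message per file, each with its own `MEDIA:` directive.

### Alternative: convert TTS output to MP3

MiniMax TTS returns `.ogg` (Opus-encoded), so convert it to MP3 before sending:

```bash
ffmpeg -y -i input.ogg -acodec libmp3lame -q:a 2 output.mp3
```

- `-acodec libmp3lame`: MP3 encoder
- `-q:a 2`: high-quality VBR, ~190 kbps
- After conversion, verify the duration with `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 output.mp3`

---

## Batch parallel download (OSTs from multiple works)

Download several YouTube sources in parallel, then convert them in one batch after all downloads finish:

```bash
# Parallel download (3 at once)
yt-dlp -f bestaudio "https://www.youtube.com/watch?v=ID1" -o "track1_full.%(ext)s" &
yt-dlp -f bestaudio "https://www.youtube.com/watch?v=ID2" -o "track2_full.%(ext)s" &
yt-dlp -f bestaudio "https://www.youtube.com/watch?v=ID3" -o "track3_full.%(ext)s" &
wait

# Batch convert + trim (first 60 seconds); bestaudio may yield .webm or .m4a, so match any extension
for f in track*_full.*; do
  ffmpeg -y -i "$f" -ss 0 -t 60 -acodec libmp3lame -q:a 2 "${f%_full.*}_60s.mp3"
done
```

- Download a YouTube playlist: `yt-dlp --yes-playlist "https://www.youtube.com/playlist?list=..."`
- Download a single track: `yt-dlp -f bestaudio "https://www.youtube.com/watch?v=..."`
- `bestaudio` automatically picks the best-quality audio format (usually webm/Opus, sometimes m4a).
- FFmpeg `-ss 0 -t 60` takes 60 seconds from the start.
- MP3 quality: `-q:a 2` gives VBR at ~190 kbps. That is good quality at a small size: a 60-second clip is roughly 1.4 to 1.5 MB (190 kbps × 60 s ÷ 8 ≈ 1.43 MB).
- Verify the duration with `ffprobe -v error -show_entries format=duration,size -of default=noprint_wrappers=1 "file.mp3"`

## Alternative: dependency-free ambient sound synthesis

When numpy/scipy are unavailable, generate ambient audio with the Python standard library.

### Ocean waves / white noise (20 seconds, 22.05 kHz mono)

```python
import wave, struct, math, random

sample_rate = 22050
duration = 20  # seconds
num_samples = sample_rate * duration

frames = bytearray()
for i in range(num_samples):
    t = i / sample_rate
    n = random.uniform(-1, 1)
    # Several layers of slow amplitude modulation mimic the rhythm of waves
    mod1 = 0.4 * math.sin(2 * math.pi * 0.12 * t) + 0.5
    mod2 = 0.3 * math.sin(2 * math.pi * 0.07 * t + 1.2) + 0.4
    mod3 = 0.2 * math.sin(2 * math.pi * 0.15 * t + 2.5) + 0.3
    val = n * 0.4 * mod1 + n * 0.3 * mod2 + n * 0.15 * mod3
    sample = int(max(-32767, min(32767, val * 32767 * 0.7)))
    frames += struct.pack('<h', sample)  # 16-bit signed little-endian PCM

with wave.open('/tmp/ocean_waves.wav', 'w') as wav:
    wav.setnchannels(1)  # mono
    wav.setsampwidth(2)  # 2 bytes = 16-bit samples
    wav.setframerate(sample_rate)
    wav.writeframes(bytes(frames))  # write all frames in a single call
```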
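You can also check the synthesized file's length and format without ffprobe by reading its header back with the standard-library `wave` module. The sketch below does this. The helper name `describe_wav` is hypothetical, and the helper only reads the header; it does not decode or analyze the audio.

```python
import wave


def describe_wav(path: str) -> dict:
    """Report basic properties of a PCM WAV file using only the stdlib.

    Hypothetical helper: reads the header fields, does not decode samples.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_s": frames / rate,  # frame count divided by frames per second
        }
```

For the ocean-wave example, `describe_wav('/tmp/ocean_waves.wav')` should report 1 channel, 2-byte samples, 22050 Hz, and a duration of 20.0 s (441000 frames ÷ 22050).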
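Back in Step 3 of the YouTube workflow, the `-ss 12` used for a 64-second track is the centered-window offset (duration - clip length) / 2. The sketch below generalizes that rule. It also builds the same FFmpeg argument list as Step 3 from a duration you read with ffprobe in Step 2. The function names `middle_start` and `trim_args` are hypothetical; only the ffmpeg flags come from Step 3.

```python
from typing import List


def middle_start(duration_s: float, clip_s: float = 40.0) -> float:
    """Offset in seconds that centers a clip_s-long window in a track.

    Clamped at 0, so a track shorter than the clip simply starts at the beginning.
    """
    return max(0.0, (duration_s - clip_s) / 2)


def trim_args(src: str, dst: str, start_s: float = 0.0, clip_s: float = 40.0) -> List[str]:
    """Build the Step 3 ffmpeg command: trim to clip_s seconds, encode VBR MP3 (-qscale:a 2)."""
    args = ["ffmpeg", "-y", "-i", src]
    if start_s > 0:
        args += ["-ss", f"{start_s:g}"]  # seek after -i, as in Step 3
    args += ["-t", f"{clip_s:g}", "-codec:a", "libmp3lame", "-qscale:a", "2", dst]
    return args
```

`middle_start(64)` returns `12.0`, so `trim_args("input.m4a", "40s_output.mp3", middle_start(64))` reproduces Step 3's short-track command. To run it, pass the list to `subprocess.run(args, check=True)`. Passing a list instead of a shell string also avoids quoting problems with OST titles that contain spaces or CJK characters.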