Inference Pipeline
This guide is a concise summary. For the full technical explanation (STFT, PCEN, bounding-box-to-time/frequency conversion, and song reconstruction), see detect-birds-internals.md.
Pipeline Steps (Short)
- Load and normalize audio (WAV/FLAC/OGG/MP3).
- Resample to the target sample rate if needed.
- Compute PCEN features in memory-friendly segments.
- Generate 3-second clips with 50% overlap (a 1.5-second hop).
- Render each clip as a spectrogram image.
- Run YOLO inference on each clip image.
- Convert detection bounding-box coordinates to time and frequency.
- Merge detections with species-aware song reconstruction.
- Save outputs as JSON and/or CSV.
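The clip windowing and coordinate-conversion steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `clip_starts`, `box_to_time_freq`, and `Box` are hypothetical names, the final clip is assumed to be shifted back to end at the recording end (padding it is an equally valid policy), and the spectrogram image is assumed to have a linear time axis across its width and a linear frequency axis from `fmax_hz` (top row) to `fmin_hz` (bottom row). If the rendered spectrograms use a mel or other non-linear frequency scale, the frequency mapping in detect-birds-internals.md applies instead.

```python
from dataclasses import dataclass

# 3-second clips with 50% overlap, as in the steps above.
CLIP_SECONDS = 3.0
HOP_SECONDS = CLIP_SECONDS / 2  # 50% overlap -> 1.5 s hop


def clip_starts(duration_s: float, clip_s: float = CLIP_SECONDS,
                hop_s: float = HOP_SECONDS) -> list[float]:
    """Return start times (s) of overlapping clips covering the recording.

    Assumption: the last clip is shifted back so it ends exactly at the
    recording end instead of being zero-padded."""
    if duration_s <= clip_s:
        return [0.0]
    starts = []
    t = 0.0
    while t + clip_s < duration_s:
        starts.append(t)
        t += hop_s
    starts.append(duration_s - clip_s)  # final clip aligned to the end
    return starts


@dataclass
class Box:
    """Hypothetical detection box in image pixels (y grows downward)."""
    x1: float
    y1: float
    x2: float
    y2: float


def box_to_time_freq(box: Box, clip_start_s: float, img_w: int, img_h: int,
                     fmin_hz: float, fmax_hz: float,
                     clip_s: float = CLIP_SECONDS) -> tuple[float, float, float, float]:
    """Map a pixel box to (t_start, t_end, f_low, f_high) in seconds and Hz.

    Assumes a linear time axis over the image width and a linear frequency
    axis with the top row at fmax_hz and the bottom row at fmin_hz."""
    t_start = clip_start_s + box.x1 * clip_s / img_w
    t_end = clip_start_s + box.x2 * clip_s / img_w
    span_hz = fmax_hz - fmin_hz
    f_high = fmax_hz - box.y1 * span_hz / img_h  # top edge -> higher frequency
    f_low = fmax_hz - box.y2 * span_hz / img_h   # bottom edge -> lower frequency
    return t_start, t_end, f_low, f_high


# Usage: a 6 s recording yields clips starting at 0.0, 1.5 and 3.0 s.
starts = clip_starts(6.0)
result = box_to_time_freq(Box(100, 32, 300, 96), clip_start_s=1.5,
                          img_w=600, img_h=256, fmin_hz=0.0, fmax_hz=12000.0)
print(starts, result)
```

Offsetting each box by its clip's start time places all detections on one recording-wide timeline, which is what the later merge step needs.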
Output Notes
Raw (unmerged) detections, produced with --no-merge, are typically used for confidence-threshold sweeps, while reconstructed
detections are better suited to direct reporting. The CSV output uses annotation-compatible columns.
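To show the difference between raw and merged output, here is a simplified, species-aware interval merge. It is a hypothetical stand-in, not the song-reconstruction algorithm documented in detect-birds-internals.md: `Detection`, `merge_detections`, and the `max_gap_s` parameter are illustrative assumptions, and the fields are not the CSV's annotation-compatible columns. Detections are merged only within the same species when their time intervals overlap or are separated by at most `max_gap_s` seconds; the merged confidence is assumed to be the maximum of its parts.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record on the recording-wide timeline."""
    species: str
    start_s: float
    end_s: float
    confidence: float


def merge_detections(dets: list[Detection], max_gap_s: float = 0.5) -> list[Detection]:
    """Merge same-species detections that overlap or lie within max_gap_s.

    Detections of different species are never merged with each other.
    Inputs are not mutated; merged records are new objects."""
    by_species: dict[str, list[Detection]] = {}
    for d in dets:
        by_species.setdefault(d.species, []).append(d)

    merged: list[Detection] = []
    for species in sorted(by_species):
        group = sorted(by_species[species], key=lambda d: d.start_s)
        first = group[0]
        current = Detection(species, first.start_s, first.end_s, first.confidence)
        for d in group[1:]:
            if d.start_s - current.end_s <= max_gap_s:
                # Overlapping or close enough: extend the current event.
                current.end_s = max(current.end_s, d.end_s)
                current.confidence = max(current.confidence, d.confidence)
            else:
                merged.append(current)
                current = Detection(species, d.start_s, d.end_s, d.confidence)
        merged.append(current)
    return sorted(merged, key=lambda d: (d.start_s, d.species))


# Usage: raw detections from overlapping clips (species labels are examples).
raw = [
    Detection("robin", 1.0, 2.0, 0.6),
    Detection("robin", 1.8, 3.2, 0.9),  # overlaps the previous robin box
    Detection("robin", 5.0, 6.0, 0.7),  # 1.8 s gap: kept as a separate event
    Detection("jay", 1.5, 2.5, 0.8),    # different species: never merged with robin
]
merged = merge_detections(raw)
for d in merged:
    print(d)
```

A threshold sweep would run on `raw` (filtering by `confidence` before any merging), which is why unmerged output is the natural input for it.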
For implementation details and callable APIs, see:
- detect-birds-internals.md
- ../api/inference.md