Dataset Preparation
Folder structure
BirdNET-STM32 expects audio files organized by species:
data/
├── train/
│   ├── species_name_1/
│   │   ├── file1.wav
│   │   ├── file2.wav
│   │   └── ...
│   ├── species_name_2/
│   └── ...
└── test/
    ├── species_name_1/
    ├── species_name_2/
    └── ...
Each subfolder name becomes a class label. All audio files must be in .wav format.
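The layout above can be read with a few lines of Python. This is a minimal sketch and not the project's own loader: the function name discover_classes is hypothetical, and it assumes class folders sit directly under the split directory and that files use a lowercase .wav extension.

```python
from pathlib import Path


def discover_classes(split_dir):
    """Map each class folder name to its sorted list of .wav files.

    Hypothetical helper for illustration. Only direct subfolders of
    split_dir are treated as classes; non-.wav files are ignored.
    """
    root = Path(split_dir)
    classes = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # The folder name is the class label.
        classes[folder.name] = sorted(folder.glob("*.wav"))
    return classes
```

Calling discover_classes("data/train") returns a dictionary whose keys are the class labels and whose values are the files found for each class.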
Special class names
Folders named noise, silence, background, or other are treated as
negative classes: they receive all-zero label vectors during training. Use
these to improve robustness against non-bird audio.
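To illustrate what an all-zero label vector means, here is a small sketch of one possible encoding. It is an assumption, not the project's implementation: the helper build_label_vectors is hypothetical, folder names are matched exactly, and negative classes are assumed to get no output slot of their own.

```python
# Folder names the documentation lists as negative classes.
NEGATIVE_CLASSES = {"noise", "silence", "background", "other"}


def build_label_vectors(folder_names):
    """Return (species, labels) for the given class folder names.

    species: sorted names that receive an output slot.
    labels:  folder name -> label vector (list of floats).
    Assumption: negative classes are excluded from the output dimension,
    so their vector is all zeros.
    """
    species = sorted(n for n in folder_names if n not in NEGATIVE_CLASSES)
    index = {name: i for i, name in enumerate(species)}
    labels = {}
    for name in folder_names:
        vec = [0.0] * len(species)
        if name in index:
            vec[index[name]] = 1.0  # one-hot for a species folder
        labels[name] = vec
    return species, labels
```

With folders robin, wren, and noise, the species folders get one-hot vectors of length two, while noise maps to [0.0, 0.0].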
Downloading iNatSounds
We use a subset of the iNatSounds 2024 dataset. After downloading, sort files into species folders based on the train and test annotation CSVs.
Since the model is small, you typically train on a subset of species. Species
lists for various regions are available in dev/:
| File | Region |
|---|---|
| species_list_eu.txt | Central Europe |
| species_list_CA.txt | California |
| species_list_USE.txt | Eastern US |
| species_list_USW.txt | Western US |
| species_list_brazil.txt | Brazil |
| species_list_sea.txt | Southeast Asia |
| species_list_australia.txt | Australia |
| species_list_africa.txt | Sub-Saharan Africa |
| species_list_combined.txt | Combined subset |
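The sorting step could look roughly like the sketch below. Everything about the file formats here is an assumption made for illustration: the annotation CSV is assumed to have file_name and species columns, and the species list is assumed to hold one name per line that matches the species column exactly. Check the actual files and adjust the column names before using it. Both function names are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def load_species_list(path):
    """Read a species list, assuming one name per line (blank lines skipped)."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def sort_into_species_folders(csv_path, audio_root, out_dir, keep_species):
    """Copy annotated files into out_dir/<species>/ and return the copy count.

    Assumed CSV columns (hypothetical): 'file_name' (path relative to
    audio_root) and 'species' (used as the folder name).
    """
    copied = 0
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            species = row["species"]
            if species not in keep_species:
                continue  # species is not part of the selected subset
            src = Path(audio_root) / row["file_name"]
            if not src.is_file():
                continue  # annotation without a downloaded file
            dest_dir = Path(out_dir) / species
            dest_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest_dir / src.name)
            copied += 1
    return copied
```

Run it once with the train annotations and data/train as the output directory, then again with the test annotations and data/test.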
Data pipeline details
During training, the data pipeline:
- Discovers all .wav files under data/train/<class>/.
- Upsamples minority classes to a configurable ratio (--upsample_ratio, default 0.5) of the largest class.
- Caps files per class if --max_samples is set.
- Chunks each file into fixed-length segments (--chunk_duration, default 3 seconds) up to --max_duration (default 30 seconds).
- Computes spectrograms according to the selected --audio_frontend.
- Splits into train/validation (--val_split, default 0.2).
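The arithmetic behind the upsampling, capping, and chunking steps can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: both function names are hypothetical, upsampling is assumed to repeat randomly chosen files, the target count is rounded up, and a trailing partial chunk is assumed to be dropped.

```python
import math
import random


def upsample_and_cap(files_by_class, upsample_ratio=0.5, max_samples=None, seed=0):
    """Upsample small classes to ratio * largest class, then apply the cap."""
    rng = random.Random(seed)
    largest = max((len(f) for f in files_by_class.values()), default=0)
    target = math.ceil(upsample_ratio * largest)  # rounding up is an assumption
    result = {}
    for name, files in files_by_class.items():
        files = list(files)
        if files and len(files) < target:
            # Assumption: repeat randomly chosen files until the target is met.
            files += rng.choices(files, k=target - len(files))
        if max_samples is not None:
            files = files[:max_samples]
        result[name] = files
    return result


def chunk_offsets(file_duration, chunk_duration=3.0, max_duration=30.0):
    """Return (start, end) times in seconds for each full chunk of a file."""
    usable = min(file_duration, max_duration)
    n_chunks = int(usable // chunk_duration)  # partial last chunk is dropped
    return [(i * chunk_duration, (i + 1) * chunk_duration) for i in range(n_chunks)]
```

With the defaults, a class holding 2 files next to a class holding 10 is raised to 5 files, and any file of 30 seconds or longer yields at most 10 chunks.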
Tips
- Aim for at least 50–100 files per species for reasonable training.
- Longer files contribute more chunks; balance file counts, not total duration.
- Add noise/background folders to make the model more robust in the field.
- The --max_samples flag is useful for quick experiments with balanced class counts.
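A quick way to check the file-count tip before training is to list the classes that fall short. The helper below is a hypothetical sketch; the threshold of 50 comes from the lower end of the range suggested above, and only lowercase .wav files directly inside each class folder are counted.

```python
from pathlib import Path


def classes_below_minimum(train_dir, min_files=50):
    """Return {class_name: file_count} for classes with fewer than min_files."""
    short = {}
    for folder in sorted(Path(train_dir).iterdir()):
        if not folder.is_dir():
            continue
        count = sum(1 for _ in folder.glob("*.wav"))
        if count < min_files:
            short[folder.name] = count
    return short
```

An empty result from classes_below_minimum("data/train") means every class meets the threshold.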