Dataset Preparation

Folder structure

BirdNET-STM32 expects audio files organized by species:

data/
├── train/
│   ├── species_name_1/
│   │   ├── file1.wav
│   │   ├── file2.wav
│   │   └── ...
│   ├── species_name_2/
│   └── ...
└── test/
    ├── species_name_1/
    ├── species_name_2/
    └── ...

Each subfolder name becomes a class label. All audio files must be in .wav format.
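As an illustration of how folder names map to class labels, the following minimal sketch walks one split directory and collects the .wav files per class. The function name discover_classes is hypothetical and not part of BirdNET-STM32, and matching only the lowercase .wav suffix is an assumption.

```python
from pathlib import Path


def discover_classes(split_dir):
    """Map each class (subfolder name) to its sorted list of .wav files."""
    classes = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        # Assumption: only files with a lowercase ".wav" suffix count.
        classes[class_dir.name] = sorted(class_dir.glob("*.wav"))
    return classes
```

Calling discover_classes("data/train") would return one entry per species folder, which makes it easy to spot empty or misnamed folders before training.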

Special class names

Folders named noise, silence, background, or other are treated as negative classes: their files receive all-zero label vectors during training, so no species is marked as present. Use these folders to improve robustness against non-bird audio.
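The labeling rule can be sketched as follows. This is a minimal illustration, not the project's code: the names NEGATIVE_CLASSES and build_label_vectors are hypothetical. It also assumes that negative folders do not get an output slot of their own and that folder names are matched exactly in lowercase; neither detail is stated above.

```python
# Folder names treated as negative classes, as listed in the text.
NEGATIVE_CLASSES = {"noise", "silence", "background", "other"}


def build_label_vectors(class_names):
    """Return the ordered species list and one label vector per folder name."""
    # Assumption: only species folders get an output slot.
    species = sorted(n for n in class_names if n not in NEGATIVE_CLASSES)
    index = {name: i for i, name in enumerate(species)}
    vectors = {}
    for name in class_names:
        vec = [0.0] * len(species)
        if name in index:
            vec[index[name]] = 1.0  # one-hot for a species folder
        vectors[name] = vec  # stays all-zero for a negative folder
    return species, vectors
```

With the folders robin, blackbird, and noise, the species order is blackbird, robin, and the noise vector is [0.0, 0.0].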

Downloading iNatSounds

We use a subset of the iNatSounds 2024 dataset. After downloading, sort the files into per-species folders under data/train/ and data/test/ using the train and test annotation CSVs.

Since the model is small, you typically train on a subset of species. Species lists for various regions are available in dev/:

File	Region
`species_list_eu.txt`	Central Europe
`species_list_CA.txt`	California
`species_list_USE.txt`	Eastern US
`species_list_USW.txt`	Western US
`species_list_brazil.txt`	Brazil
`species_list_sea.txt`	Southeast Asia
`species_list_australia.txt`	Australia
`species_list_africa.txt`	Sub-Saharan Africa
`species_list_combined.txt`	Combined subset
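One way to do the sorting is sketched below, under several assumptions that should be checked against the actual files: the species list is taken to contain one species name per line, and the annotation CSV is taken to have columns named file_name and species. Both column names and both function names are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def load_species_list(list_path):
    """Read one species name per line, skipping blank lines (assumed format)."""
    with open(list_path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def sort_into_species_folders(annotation_csv, audio_root, out_dir, keep_species):
    """Copy annotated files into out_dir/<species>/ for the selected species."""
    copied = 0
    with open(annotation_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            species = row["species"]  # hypothetical column name
            if species not in keep_species:
                continue  # species not in the regional subset
            source = Path(audio_root) / row["file_name"]  # hypothetical column name
            target_dir = Path(out_dir) / species
            target_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target_dir / source.name)
            copied += 1
    return copied
```

Running it once with the train annotations and data/train as out_dir, then again with the test annotations and data/test, would produce the folder structure described earlier.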

Data pipeline details

During training, the data pipeline:

1. Discovers all .wav files under data/train/<class>/.
2. Upsamples minority classes to a configurable ratio (--upsample_ratio, default 0.5) of the largest class.
3. Caps files per class if --max_samples is set.
4. Chunks each file into fixed-length segments (--chunk_duration, default 3 seconds) up to --max_duration (default 30 seconds).
5. Computes spectrograms according to the selected --audio_frontend.
6. Splits into train/validation (--val_split, default 0.2).
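The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the project's implementation. The function names (upsample, chunk_starts, prepare) are hypothetical. Upsampling by repeating randomly chosen files, rounding the target count up, dropping a trailing partial chunk, and splitting at chunk level are all assumptions. Spectrogram computation is omitted.

```python
import math
import random


def upsample(files, target_count, rng):
    """Repeat randomly chosen files until the class reaches target_count."""
    if not files or len(files) >= target_count:
        return list(files)
    extra = [rng.choice(files) for _ in range(target_count - len(files))]
    return list(files) + extra


def chunk_starts(file_duration, chunk_duration=3.0, max_duration=30.0):
    """Start times (seconds) of full chunks within the first max_duration seconds."""
    usable = min(file_duration, max_duration)
    n_chunks = int(usable // chunk_duration)  # assumption: drop the partial tail
    return [i * chunk_duration for i in range(n_chunks)]


def prepare(files_by_class, upsample_ratio=0.5, max_samples=None,
            chunk_duration=3.0, max_duration=30.0, val_split=0.2, seed=0):
    """files_by_class maps label -> list of (path, duration_seconds) tuples."""
    rng = random.Random(seed)
    largest = max(len(files) for files in files_by_class.values())
    target = math.ceil(upsample_ratio * largest)  # minimum files per class
    chunks = []
    for label, files in files_by_class.items():
        files = upsample(files, target, rng)
        if max_samples is not None:
            files = files[:max_samples]  # cap files per class
        for path, duration in files:
            for start in chunk_starts(duration, chunk_duration, max_duration):
                chunks.append((path, start, label))
    rng.shuffle(chunks)
    n_val = int(len(chunks) * val_split)
    return chunks[n_val:], chunks[:n_val]  # (train, validation)
```

With the defaults, a class of 2 files next to a class of 10 files is upsampled to 5 files, and a 120-second recording yields 10 chunks because only its first 30 seconds are used. Because this sketch splits after upsampling and chunking, chunks from the same file can land on both sides of the split. The text does not say how the actual pipeline handles this.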

Tips

- Aim for at least 50 to 100 files per species for reasonable training results.
- Longer files contribute more chunks (up to --max_duration); balance file counts, not total duration.
- Add noise/background folders to make the model more robust in the field.
- The --max_samples flag is useful for quick experiments with balanced class counts.
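To check a dataset against these tips before training, a small report like the one below can help. It is a sketch only: class_report and wav_duration are hypothetical helpers. It assumes PCM .wav files readable by Python's standard wave module, and it counts chunks by dropping any trailing partial chunk, which may differ from the real pipeline.

```python
import wave
from pathlib import Path


def wav_duration(path):
    """Duration in seconds of a PCM .wav file, read from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def class_report(train_dir, chunk_duration=3.0, max_duration=30.0, min_files=50):
    """Summarize file and chunk counts per class folder."""
    report = {}
    for class_dir in sorted(Path(train_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        files = sorted(class_dir.glob("*.wav"))
        # Each file contributes full chunks from its first max_duration seconds.
        chunks = sum(
            int(min(wav_duration(f), max_duration) // chunk_duration) for f in files
        )
        report[class_dir.name] = {
            "files": len(files),
            "chunks": chunks,
            "too_few": len(files) < min_files,  # below the suggested minimum
        }
    return report
```

Classes flagged with too_few are candidates for more recordings, and comparing the files and chunks columns shows where a few long recordings dominate a class.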