augmentation
birdnet_stm32.audio.augmentation
Data augmentation for audio and spectrograms.
Implements mixup (multi-source additive mixing for soundscape realism) and SpecAugment (frequency/time masking).
apply_mixup(batch_samples, batch_labels, alpha=0.2, probability=0.25, label_smoothing=0.0)
Apply realistic multi-source mixup to a batch of samples and labels.
Emulates natural soundscapes with multiple birds vocalizing at the same time. Instead of a single Beta-distributed lambda that biases toward one source, this draws mixing gains from a Dirichlet distribution so each source contributes a meaningful proportion. Each mixed sample blends 2–3 sources (randomly chosen), and labels are merged via element-wise max (multi-label union) since all source species are present.
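The mixing scheme described above can be sketched in NumPy as follows. This is a minimal illustration, not the code in `birdnet_stm32/audio/augmentation.py`: the name `mixup_sketch`, the `rng` argument, and the choice to select each sample independently with the given probability are all assumptions made for the example, and label smoothing is left out.

```python
import numpy as np


def mixup_sketch(batch_samples, batch_labels, alpha=0.2, probability=0.25, rng=None):
    """Illustrative multi-source mixup (hypothetical helper, not the library code)."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(batch_samples, dtype=np.float32)
    labels = np.asarray(batch_labels, dtype=np.float32)
    mixed_samples = samples.copy()
    mixed_labels = labels.copy()
    batch_size = samples.shape[0]

    for i in range(batch_size):
        # Assumption: each sample is picked for mixing independently.
        if batch_size < 2 or rng.random() >= probability:
            continue
        # Blend 2 or 3 sources: the sample itself plus 1-2 distinct partners.
        num_sources = min(int(rng.integers(2, 4)), batch_size)
        others = np.delete(np.arange(batch_size), i)
        partners = rng.choice(others, size=num_sources - 1, replace=False)
        sources = np.concatenate(([i], partners))
        # Dirichlet gains are non-negative and sum to 1.
        gains = rng.dirichlet(np.full(num_sources, alpha)).astype(np.float32)
        # Weighted sum over the source axis gives the mixed sample.
        mixed_samples[i] = np.tensordot(gains, samples[sources], axes=1)
        # Multi-label union: a class is present if any source contains it.
        mixed_labels[i] = labels[sources].max(axis=0)
    return mixed_samples, mixed_labels
```

Because the labels are merged with an element-wise max rather than weighted by the gains, a mixed sample keeps a full `1.0` for every species it contains, however quiet that source is in the mix.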
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `batch_samples` | `ndarray` | Input batch [B, ...]. | required |
| `batch_labels` | `ndarray` | One-hot labels [B, C]. | required |
| `alpha` | `float` | Dirichlet concentration parameter. Lower values produce more varied gain distributions; higher values produce more uniform mixing (all sources contribute equally). | `0.2` |
| `probability` | `float` | Fraction of the batch to apply mixup to. | `0.25` |
| `label_smoothing` | `float` | If > 0, smooth labels after mixup by | `0.0` |
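To get a feel for how `alpha` shapes the gains, the short experiment below draws many Dirichlet gain vectors and reports the average size of the largest gain. It is only a sketch: the helper `mean_largest_gain`, the fixed seed, and the choice of three sources are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def mean_largest_gain(alpha, num_sources=3, draws=10_000):
    """Average of the dominant gain over many Dirichlet draws (hypothetical helper)."""
    gains = rng.dirichlet(np.full(num_sources, alpha), size=draws)
    return float(gains.max(axis=1).mean())


for alpha in (0.2, 1.0, 10.0):
    print(f"alpha={alpha:>4}: mean largest gain = {mean_largest_gain(alpha):.2f}")
```

As `alpha` grows, the mean largest gain falls toward `1 / num_sources`, which corresponds to the "more uniform mixing" case in the table. With small `alpha`, individual draws vary widely and one source tends to carry most of the weight.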
Returns:
| Type | Description |
|---|---|
| `tuple[ndarray, ndarray]` | Tuple of (mixed_samples, mixed_labels) with same shapes as inputs. |
Source code in birdnet_stm32/audio/augmentation.py
apply_spec_augment(spectrogram, freq_mask_max=8, time_mask_max=25, num_freq_masks=2, num_time_masks=2)
Apply SpecAugment (frequency and time masking) to a spectrogram.
Zeroes out random contiguous bands along the frequency and time axes. Operates on a single spectrogram of shape [F, T] or [F, T, 1].
Reference: Park et al., "SpecAugment", 2019.
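A minimal NumPy sketch of the masking step is shown below. It is not the library's implementation: the name `spec_augment_sketch`, the `rng` argument, and the decision to draw each mask width uniformly from 0 up to the maximum (clamped to the axis length) are assumptions made for the example.

```python
import numpy as np


def spec_augment_sketch(spectrogram, freq_mask_max=8, time_mask_max=25,
                        num_freq_masks=2, num_time_masks=2, rng=None):
    """Illustrative frequency/time masking (hypothetical helper, not the library code)."""
    rng = np.random.default_rng() if rng is None else rng
    augmented = np.array(spectrogram, copy=True)  # leave the input untouched
    num_bins, num_frames = augmented.shape[:2]

    for _ in range(num_freq_masks):
        # Assumption: width is uniform in [0, freq_mask_max], clamped to the axis.
        width = int(rng.integers(0, min(freq_mask_max, num_bins) + 1))
        start = int(rng.integers(0, num_bins - width + 1))
        augmented[start:start + width, ...] = 0  # zero a band of frequency bins

    for _ in range(num_time_masks):
        width = int(rng.integers(0, min(time_mask_max, num_frames) + 1))
        start = int(rng.integers(0, num_frames - width + 1))
        augmented[:, start:start + width, ...] = 0  # zero a band of time frames

    return augmented
```

Indexing with `...` lets the same code handle both `[F, T]` and `[F, T, 1]` inputs. Masks may overlap, so the total masked area can be smaller than the sum of the individual mask widths.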
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `spectrogram` | `ndarray` | Input spectrogram [F, T] or [F, T, 1]. | required |
| `freq_mask_max` | `int` | Maximum width of each frequency mask (bins). | `8` |
| `time_mask_max` | `int` | Maximum width of each time mask (frames). | `25` |
| `num_freq_masks` | `int` | Number of frequency masks to apply. | `2` |
| `num_time_masks` | `int` | Number of time masks to apply. | `2` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Augmented spectrogram with same shape as input. |