augmentation
birdnet_stm32.audio.augmentation
Data augmentation for audio and spectrograms.
Implements mixup (with mixing weights drawn from a uniform or Beta distribution) and SpecAugment (frequency and time masking).
apply_mixup(batch_samples, batch_labels, alpha=0.2, probability=0.25, use_beta=False, label_smoothing=0.0)
Apply mixup augmentation to a batch of samples and labels.
A fraction of samples in the batch are mixed with random partners. Audio is blended by a weighted average; labels are merged via element-wise max (multi-label OR).
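As a minimal illustration of the two merge rules described above (not the library's own code), the snippet below combines one sample pair for a given mixing weight `lam`. The helper name `mix_pair` is hypothetical.

```python
import numpy as np


def mix_pair(x_a, x_b, y_a, y_b, lam):
    """Hypothetical helper: blend one sample pair the way apply_mixup is documented to."""
    # Inputs: weighted average controlled by lam
    x = lam * x_a + (1.0 - lam) * x_b
    # Labels: element-wise max, i.e. a class is active if either partner has it
    y = np.maximum(y_a, y_b)
    return x, y


x, y = mix_pair(
    np.array([1.0, 0.0]), np.array([0.0, 1.0]),
    np.array([1, 0, 0]), np.array([0, 0, 1]),
    lam=0.75,
)
print(x, y)  # [0.75 0.25] [1 0 1]
```

The original mixup formulation (Zhang et al., 2018) interpolates labels with the same `lam` as the inputs; taking the element-wise max instead keeps every partner's classes at full strength, which suits multi-label targets.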
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| batch_samples | ndarray | Input batch [B, ...]. | required |
| batch_labels | ndarray | One-hot labels [B, C]. | required |
| alpha | float | Mixup strength. Uniform mode: lambda is drawn from the range [alpha, 1 - alpha]. Beta mode: concentration parameter of Beta(alpha, alpha). | 0.2 |
| probability | float | Fraction of the batch to apply mixup to. | 0.25 |
| use_beta | bool | If True, sample lambda from Beta(alpha, alpha) instead of a uniform distribution. The Beta distribution provides more diversity in mixing ratios. | False |
| label_smoothing | float | If > 0, smooth labels after mixup by | 0.0 |
Returns:
| Type | Description |
|---|---|
| tuple[ndarray, ndarray] | Tuple of (mixed_samples, mixed_labels) with the same shapes as the inputs. |
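The batch-level behaviour documented above can be approximated as follows. This is a sketch under stated assumptions, not the library's source: `mixup_sketch` is a hypothetical name, each sample is assumed to be selected independently with `probability`, partners are assumed to be drawn uniformly from the batch, and label smoothing is left out.

```python
import numpy as np


def mixup_sketch(batch_samples, batch_labels, alpha=0.2, probability=0.25,
                 use_beta=False, rng=None):
    """Hypothetical approximation of the documented apply_mixup behaviour.

    Assumptions: per-sample selection with `probability`, uniformly random
    partners (a sample may pair with itself), no label smoothing.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = batch_samples.astype(np.float32).copy()
    y = batch_labels.astype(np.float32).copy()
    batch_size = x.shape[0]

    # Pick which samples in the batch get mixed
    selected = np.flatnonzero(rng.random(batch_size) < probability)
    for i in selected:
        j = rng.integers(batch_size)  # random partner index
        if use_beta:
            lam = rng.beta(alpha, alpha)  # Beta(alpha, alpha)
        else:
            lam = rng.uniform(alpha, 1.0 - alpha)  # uniform in [alpha, 1 - alpha]
        # Blend from the untouched originals so mixes do not chain
        x[i] = lam * batch_samples[i] + (1.0 - lam) * batch_samples[j]
        y[i] = np.maximum(batch_labels[i], batch_labels[j])
    return x, y


rng = np.random.default_rng(0)
samples = rng.standard_normal((8, 16000)).astype(np.float32)
labels = np.eye(4, dtype=np.float32)[rng.integers(4, size=8)]
mixed_x, mixed_y = mixup_sketch(samples, labels, use_beta=True, rng=rng)
print(mixed_x.shape, mixed_y.shape)  # (8, 16000) (8, 4)
```

Working on copies leaves the caller's batch unchanged; whether the real function mutates its inputs is not stated here.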
Source code in birdnet_stm32/audio/augmentation.py
apply_spec_augment(spectrogram, freq_mask_max=8, time_mask_max=25, num_freq_masks=2, num_time_masks=2)
Apply SpecAugment (frequency and time masking) to a spectrogram.
Zeroes out random contiguous bands along the frequency and time axes. Operates on a single spectrogram of shape [F, T] or [F, T, 1].
Reference: Park et al., "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition", Interspeech 2019.
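A minimal NumPy sketch of the masking procedure described above, assuming each mask width is drawn uniformly from 0 to the maximum (as in Park et al.), masked values are set to zero, and the input is copied rather than modified in place. `spec_augment_sketch` is a hypothetical name; this is not the library's implementation.

```python
import numpy as np


def spec_augment_sketch(spectrogram, freq_mask_max=8, time_mask_max=25,
                        num_freq_masks=2, num_time_masks=2, rng=None):
    """Hypothetical SpecAugment-style masking for a [F, T] or [F, T, 1] array."""
    rng = np.random.default_rng() if rng is None else rng
    out = spectrogram.copy()
    n_freq, n_time = out.shape[0], out.shape[1]

    for _ in range(num_freq_masks):
        # Width in bins, uniform in [0, freq_mask_max], clipped to the axis length
        width = min(int(rng.integers(0, freq_mask_max + 1)), n_freq)
        start = int(rng.integers(0, n_freq - width + 1))
        out[start:start + width, ...] = 0  # zero a contiguous band of frequency bins

    for _ in range(num_time_masks):
        # Width in frames, uniform in [0, time_mask_max], clipped to the axis length
        width = min(int(rng.integers(0, time_mask_max + 1)), n_time)
        start = int(rng.integers(0, n_time - width + 1))
        out[:, start:start + width, ...] = 0  # zero a contiguous block of frames
    return out


rng = np.random.default_rng(0)
spec = rng.random((64, 200, 1)).astype(np.float32)
aug = spec_augment_sketch(spec, rng=rng)
print(aug.shape)  # (64, 200, 1)
```

The trailing `...` in each slice lets the same code handle both the [F, T] and [F, T, 1] layouts.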
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| spectrogram | ndarray | Input spectrogram [F, T] or [F, T, 1]. | required |
| freq_mask_max | int | Maximum width of each frequency mask (bins). | 8 |
| time_mask_max | int | Maximum width of each time mask (frames). | 25 |
| num_freq_masks | int | Number of frequency masks to apply. | 2 |
| num_time_masks | int | Number of time masks to apply. | 2 |
Returns:
| Type | Description |
|---|---|
| ndarray | Augmented spectrogram with the same shape as the input. |