activity
birdnet_stm32.audio.activity
Activity detection and signal-to-noise sorting for audio chunks.
Provides heuristics to rank audio chunks by signal content and filter out low-activity (silent or noise-only) segments before training or evaluation. Includes a smart crop strategy for weakly-labeled long recordings that identifies the most salient segments using short-time energy analysis.
smart_crop(audio, sample_rate, chunk_duration, max_chunks=5, energy_percentile=75.0)
Extract the most salient chunks from a long audio recording.
Uses short-time energy (STE) to identify regions with the highest vocal activity. This is critical for weakly-labeled recordings where a long file may contain sparse vocalizations mixed with silence or background noise.
Strategy
- Compute STE profile over the entire recording.
- Set a threshold at the given percentile of the energy distribution.
- Find contiguous regions above the threshold.
- For each region, extract one chunk centered on the energy peak.
- Return up to max_chunks chunks, ranked by peak energy.
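The five steps above can be sketched as follows. This is a minimal illustration, not the code in birdnet_stm32/audio/activity.py: the name smart_crop_sketch, the 50 ms non-overlapping frames (frame_duration), the use of mean squared amplitude as the energy measure, and the zero-padding of recordings shorter than one chunk are all assumptions made for the example.

```python
import numpy as np


def smart_crop_sketch(audio, sample_rate, chunk_duration, max_chunks=5,
                      energy_percentile=75.0, frame_duration=0.05):
    """Illustrative STE-based cropping (hypothetical helper, not the library code)."""
    chunk_len = int(round(chunk_duration * sample_rate))
    # Assumption: non-overlapping analysis frames of frame_duration seconds.
    frame_len = max(1, int(frame_duration * sample_rate))

    def crop_at(center):
        # Clamp the window so it stays inside the recording.
        last_start = max(0, len(audio) - chunk_len)
        start = int(np.clip(center - chunk_len // 2, 0, last_start))
        chunk = audio[start:start + chunk_len]
        # Assumption: zero-pad when the recording is shorter than one chunk.
        if len(chunk) < chunk_len:
            chunk = np.pad(chunk, (0, chunk_len - len(chunk)))
        return chunk.astype(np.float32)

    n_frames = len(audio) // frame_len
    if n_frames == 0:
        return [crop_at(len(audio) // 2)]

    # 1. Short-time energy: mean squared amplitude per frame.
    frames = audio[:n_frames * frame_len].reshape(n_frames, frame_len)
    ste = np.mean(frames.astype(np.float64) ** 2, axis=1)

    # 2. Threshold at the requested percentile of the energy distribution.
    threshold = np.percentile(ste, energy_percentile)
    active = ste > threshold

    # 3. Contiguous runs of active frames as [start, end) frame indices.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]

    # 4. One peak frame per region.
    peaks = [int(s) + int(np.argmax(ste[s:e])) for s, e in zip(starts, ends)]
    if not peaks:
        # Fallback: a single center crop when nothing exceeds the threshold.
        return [crop_at(len(audio) // 2)]

    # 5. Rank regions by peak energy and keep the top max_chunks.
    peaks.sort(key=lambda p: ste[p], reverse=True)
    return [crop_at(p * frame_len + frame_len // 2) for p in peaks[:max_chunks]]
```

Centering each chunk on the energy peak rather than on the region midpoint keeps the loudest part of a vocalization inside the crop even when the active region is longer than chunk_duration.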
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio | ndarray | 1D float32 audio signal (mono, pre-normalized). | required |
| sample_rate | int | Sampling rate (Hz). | required |
| chunk_duration | float | Desired chunk length (seconds). | required |
| max_chunks | int | Maximum number of chunks to return. | 5 |
| energy_percentile | float | Percentile of STE used as activity threshold (higher = stricter, keeps only the loudest regions). | 75.0 |
Returns:
| Type | Description |
|---|---|
| list[ndarray] | List of 1D float32 audio chunks, sorted by descending energy. Falls back to a single center crop if no salient region is found. |
Source code in birdnet_stm32/audio/activity.py
get_s2n_from_spectrogram(spectrogram)
Compute a simple signal-to-noise proxy from a spectrogram (mean / std).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| spectrogram | ndarray | 2D spectrogram array. | required |
Returns:
| Type | Description |
|---|---|
| float | SNR-like scalar value. |
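As a rough illustration of the mean / std proxy described above, the ratio can be computed in a few lines. The helper name s2n_proxy_sketch and the small eps added to the denominator to avoid division by zero are assumptions for this example, not part of the documented API.

```python
import numpy as np


def s2n_proxy_sketch(x, eps=1e-10):
    """Mean-over-std ratio of an array (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float64)
    # Assumption: eps guards against a zero standard deviation.
    return float(np.mean(x) / (np.std(x) + eps))


# A tiny 2x2 "spectrogram" with mean 2 and standard deviation 1.
spec = np.array([[1.0, 3.0], [1.0, 3.0]])
ratio = s2n_proxy_sketch(spec)
```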
Source code in birdnet_stm32/audio/activity.py
get_s2n_from_audio(audio)
Compute a simple signal-to-noise proxy from raw audio (mean / std).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio | ndarray | 1D audio signal. | required |
Returns:
| Type | Description |
|---|---|
| float | SNR-like scalar value. |
Source code in birdnet_stm32/audio/activity.py
sort_by_s2n(samples, threshold=0.1)
Sort samples by SNR proxy and filter out low-SNR ones. Keeps at least one.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | list[ndarray] | List of 2D spectrograms or 1D audio arrays. | required |
| threshold | float | Minimum normalized SNR to keep (in [0, 1]). | 0.1 |
Returns:
| Type | Description |
|---|---|
| list[ndarray] | Sorted (descending by SNR) and filtered samples. |
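The sort-then-filter behavior can be sketched as below. The documentation does not state how scores are normalized to [0, 1]; dividing by the maximum score is an assumption made here, as are the helper name sort_by_s2n_sketch and the eps term.

```python
import numpy as np


def sort_by_s2n_sketch(samples, threshold=0.1, eps=1e-10):
    """Rank samples by a mean/std proxy and drop low scorers (hypothetical helper)."""
    if not samples:
        return []
    scores = np.array([np.mean(s) / (np.std(s) + eps) for s in samples],
                      dtype=np.float64)
    # Assumption: normalize by the best score so the threshold lives in [0, 1].
    peak = scores.max()
    normalized = scores / peak if peak > 0 else np.zeros_like(scores)
    # Stable sort, descending by score.
    order = np.argsort(-scores, kind="stable")
    kept = [samples[i] for i in order if normalized[i] >= threshold]
    # Always keep at least the best-scoring sample.
    return kept if kept else [samples[order[0]]]
```

Keeping the top-ranked sample when everything falls below the threshold means a recording never disappears from the dataset entirely.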
Source code in birdnet_stm32/audio/activity.py
get_activity_ratio(x, k=2.0, max_active=0.8, subsample=512)
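Compute the fraction of elements (audio samples or spectrogram bins) that exceed median + k * MAD, where MAD is the median absolute deviation. If that fraction exceeds max_active, the input is treated as broadband noise and 0.0 is returned.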
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | ndarray | 1D or 2D array (audio or spectrogram). | required |
| k | float | MAD multiplier for threshold. | 2.0 |
| max_active | float | Max allowed fraction of active units (returns 0.0 if exceeded). | 0.8 |
| subsample | int | Number of points to use for median/MAD computation. | 512 |
Returns:
| Type | Description |
|---|---|
| float | Activity ratio in [0, 1]. |
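A minimal sketch of a median + k * MAD activity ratio follows. It is not the library implementation: the helper name activity_ratio_sketch, the evenly spaced subsampling used to estimate the median and MAD, and the choice to use the input values as given (no rectification) are assumptions for illustration.

```python
import numpy as np


def activity_ratio_sketch(x, k=2.0, max_active=0.8, subsample=512):
    """Fraction of elements above median + k * MAD (hypothetical helper)."""
    values = np.asarray(x, dtype=np.float64).ravel()
    if values.size == 0:
        return 0.0
    # Assumption: estimate the statistics on roughly `subsample` evenly spaced points.
    step = max(1, values.size // subsample)
    ref = values[::step]
    median = np.median(ref)
    mad = np.median(np.abs(ref - median))
    # Count every element of the full input against the robust threshold.
    ratio = float(np.mean(values > median + k * mad))
    # Too many active elements suggests broadband noise rather than signal.
    return 0.0 if ratio > max_active else ratio
```

Median and MAD are used instead of mean and standard deviation because a few loud events barely move them, so the threshold tracks the background level.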
Source code in birdnet_stm32/audio/activity.py
sort_by_activity(samples, threshold=0.25)
Sort samples by activity ratio and filter low-activity ones. Keeps at least one.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | list[ndarray] | List of 1D or 2D arrays. | required |
| threshold | float | Minimum activity ratio to keep. | 0.25 |
Returns:
| Type | Description |
|---|---|
| list[ndarray] | Sorted and filtered samples. |
Source code in birdnet_stm32/audio/activity.py
pick_random_samples(samples, num_samples=1, pick_first=False)
Randomly select one or more samples from a list.
When pick_first=True and num_samples > 1, the first sample is always included and the remaining samples are drawn randomly from the rest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | list[ndarray] | List of samples (spectrograms or raw audio). | required |
| num_samples | int | Number of samples to select. | 1 |
| pick_first | bool | If True and num_samples == 1, always return the first sample. If True and num_samples > 1, include the first sample plus random picks. | False |
Returns:
| Type | Description |
|---|---|
| list[ndarray] \| ndarray | Selected samples. A list if num_samples > 1, otherwise a single ndarray. |
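The selection rules can be sketched as below. This is an illustration under assumptions, not the library code: the name pick_random_samples_sketch and the rng argument are hypothetical, and the sketch assumes sampling without replacement and silently clamps the request to the number of available samples.

```python
import random

import numpy as np


def pick_random_samples_sketch(samples, num_samples=1, pick_first=False, rng=None):
    """Select samples following the documented rules (hypothetical helper)."""
    if not samples:
        raise ValueError("samples must not be empty")
    # Assumption: an optional random.Random instance for reproducible picks.
    rng = rng or random.Random()
    if num_samples <= 1:
        # Single pick: return one ndarray, not a list.
        return samples[0] if pick_first else rng.choice(samples)
    if pick_first:
        # Always include the first sample, then draw the rest without replacement.
        rest = samples[1:]
        picks = rng.sample(rest, min(num_samples - 1, len(rest)))
        return [samples[0]] + picks
    return rng.sample(samples, min(num_samples, len(samples)))


samples = [np.full(4, i, dtype=np.float32) for i in range(5)]
picked = pick_random_samples_sketch(samples, num_samples=3, pick_first=True,
                                    rng=random.Random(0))
first = pick_random_samples_sketch(samples, pick_first=True)
```

Combined with one of the sorting functions, pick_first=True guarantees that the top-ranked chunk is always used while the remaining picks still vary.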