activity

`birdnet_stm32.audio.activity`

Activity detection and signal-to-noise sorting for audio chunks.

Provides heuristics to rank audio chunks by signal content and filter out low-activity (silent or noise-only) segments before training or evaluation. Includes a smart crop strategy for weakly-labeled long recordings that identifies the most salient segments using short-time energy analysis.

`smart_crop(audio, sample_rate, chunk_duration, max_chunks=5, energy_percentile=75.0)`

Extract the most salient chunks from a long audio recording.

Uses short-time energy (STE) to identify regions with the highest vocal activity. This is critical for weakly-labeled recordings where a long file may contain sparse vocalizations mixed with silence or background noise.

Strategy:

1. Compute the STE profile over the entire recording.
2. Set a threshold at the given percentile of the energy distribution.
3. Find contiguous regions above the threshold.
4. For each region, extract one chunk centered on the energy peak.
5. Return up to `max_chunks` chunks, ranked by peak energy.
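
Steps 2 and 3 can also be expressed without an explicit loop. The following is a minimal sketch, not the library's implementation: `find_active_regions` is a hypothetical name, and the STE values are made up for illustration. It thresholds the energy profile at a percentile and reads region boundaries from the rising and falling edges of the resulting boolean mask.

```python
import numpy as np


def find_active_regions(ste: np.ndarray, percentile: float = 75.0) -> list[tuple[int, int]]:
    """Return half-open (start, end) frame ranges where ste >= the percentile threshold.

    Hypothetical helper for illustration only.
    """
    threshold = np.percentile(ste, percentile)
    above = ste >= threshold
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], above, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # False -> True transitions
    ends = np.flatnonzero(edges == -1)    # True -> False transitions
    return [(int(s), int(e)) for s, e in zip(starts, ends)]


# Made-up energy profile with two bursts (frames 2-3 and frame 6).
ste = np.array([0.0, 0.1, 0.9, 0.8, 0.1, 0.0, 0.7, 0.0])
regions = find_active_regions(ste, percentile=60.0)
print(regions)
```

Each `(start, end)` pair indexes STE frames, not samples; multiplying a frame index by the hop length gives the corresponding sample offset.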

Parameters:

Name	Type	Description	Default
`audio`	`ndarray`	1D float32 audio signal (mono, pre-normalized).	required
`sample_rate`	`int`	Sampling rate (Hz).	required
`chunk_duration`	`float`	Desired chunk length (seconds).	required
`max_chunks`	`int`	Maximum number of chunks to return.	`5`
`energy_percentile`	`float`	Percentile of STE used as activity threshold (higher = stricter, keeps only the loudest regions).	`75.0`

Returns:

Type	Description
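`list[ndarray]`	List of 1D float32 audio chunks, sorted by descending peak energy. Falls back to a single center crop if the recording is silent or no salient region is found. A recording no longer than one chunk is zero-padded and returned as a single chunk.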

Source code in birdnet_stm32/audio/activity.py

def smart_crop(
    audio: np.ndarray,
    sample_rate: int,
    chunk_duration: float,
    max_chunks: int = 5,
    energy_percentile: float = 75.0,
) -> list[np.ndarray]:
    """Extract the most salient chunks from a long audio recording.

    Uses short-time energy (STE) to identify regions with the highest
    vocal activity.  This is critical for weakly-labeled recordings where
    a long file may contain sparse vocalizations mixed with silence or
    background noise.

    Strategy:
        1. Compute STE profile over the entire recording.
        2. Set a threshold at the given percentile of the energy distribution.
        3. Find contiguous regions above the threshold.
        4. For each region, extract one chunk centered on the energy peak.
        5. Return up to *max_chunks* chunks, ranked by peak energy.

    Args:
        audio: 1D float32 audio signal (mono, pre-normalized).
        sample_rate: Sampling rate (Hz).
        chunk_duration: Desired chunk length (seconds).
        max_chunks: Maximum number of chunks to return.
        energy_percentile: Percentile of STE used as activity threshold
            (higher = stricter, keeps only the loudest regions).

    Returns:
        List of 1D float32 audio chunks, sorted by descending energy.
        Falls back to a single center crop if no salient region is found.
    """
    chunk_size = int(sample_rate * chunk_duration)
    n = audio.shape[0]

    if n <= chunk_size:
        # File shorter than one chunk — pad and return
        padded = np.pad(audio, (0, max(0, chunk_size - n)))
        return [padded[:chunk_size].astype(np.float32)]

    # Compute STE
    frame_len = min(1024, chunk_size // 4)
    hop = frame_len // 2
    ste = _short_time_energy(audio, frame_length=frame_len, hop_length=hop)

    if ste.max() < 1e-10:
        # Silent recording — return center crop
        mid = n // 2
        start = max(0, mid - chunk_size // 2)
        return [audio[start : start + chunk_size].astype(np.float32)]

    threshold = np.percentile(ste, energy_percentile)
    above = ste >= threshold

    # Find contiguous active regions
    regions: list[tuple[int, int]] = []
    in_region = False
    region_start = 0
    for i, val in enumerate(above):
        if val and not in_region:
            region_start = i
            in_region = True
        elif not val and in_region:
            regions.append((region_start, i))
            in_region = False
    if in_region:
        regions.append((region_start, len(above)))

    if not regions:
        # Fallback: center crop
        mid = n // 2
        start = max(0, mid - chunk_size // 2)
        return [audio[start : start + chunk_size].astype(np.float32)]

    # For each region, find the peak energy frame and center a chunk there
    candidates: list[tuple[float, int]] = []
    for rs, re in regions:
        peak_frame = rs + int(np.argmax(ste[rs:re]))
        peak_sample = peak_frame * hop
        start = max(0, min(peak_sample - chunk_size // 2, n - chunk_size))
        peak_energy = float(ste[peak_frame])
        candidates.append((peak_energy, start))

    # Sort by energy (descending) and deduplicate overlapping chunks
    candidates.sort(key=lambda x: x[0], reverse=True)
    selected_starts: list[int] = []
    for _energy, start in candidates:
        # Skip if too close to an already-selected chunk
        if any(abs(start - s) < chunk_size // 2 for s in selected_starts):
            continue
        selected_starts.append(start)
        if len(selected_starts) >= max_chunks:
            break

    chunks = [audio[s : s + chunk_size].astype(np.float32) for s in selected_starts]
    return chunks if chunks else [audio[:chunk_size].astype(np.float32)]
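
The `_short_time_energy` helper called above is not shown on this page. As a rough sketch of what such a helper might compute, the block below assumes STE means the mean squared amplitude of each frame; `short_time_energy` is a hypothetical stand-in, and the real helper may differ (for example in windowing or padding).

```python
import numpy as np


def short_time_energy(audio, frame_length=1024, hop_length=512):
    """Mean squared amplitude per frame (assumed definition, for illustration)."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.size < frame_length:
        # Not enough samples for a single full frame.
        return np.zeros(0, dtype=np.float32)
    # All overlapping windows, then keep every hop_length-th one.
    windows = np.lib.stride_tricks.sliding_window_view(audio, frame_length)
    frames = windows[::hop_length]
    return np.mean(np.square(frames), axis=1)


# Synthetic 3 s recording at 16 kHz: silence with a 0.5 s tone burst starting at 1.0 s.
sr = 16000
audio = np.zeros(3 * sr, dtype=np.float32)
t = np.arange(sr // 2) / sr
audio[sr : sr + sr // 2] = 0.5 * np.sin(2 * np.pi * 2000.0 * t)

ste = short_time_energy(audio)
peak_sample = int(np.argmax(ste)) * 512  # frame index -> sample offset
print(ste.shape, peak_sample)
```

With this definition, the highest-energy frame falls inside the tone burst, which is the position `smart_crop` would center a chunk on.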

`get_s2n_from_spectrogram(spectrogram)`

Compute a simple signal-to-noise proxy from a spectrogram (mean / std).

Parameters:

Name	Type	Description	Default
`spectrogram`	`ndarray`	2D spectrogram array.	required

Returns:

Type	Description
`float`	SNR-like scalar value.

Source code in birdnet_stm32/audio/activity.py

def get_s2n_from_spectrogram(spectrogram: np.ndarray) -> float:
    """Compute a simple signal-to-noise proxy from a spectrogram (mean / std).

    Args:
        spectrogram: 2D spectrogram array.

    Returns:
        SNR-like scalar value.
    """
    signal = np.mean(spectrogram)
    noise = np.std(spectrogram)
    return signal / (noise + 1e-10)

`get_s2n_from_audio(audio)`

Compute a simple signal-to-noise proxy from raw audio (mean / std).

Parameters:

Name	Type	Description	Default
`audio`	`ndarray`	1D audio signal.	required

Returns:

Type	Description
`float`	SNR-like scalar value.

Source code in birdnet_stm32/audio/activity.py

def get_s2n_from_audio(audio: np.ndarray) -> float:
    """Compute a simple signal-to-noise proxy from raw audio (mean / std).

    Args:
        audio: 1D audio signal.

    Returns:
        SNR-like scalar value.
    """
    signal = np.mean(audio)
    noise = np.std(audio)
    return signal / (noise + 1e-10)
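
One property of a mean / std ratio worth keeping in mind: for a zero-mean waveform the mean is close to zero, so the ratio is close to zero regardless of loudness. The sketch below reimplements the same ratio under the hypothetical name `s2n_proxy` and compares a pure tone with its rectified magnitude. Taking the absolute value is shown only as an illustration, not as something the library does.

```python
import numpy as np


def s2n_proxy(x):
    """Same mean / std ratio as above, reimplemented for illustration."""
    x = np.asarray(x, dtype=np.float64)
    return float(np.mean(x) / (np.std(x) + 1e-10))


# One second of a 440 Hz tone at 16 kHz: exactly 440 full cycles, so the mean is ~0.
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)

raw_score = s2n_proxy(tone)          # near zero for a zero-mean signal
abs_score = s2n_proxy(np.abs(tone))  # clearly positive once rectified
print(raw_score, abs_score)
```

Non-negative inputs such as magnitude or power spectrograms do not have this cancellation, so the proxy behaves more predictably on them.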

`sort_by_s2n(samples, threshold=0.1)`

Sort samples by SNR proxy and filter out low-SNR ones. Keeps at least one.

Parameters:

Name	Type	Description	Default
`samples`	`list[ndarray]`	List of 2D spectrograms or 1D audio arrays.	required
`threshold`	`float`	Minimum normalized SNR to keep (in [0, 1]).	`0.1`

Returns:

Type	Description
`list[ndarray]`	Sorted (descending by SNR) and filtered samples.

Source code in birdnet_stm32/audio/activity.py

def sort_by_s2n(samples: list[np.ndarray], threshold: float = 0.1) -> list[np.ndarray]:
    """Sort samples by SNR proxy and filter out low-SNR ones. Keeps at least one.

    Args:
        samples: List of 2D spectrograms or 1D audio arrays.
        threshold: Minimum normalized SNR to keep (in [0, 1]).

    Returns:
        Sorted (descending by SNR) and filtered samples.
    """
    if len(samples[0].shape) == 2:
        s2n_values = np.array([get_s2n_from_spectrogram(spec) for spec in samples])
    elif len(samples[0].shape) == 1:
        s2n_values = np.array([get_s2n_from_audio(audio) for audio in samples])
    else:
        raise ValueError("Samples must be 1D or 2D arrays (raw audio or spectrograms).")

    s2n_values /= s2n_values.max() + 1e-10

    sorted_indices = np.argsort(s2n_values)[::-1]
    sorted_samples = [samples[i] for i in sorted_indices]

    filtered = [s for s, v in zip(sorted_samples, s2n_values[sorted_indices], strict=False) if v >= threshold]
    if len(filtered) == 0:
        filtered = [sorted_samples[0]]
    return filtered
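
To make the normalize, sort, and filter steps concrete, here is a small worked example on made-up scores. It mirrors the logic of the listing above but operates on plain numbers instead of arrays of audio, so the values are illustrative only.

```python
import numpy as np

# Made-up SNR proxy values for four samples (indices 0..3).
scores = np.array([0.2, 4.0, 1.0, 0.3])
threshold = 0.1

# Normalize so the best sample scores ~1.0.
normalized = scores / (scores.max() + 1e-10)

# Sort indices by descending score, then keep those at or above the threshold.
order = np.argsort(normalized)[::-1]
kept = [int(i) for i in order if normalized[i] >= threshold]

# Always keep at least the best sample.
if not kept:
    kept = [int(order[0])]

print(kept)
```

Here the normalized scores are roughly 0.05, 1.0, 0.25, and 0.075, so only samples 1 and 2 pass a threshold of 0.1. Because the threshold is relative to the best sample in the batch, the same sample can be kept in one batch and dropped in another.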

`get_activity_ratio(x, k=2.0, max_active=0.8, subsample=512)`

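Compute the fraction of elements whose magnitude exceeds median + k * MAD (median absolute deviation). If that fraction exceeds `max_active`, the function returns 0.0 so that broadband noise is not counted as activity.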

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	1D or 2D array (audio or spectrogram).	required
`k`	`float`	MAD multiplier for threshold.	`2.0`
`max_active`	`float`	Max allowed fraction of active units (returns 0.0 if exceeded).	`0.8`
`subsample`	`int`	Number of points to use for median/MAD computation.	`512`

Returns:

Type	Description
`float`	Activity ratio in [0, `max_active`]; 0.0 if the raw ratio exceeds `max_active`.

Source code in birdnet_stm32/audio/activity.py

def get_activity_ratio(x: np.ndarray, k: float = 2.0, max_active: float = 0.8, subsample: int = 512) -> float:
    """Compute the fraction of units above median + k * MAD, capped to avoid broadband noise.

    Args:
        x: 1D or 2D array (audio or spectrogram).
        k: MAD multiplier for threshold.
        max_active: Max allowed fraction of active units (returns 0.0 if exceeded).
        subsample: Number of points to use for median/MAD computation.

    Returns:
        Activity ratio in [0, 1].
    """
    x = np.abs(x)
    flat = x.ravel()
    n = flat.size
    if n > subsample:
        idx = np.linspace(0, n - 1, subsample, dtype=int)
        flat = flat[idx]
    med = np.median(flat)
    mad = np.median(np.abs(flat - med)) + 1e-10
    thresh = med + k * mad
    active = np.count_nonzero(x > thresh)
    total = x.size
    ratio = float(active) / float(total)
    if ratio > max_active:
        return 0.0
    return ratio
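
The median + k * MAD threshold can be illustrated on a tiny signal. This is a simplified sketch under the hypothetical name `activity_ratio`: it omits the subsampling and the `max_active` cap from the listing above and keeps only the thresholding step.

```python
import numpy as np


def activity_ratio(x, k=2.0):
    """Fraction of elements with magnitude above median + k * MAD (simplified sketch)."""
    mag = np.abs(np.asarray(x, dtype=np.float64))
    med = np.median(mag)
    mad = np.median(np.abs(mag - med)) + 1e-10  # epsilon avoids a zero threshold offset
    return float(np.count_nonzero(mag > med + k * mad)) / mag.size


# 100 samples of silence, and the same with a 10-sample burst.
quiet = np.zeros(100)
sparse = quiet.copy()
sparse[40:50] = [-1.0, 1.0] * 5

print(activity_ratio(quiet), activity_ratio(sparse))
```

Because the median and MAD are robust to a minority of large values, the burst does not raise the threshold, and exactly the 10 burst samples are counted as active.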

`sort_by_activity(samples, threshold=0.25)`

Sort samples by activity ratio and filter low-activity ones. Keeps at least one.

Parameters:

Name	Type	Description	Default
`samples`	`list[ndarray]`	List of 1D or 2D arrays.	required
`threshold`	`float`	Minimum activity ratio to keep.	`0.25`

Returns:

Type	Description
`list[ndarray]`	Sorted and filtered samples.

Source code in birdnet_stm32/audio/activity.py

def sort_by_activity(samples: list[np.ndarray], threshold: float = 0.25) -> list[np.ndarray]:
    """Sort samples by activity ratio and filter low-activity ones. Keeps at least one.

    Args:
        samples: List of 1D or 2D arrays.
        threshold: Minimum activity ratio to keep.

    Returns:
        Sorted and filtered samples.
    """
    activity = np.array([get_activity_ratio(s) for s in samples])
    sorted_idx = np.argsort(activity)[::-1]
    sorted_samples = [samples[i] for i in sorted_idx]
    filtered = [s for s, a in zip(sorted_samples, activity[sorted_idx], strict=False) if a >= threshold]
    if not filtered:
        filtered = [sorted_samples[0]]
    return filtered

`pick_random_samples(samples, num_samples=1, pick_first=False)`

Randomly select one or more samples from a list.

When `pick_first=True` and `num_samples > 1`, the first sample is always included and the remaining samples are drawn randomly, without replacement, from the rest.

Parameters:

Name	Type	Description	Default
`samples`	`list[ndarray]`	List of samples (spectrograms or raw audio).	required
`num_samples`	`int`	Number of samples to select.	`1`
`pick_first`	`bool`	If True and num_samples == 1, always return the first sample. If True and num_samples > 1, include the first sample plus random picks.	`False`

Returns:

Type	Description
`list[ndarray] \| ndarray`	Selected samples. A list if num_samples > 1, otherwise a single ndarray. Returns an empty list if `samples` is empty.

Source code in birdnet_stm32/audio/activity.py

def pick_random_samples(
    samples: list[np.ndarray],
    num_samples: int = 1,
    pick_first: bool = False,
) -> list[np.ndarray] | np.ndarray:
    """Randomly select one or more samples from a list.

    When ``pick_first=True`` and ``num_samples > 1``, the first sample is
    always included and the remaining are drawn randomly from the rest.

    Args:
        samples: List of samples (spectrograms or raw audio).
        num_samples: Number of samples to select.
        pick_first: If True and num_samples == 1, always return the first sample.
            If True and num_samples > 1, include the first sample plus random picks.

    Returns:
        Selected samples. A list if num_samples > 1, otherwise a single ndarray.
    """
    if len(samples) == 0:
        return []
    if num_samples > len(samples):
        num_samples = len(samples)

    if pick_first:
        if num_samples == 1:
            return samples[0]
        # Always include first, randomly pick remaining from the rest
        rest_count = min(num_samples - 1, len(samples) - 1)
        if rest_count > 0:
            rest_indices = np.random.choice(len(samples) - 1, size=rest_count, replace=False) + 1
            return [samples[0]] + [samples[i] for i in rest_indices]
        return [samples[0]]

    indices = np.random.choice(len(samples), size=num_samples, replace=False)
    return [samples[i] for i in indices] if num_samples > 1 else samples[indices[0]]
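
The `pick_first` branch relies on a small index trick: drawing from `len(samples) - 1` positions and adding 1 shifts every pick past index 0, so the first sample can never be drawn twice. The sketch below shows the same idea on a list of strings. It uses a seeded `np.random.default_rng` generator for reproducibility, which is an assumption for this example; the listing above calls the legacy `np.random.choice`.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, used here only for reproducibility
samples = ["a", "b", "c", "d", "e"]
num_samples = 3

# Draw from indices 0..len-2, then shift by 1 so the picks land in 1..len-1.
rest = rng.choice(len(samples) - 1, size=num_samples - 1, replace=False) + 1
picked = [samples[0]] + [samples[i] for i in rest]

print(picked)
```

The result always starts with the first sample and contains no duplicates, since the remaining picks are drawn without replacement from the other positions.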
