spectrogram
birdnet_stm32.audio.spectrogram
Spectrogram computation, magnitude scaling, and normalization.
Supports mel spectrograms, linear STFT, and multiple magnitude compression modes (none, pwl, pcen, db). All scaling is designed to be quantization-friendly for INT8 deployment on the STM32N6 NPU.
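To illustrate why a fixed [0, 1] output range suits INT8 deployment, the sketch below maps such values to int8 with a fixed affine scheme. The function names, the scale of 1/255, and the zero point of -128 are illustrative assumptions, not values taken from this module or from the STM32N6 toolchain.

```python
import numpy as np

LEVELS = 255          # assumed: number of int8 steps spanning [0, 1]
SCALE = 1.0 / LEVELS  # assumed quantization step
ZERO_POINT = -128     # assumed: 0.0 maps to the lowest int8 value


def quantize_unit_range(x: np.ndarray) -> np.ndarray:
    """Map values in [0, 1] to int8 using a fixed affine scheme."""
    q = np.round(np.asarray(x, dtype=np.float64) * LEVELS) + ZERO_POINT
    return np.clip(q, -128, 127).astype(np.int8)


def dequantize_unit_range(q: np.ndarray) -> np.ndarray:
    """Recover approximate [0, 1] values from int8 codes."""
    return (q.astype(np.float64) - ZERO_POINT) * SCALE


x = np.array([0.0, 0.25, 0.5, 1.0])
q = quantize_unit_range(x)
x_hat = dequantize_unit_range(q)
max_err = float(np.max(np.abs(x - x_hat)))
```

Because the input range is known in advance, the scale and zero point can be constants rather than per-sample statistics, and the round-trip error stays within half a quantization step.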
normalize(S)
Normalize a spectrogram to [0, 1] per sample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| S | ndarray | Spectrogram array. | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Normalized spectrogram, same shape as input. |
Source code in birdnet_stm32/audio/spectrogram.py
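As a rough illustration of per-sample normalization to [0, 1], here is a minimal min-max sketch. The name `normalize_sketch` and the `eps` guard are assumptions made for this example; the actual `normalize` in this module may handle edge cases differently.

```python
import numpy as np


def normalize_sketch(S: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-max scale a single spectrogram to [0, 1] (assumed behavior)."""
    S = np.asarray(S, dtype=np.float32)
    lo, hi = S.min(), S.max()
    # eps avoids division by zero when the input is constant (e.g. silence).
    return (S - lo) / (hi - lo + eps)


spec = np.array([[0.0, 2.0], [4.0, 8.0]], dtype=np.float32)
out = normalize_sketch(spec)
flat = normalize_sketch(np.ones((2, 3), dtype=np.float32))
```

A constant input maps to all zeros in this sketch, since every element equals the minimum.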
get_spectrogram_from_audio(audio, sample_rate=22050, n_fft=512, mel_bins=64, spec_width=256, mag_scale='none')
Compute a magnitude spectrogram with optional scaling and normalization.
Behavior by mag_scale:
- 'none': Magnitude mel (power=1.0), then normalize to [0, 1].
- 'pcen': Magnitude mel, scale to 32-bit PCM range, librosa.pcen, normalize.
- 'pwl': Magnitude mel, pre-normalize, piecewise compression, normalize.
- 'db': Magnitude mel, amplitude_to_db(ref=max), normalize.
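The 'db' path listed above can be sketched with NumPy alone. This is a stand-in that approximates `librosa.amplitude_to_db` with the peak as reference, followed by min-max normalization; the function name is hypothetical, and the `amin` and `top_db` values are chosen to mirror librosa's defaults rather than read from this module.

```python
import numpy as np


def db_scale_sketch(mag: np.ndarray, amin: float = 1e-5, top_db: float = 80.0) -> np.ndarray:
    """Convert a magnitude spectrogram to dB relative to its peak, then scale to [0, 1]."""
    mag = np.asarray(mag, dtype=np.float64)
    ref = max(float(mag.max()), amin)
    # 20 * log10 because the input is amplitude, not power; the peak maps to 0 dB.
    db = 20.0 * np.log10(np.maximum(mag, amin) / ref)
    # Limit the dynamic range to top_db below the peak.
    db = np.maximum(db, db.max() - top_db)
    lo, hi = db.min(), db.max()
    return (db - lo) / (hi - lo + 1e-8)


mag = np.array([[1.0, 0.1], [0.01, 0.0]])
scaled = db_scale_sketch(mag)
```

Here the four inputs sit at 0, -20, -40, and (after clipping) -80 dB, so they land near 1.0, 0.75, 0.5, and 0.0 after normalization.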
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio | ndarray | 1D audio array (mono). | required |
| sample_rate | int | Sampling rate (Hz). | 22050 |
| n_fft | int | FFT size for STFT. | 512 |
| mel_bins | int | Number of mel bands, or <=0 for linear STFT bins (magnitude). | 64 |
| spec_width | int | Target number of time frames (columns). | 256 |
| mag_scale | str | One of 'none', 'db', 'pcen', 'pwl'. | 'none' |
Returns:
| Type | Description |
|---|---|
| ndarray | Spectrogram array (mel_bins or fft_bins, spec_width), values in [0, 1]. |
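When sizing a model input tensor, it can help to compute the expected output shape up front. The helper below is hypothetical and assumes that the linear mode (`mel_bins <= 0`) keeps all one-sided STFT bins, that is `n_fft // 2 + 1` rows; check this against the actual output before relying on it.

```python
def expected_spec_shape(n_fft: int = 512, mel_bins: int = 64, spec_width: int = 256) -> tuple:
    """Return the (rows, columns) shape implied by the documented parameters."""
    # Assumption: linear STFT mode yields n_fft // 2 + 1 frequency bins.
    rows = mel_bins if mel_bins > 0 else n_fft // 2 + 1
    return (rows, spec_width)


mel_shape = expected_spec_shape()
linear_shape = expected_spec_shape(mel_bins=0)
```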