spectrogram
birdnet_stm32.audio.spectrogram
Spectrogram computation, magnitude scaling, and normalization.
Supports mel spectrograms, MFCC, linear STFT, and multiple magnitude compression modes (none, pwl, pcen, db). All scaling is designed to be quantization-friendly for INT8 deployment on the STM32N6 NPU.
normalize(S)
Normalize a spectrogram to [0, 1] per sample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `S` | `ndarray` | Spectrogram array. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Normalized spectrogram, same shape as input. |
Source code in birdnet_stm32/audio/spectrogram.py
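The source of `normalize` is not reproduced on this page, so the following is only a minimal sketch of one common way to map a single spectrogram to [0, 1]: min-max scaling. The name `normalize_sketch` and the `eps` guard are assumptions made for illustration and are not part of the module.

```python
import numpy as np


def normalize_sketch(S: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-max scale one spectrogram to [0, 1] (illustrative, not the module's code)."""
    S = np.asarray(S, dtype=np.float32)
    lo = S.min()
    hi = S.max()
    # eps (assumed) avoids division by zero when the input is constant.
    return (S - lo) / (hi - lo + eps)


spec = np.array([[1.0, 3.0], [5.0, 9.0]])
scaled = normalize_sketch(spec)
print(scaled.shape, float(scaled.min()), float(scaled.max()))
```

A constant input maps to all zeros under this sketch, because the numerator is zero everywhere.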
get_spectrogram_from_audio(audio, sample_rate=24000, n_fft=512, mel_bins=64, spec_width=256, mag_scale='none', mode='mel', n_mfcc=20)
Compute a magnitude spectrogram with optional scaling and normalization.
Modes
- 'mel': Standard mel spectrogram.
- 'mfcc': Mel-frequency cepstral coefficients (mel -> DCT -> truncate).
- 'log_mel': Log-scaled mel spectrogram (log1p, quantization-friendly).
- 'linear': Linear STFT magnitude (when mel_bins <= 0).
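The "mel -> DCT -> truncate" step of the 'mfcc' mode can be sketched with plain NumPy. This is an illustration under assumptions: it uses an orthonormal DCT-II along the frequency axis and leaves any log compression of the mel input to the caller, since this page does not say which DCT variant or compression the module applies. The name `mfcc_from_mel_sketch` is hypothetical.

```python
import numpy as np


def mfcc_from_mel_sketch(mel: np.ndarray, n_mfcc: int = 20) -> np.ndarray:
    """Apply an orthonormal DCT-II over the mel axis and keep the first n_mfcc rows."""
    mel = np.asarray(mel, dtype=np.float64)
    n_mels = mel.shape[0]
    k = np.arange(n_mfcc)[:, None]   # output coefficient index
    m = np.arange(n_mels)[None, :]   # input mel band index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n_mels))
    basis *= np.sqrt(2.0 / n_mels)
    basis[0] /= np.sqrt(2.0)         # orthonormal scaling for the k = 0 row
    return basis @ mel               # shape: (n_mfcc, frames)


coeffs = mfcc_from_mel_sketch(np.ones((64, 5)), n_mfcc=20)
print(coeffs.shape)
```

For a flat spectrum, all energy lands in coefficient 0 and the higher coefficients are numerically zero, which is why truncating to a small `n_mfcc` keeps the smooth spectral envelope.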
Behavior by mag_scale (applied only in 'mel' and 'linear' modes):
- 'none': Magnitude mel (power=1.0), then normalize to [0, 1].
- 'pcen': Magnitude mel, scale to 32-bit PCM range, librosa.pcen, normalize.
- 'pwl': Magnitude mel, pre-normalize, piecewise compression, normalize.
- 'db': Magnitude mel, amplitude_to_db(ref=max), normalize.
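To make the 'db' path concrete, here is a NumPy-only sketch of decibel scaling relative to the maximum, followed by min-max normalization. It mirrors what `librosa.amplitude_to_db(S, ref=np.max)` computes with librosa's default `amin=1e-5` and `top_db=80.0`; whether the module keeps those defaults is an assumption, and `db_scale_sketch` is a hypothetical name.

```python
import numpy as np


def db_scale_sketch(S: np.ndarray, amin: float = 1e-5, top_db: float = 80.0) -> np.ndarray:
    """Convert magnitudes to dB relative to the peak, clip the range, scale to [0, 1]."""
    mag = np.abs(np.asarray(S, dtype=np.float64))
    ref = max(float(mag.max()), amin)
    db = 20.0 * np.log10(np.maximum(mag, amin) / ref)
    # Keep at most top_db of dynamic range below the peak (assumed default).
    db = np.maximum(db, db.max() - top_db)
    lo, hi = db.min(), db.max()
    return ((db - lo) / (hi - lo + 1e-8)).astype(np.float32)


mags = np.array([[1.0, 0.1], [0.01, 0.0]])
print(db_scale_sketch(mags))
```

In this example the four magnitudes sit at 0, -20, -40 and (after clipping) -80 dB, so they land at roughly 1.0, 0.75, 0.5 and 0.0 after normalization.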
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `audio` | `ndarray` | 1D audio array (mono). | required |
| `sample_rate` | `int` | Sampling rate (Hz). | `24000` |
| `n_fft` | `int` | FFT size for STFT. | `512` |
| `mel_bins` | `int` | Number of mel bands, or <=0 for linear STFT bins (magnitude). | `64` |
| `spec_width` | `int` | Target number of time frames (columns). | `256` |
| `mag_scale` | `str` | 'none' \| 'db' \| 'pcen' \| 'pwl'. | `'none'` |
| `mode` | `str` | 'mel' \| 'mfcc' \| 'log_mel' \| 'linear'. | `'mel'` |
| `n_mfcc` | `int` | Number of MFCC coefficients (only used when mode='mfcc'). | `20` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Spectrogram array (mel_bins or n_mfcc or fft_bins, spec_width), values in [0, 1]. |
Source code in birdnet_stm32/audio/spectrogram.py
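The documented return shape depends on the mode, which matters when sizing a model's input tensor. The helper below is a hypothetical sketch of that mapping, not a function in the module. It assumes a one-sided STFT, so the linear case has `n_fft // 2 + 1` frequency bins, and it treats `mel_bins <= 0` as selecting the linear layout; how the module resolves `mode` against `mel_bins` in edge cases is not stated here.

```python
def expected_shape(mode: str = "mel", mel_bins: int = 64, n_fft: int = 512,
                   n_mfcc: int = 20, spec_width: int = 256) -> tuple:
    """Return the (rows, columns) the docs describe for each mode (illustrative)."""
    if mode == "mfcc":
        rows = n_mfcc
    elif mode == "linear" or mel_bins <= 0:
        rows = n_fft // 2 + 1  # one-sided STFT bin count (assumption)
    else:
        rows = mel_bins        # 'mel' and 'log_mel'
    return (rows, spec_width)


for mode in ("mel", "log_mel", "mfcc", "linear"):
    print(mode, expected_shape(mode))
```

With the documented defaults this gives 64 rows for the mel modes, 20 for MFCC and 257 for linear STFT, each with 256 columns.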