Adding a Frontend
Step-by-step guide to adding a new audio frontend mode to BirdNET-STM32.
Overview
Audio frontends are defined in birdnet_stm32/models/frontend.py. Each
frontend transforms a specific input representation (waveform, STFT, mel
spectrogram, etc.) into a fixed-size tensor for the DS-CNN backbone.
Steps
1. Register the frontend name
Add the canonical name to VALID_FRONTENDS in frontend.py.
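A minimal sketch of the registration step, assuming VALID_FRONTENDS is a tuple of lowercase strings. Apart from mfcc, hybrid, and raw (named elsewhere in this guide), the entry my_frontend and the helper validate_frontend are hypothetical and are not taken from frontend.py.

```python
# Hypothetical sketch: the real VALID_FRONTENDS in frontend.py may hold more
# names and may be a list or set; "my_frontend" stands in for your new mode.
VALID_FRONTENDS = ("mfcc", "hybrid", "raw", "my_frontend")


def validate_frontend(name: str) -> str:
    """Normalize a user-supplied frontend name and reject unknown ones.

    Illustrative helper, not part of the repository.
    """
    canonical = name.strip().lower()
    if canonical not in VALID_FRONTENDS:
        raise ValueError(
            f"Unknown frontend {name!r}; expected one of {', '.join(VALID_FRONTENDS)}"
        )
    return canonical
```

Validating the name once, at configuration time, means a typo fails with a clear message instead of falling through every branch in call().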
2. Implement the frontend branch
In the AudioFrontendLayer.call() method, add a branch for your frontend.
Keep the implementation as a sequence of standard Keras layers (Conv2D, DepthwiseConv2D, BatchNormalization, ReLU6, etc.) so that every op in the exported model is one the N6 NPU supports (see the checklist in step 6).
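Because the DS-CNN backbone expects a fixed-size tensor, it helps to trace the output shape of the frontend's layer stack before wiring it in. This is a pure-Python sketch that applies the Keras "valid"/"same" padding rules; the layer-spec tuple format and the helper names are hypothetical stand-ins for a real Keras layer stack.

```python
import math
from typing import Iterable, Tuple

# (kind, kernel_hw, strides_hw, padding, filters_or_depth_multiplier)
# Hypothetical spec format; BatchNormalization and ReLU6 are omitted because
# they do not change the tensor shape.
LayerSpec = Tuple[str, Tuple[int, int], Tuple[int, int], str, int]


def conv_output_size(n: int, kernel: int, stride: int, padding: str) -> int:
    """Output length along one axis, using the same rules as Keras Conv2D."""
    if padding == "valid":
        if n < kernel:
            raise ValueError(f"input length {n} is smaller than kernel {kernel}")
        return (n - kernel) // stride + 1
    if padding == "same":
        return math.ceil(n / stride)
    raise ValueError(f"unsupported padding {padding!r}")


def trace_shape(
    input_hwc: Tuple[int, int, int], layers: Iterable[LayerSpec]
) -> Tuple[int, int, int]:
    """Propagate an (H, W, C) shape through Conv2D / DepthwiseConv2D specs."""
    h, w, c = input_hwc
    for kind, (kh, kw), (sh, sw), padding, n in layers:
        h = conv_output_size(h, kh, sh, padding)
        w = conv_output_size(w, kw, sw, padding)
        if kind == "conv2d":
            c = n  # Conv2D: output channels = filters
        elif kind == "depthwise":
            c = c * n  # DepthwiseConv2D: output channels = C * depth_multiplier
        else:
            raise ValueError(f"unknown layer kind {kind!r}")
    return h, w, c
```

For example, a strided 3×3 Conv2D with 16 filters followed by a 3×3 depthwise layer maps a (64, 128, 1) input to (32, 64, 16); the 16 output channels also satisfy the multiple-of-8 rule in step 6.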
3. Define the input shape
In AudioFrontendLayer.build() or the model builder, specify the expected
input shape for your frontend. The model builder in dscnn.py uses this to
construct the correct Input layer.
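One way to keep the model builder and the data pipeline in agreement is a single function that returns the per-example input shape for each frontend. In this sketch only the mfcc shape comes from this guide (the mfcc example at the end); the waveform frontend my_frontend and the function input_shape_for are hypothetical.

```python
from typing import Tuple


def input_shape_for(
    frontend: str, num_mfcc: int, spec_width: int, num_samples: int
) -> Tuple[int, ...]:
    """Per-example input shape, excluding the batch dimension (as keras.Input expects)."""
    shapes = {
        # Precomputed MFCCs: [num_mfcc, spec_width, 1], per the mfcc example.
        "mfcc": (num_mfcc, spec_width, 1),
        # Hypothetical raw-waveform frontend: [num_samples, 1].
        "my_frontend": (num_samples, 1),
    }
    if frontend not in shapes:
        raise ValueError(f"No input shape defined for frontend {frontend!r}")
    return shapes[frontend]
```

A builder could then pass the result straight to keras.Input(shape=...), so the Input layer and the generator output are derived from the same source.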
4. Add magnitude scaling support
If your frontend produces a spectrogram-like output, apply the
MagnitudeScalingLayer after your frontend's feature extraction step.
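The ordering matters: scaling acts on the magnitudes that feature extraction produces, not on the raw input. The NumPy sketch below shows that ordering only. log_magnitude_scaling is a hypothetical stand-in for MagnitudeScalingLayer (whose actual scaling modes are not described in this guide), and the Hann-windowed FFT stands in for your frontend's feature extraction.

```python
import numpy as np


def stft_magnitude(frames: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of each frame: [..., frame_len] -> [..., frame_len // 2 + 1]."""
    window = np.hanning(frames.shape[-1])
    return np.abs(np.fft.rfft(frames * window, axis=-1))


def log_magnitude_scaling(mag: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Hypothetical stand-in for MagnitudeScalingLayer: compress to decibels.

    eps floors the magnitude so silent bins map to a finite value (-120 dB here).
    """
    return 20.0 * np.log10(np.maximum(mag, eps))


def frontend_with_scaling(frames: np.ndarray) -> np.ndarray:
    # 1) feature extraction, then 2) magnitude scaling on its output.
    return log_magnitude_scaling(stft_magnitude(frames))
```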
5. Update the data pipeline
In birdnet_stm32/data/dataset.py, add the preprocessing logic for your
frontend. The data generator must produce inputs in the shape your frontend
expects.
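For precomputed features such as MFCCs, the generator usually has to force every clip to the same width. This sketch assumes features arrive as a [num_features, frames] NumPy array (the layout librosa.feature.mfcc returns for mono audio) and that short clips are zero-padded at the end. The helper names and the padding strategy are assumptions, not the repository's actual code.

```python
from typing import Sequence

import numpy as np


def to_model_input(features: np.ndarray, spec_width: int) -> np.ndarray:
    """Crop or zero-pad [num_features, frames] to spec_width frames, then add a channel axis.

    Returns a float32 array of shape [num_features, spec_width, 1].
    """
    frames = features.shape[1]
    if frames >= spec_width:
        fixed = features[:, :spec_width]  # keep the first spec_width frames
    else:
        fixed = np.pad(features, ((0, 0), (0, spec_width - frames)))  # zero-pad at the end
    return fixed[..., np.newaxis].astype(np.float32)


def to_batch(clips: Sequence[np.ndarray], spec_width: int) -> np.ndarray:
    """Stack per-clip features into [B, num_features, spec_width, 1]."""
    return np.stack([to_model_input(c, spec_width) for c in clips])
```

Zero is not necessarily "silence" for every feature type (for MFCCs or log magnitudes it is just another value), so the padding value is worth choosing deliberately for your frontend.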
6. N6 compatibility checklist
Before merging, verify:
- [ ] All ops are in the N6 NPU supported set
- [ ] Channel counts are multiples of 8
- [ ] Input size does not exceed the 16-bit activation limit (65,536 elements)
- [ ] Run stedgeai analyze on the exported TFLite model
- [ ] Cosine similarity between the float and quantized model outputs is > 0.95 after post-training quantization (PTQ)
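The two numeric checks in this checklist are easy to script before running stedgeai analyze. This is a sketch under stated assumptions: the 65,536-element limit is applied to the input tensor only, channel_counts are the output channels of your frontend's layers, and the float and quantized outputs are obtained separately (for example from the Keras model and the exported TFLite model). The helper names are hypothetical.

```python
from typing import Iterable, List, Sequence

import numpy as np

MAX_ACTIVATION_ELEMENTS = 65_536  # 16-bit activation limit quoted in the checklist


def check_n6_constraints(
    input_shape: Sequence[int], channel_counts: Iterable[int]
) -> List[str]:
    """Return human-readable violations (an empty list means both checks pass)."""
    problems = []
    elements = int(np.prod(input_shape))
    if elements > MAX_ACTIVATION_ELEMENTS:
        problems.append(f"input has {elements} elements (limit {MAX_ACTIVATION_ELEMENTS})")
    for i, channels in enumerate(channel_counts):
        if channels % 8 != 0:
            problems.append(f"layer {i}: {channels} channels is not a multiple of 8")
    return problems


def cosine_similarity(float_out: np.ndarray, quant_out: np.ndarray) -> float:
    """Cosine similarity between flattened float and quantized model outputs."""
    a = np.asarray(float_out, dtype=np.float64).ravel()
    b = np.asarray(quant_out, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero output")
    return float(np.dot(a, b) / denom)
```

With these helpers, a frontend passes the last checklist item when cosine_similarity(float_out, quant_out) > 0.95 on a representative validation batch.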
7. Add tests
Create tests/test_frontend_your_frontend.py with:
- Output shape test for known input dimensions
- Magnitude scaling integration test
- Round-trip test: build model → export TFLite → run inference
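A skeleton for the output-shape test, written as plain assert functions that pytest collects without extra imports. toy_frame_frontend is a hypothetical NumPy stand-in that frames a waveform; in the real test file you would call your frontend (or the built model) instead, and the TFLite round trip would additionally need the exported model.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def toy_frame_frontend(waveform: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Hypothetical stand-in frontend: [B, num_samples] -> [B, num_frames, frame_len, 1]."""
    windows = sliding_window_view(waveform, frame_len, axis=-1)  # [B, N - frame_len + 1, frame_len]
    frames = windows[:, ::hop, :]  # keep every hop-th window
    return frames[..., np.newaxis]


def test_output_shape():
    batch, num_samples, frame_len, hop = 2, 16_000, 512, 256
    out = toy_frame_frontend(np.zeros((batch, num_samples), np.float32), frame_len, hop)
    expected_frames = 1 + (num_samples - frame_len) // hop
    assert out.shape == (batch, expected_frames, frame_len, 1)


def test_frame_contents():
    wave = np.arange(2 * 2048, dtype=np.float32).reshape(2, 2048)
    out = toy_frame_frontend(wave, frame_len=512, hop=256)
    # Frame k of example b should equal samples [k * hop, k * hop + frame_len).
    np.testing.assert_array_equal(out[1, 3, :, 0], wave[1, 768:1280])
```

Computing the expected frame count from the same formula the frontend is supposed to implement keeps the test meaningful when frame_len or hop change.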
8. Update documentation
- Add a section to docs/dev/audio-frontends.md
- Update the frontend count in docs/index.md
- Update the Mermaid diagram in docs/dev/architecture.md
Example: the mfcc frontend
The mfcc frontend is a good reference for a simple precomputed frontend:
- Input: [B, num_mfcc, spec_width, 1], precomputed offline
- In-graph ops: magnitude scaling only
- No trainable parameters in the frontend itself
- Data pipeline computes MFCCs using librosa.feature.mfcc()
For an in-graph frontend (like hybrid or raw), the implementation is more
involved because the feature extraction layers must be NPU-compatible.