Adding a Frontend

Step-by-step guide to adding a new audio frontend mode to BirdNET-STM32.

Overview

Audio frontends are defined in birdnet_stm32/models/frontend.py. Each frontend transforms a specific input representation (waveform, STFT, mel spectrogram, etc.) into a fixed-size tensor for the DS-CNN backbone.

Steps

1. Register the frontend name

Add the canonical name to VALID_FRONTENDS in frontend.py:

VALID_FRONTENDS = ("librosa", "hybrid", "raw", "mfcc", "log_mel", "your_frontend")
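
Registering the name lets any code that reads VALID_FRONTENDS reject a typo early, instead of letting it silently fall through the branch chain in step 2. The sketch below shows such a guard. `check_frontend` is a hypothetical helper, not necessarily how frontend.py validates names.

```python
# Hypothetical sketch: reject unknown frontend names before building the model.
VALID_FRONTENDS = ("librosa", "hybrid", "raw", "mfcc", "log_mel", "your_frontend")


def check_frontend(name: str) -> str:
    """Return name unchanged if it is registered, otherwise raise ValueError."""
    if name not in VALID_FRONTENDS:
        raise ValueError(
            f"Unknown frontend {name!r}; expected one of: {', '.join(VALID_FRONTENDS)}"
        )
    return name


check_frontend("your_frontend")  # passes once the name is registered
```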

2. Implement the frontend branch

In the AudioFrontendLayer.call() method, add a branch for your frontend:

elif self.frontend_mode == "your_frontend":
    x = self._your_frontend_ops(inputs)

Build the implementation as a sequence of standard Keras layers (Conv2D, DepthwiseConv2D, BatchNormalization, ReLU6 via ReLU(max_value=6), etc.) so that the exported graph stays within the op set the N6 NPU supports (see the checklist in step 6).
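
Because the backbone expects a fixed-size tensor, it helps to trace the output shape of your conv stack before wiring it in. The sketch below applies Keras' "same"/"valid" padding rules along each spatial axis. `trace_shapes`, the (kernel, stride, padding, filters) tuple format and the example dimensions are assumptions for illustration, and depthwise layers are assumed to use depth_multiplier=1.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels), batch dim excluded
ConvSpec = Tuple[int, int, str, int]  # (kernel, stride, padding, filters)


def conv_out_len(n: int, kernel: int, stride: int, padding: str) -> int:
    """Output length along one spatial axis, following Keras padding rules."""
    if padding == "same":
        return -(-n // stride)  # ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError(f"unsupported padding {padding!r}")


def trace_shapes(shape: Shape, convs: List[ConvSpec]) -> List[Shape]:
    """Output shape after each layer; filters=0 marks a depthwise conv (channels kept)."""
    h, w, c = shape
    shapes = []
    for kernel, stride, padding, filters in convs:
        h = conv_out_len(h, kernel, stride, padding)
        w = conv_out_len(w, kernel, stride, padding)
        c = filters or c
        shapes.append((h, w, c))
    return shapes


# Example: strided 3x3 conv, depthwise 3x3, then a 1x1 pointwise conv.
print(trace_shapes((64, 128, 1), [(3, 2, "same", 16), (3, 1, "same", 0), (1, 1, "valid", 32)]))
# [(32, 64, 16), (32, 64, 16), (32, 64, 32)]
```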

3. Define the input shape

In AudioFrontendLayer.build() or the model builder, specify the expected input shape for your frontend. The model builder in dscnn.py uses this to construct the correct Input layer.
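
For spectrogram-style inputs, the time dimension (spec_width in the mfcc example at the end of this page) follows from the clip length and the STFT parameters. Assuming librosa-style framing, where center=True pads n_fft // 2 samples on each side of the signal, the frame count can be computed as in this sketch. The helper name and the example sample rate, n_fft and hop_length are illustrative, not project defaults.

```python
def num_frames(n_samples: int, n_fft: int, hop_length: int, center: bool = True) -> int:
    """Number of STFT frames produced by librosa-style framing."""
    if center:
        # center=True pads n_fft // 2 samples on each side of the signal
        n_samples += 2 * (n_fft // 2)
    if n_samples < n_fft:
        raise ValueError("signal is shorter than n_fft")
    return 1 + (n_samples - n_fft) // hop_length


# Assumed example values: 1 s at 48 kHz, n_fft=1024, hop_length=512.
print(num_frames(48_000, 1024, 512))                # 94
print(num_frames(48_000, 1024, 512, center=False))  # 92
```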

4. Add magnitude scaling support

If your frontend produces a spectrogram-like output, apply the MagnitudeScalingLayer after your frontend's feature extraction:

x = self.mag_layer(x)  # PWL/PCEN/dB/none
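
dB is one of the modes MagnitudeScalingLayer offers. To show what that conversion computes, the sketch below uses the same formula as librosa.power_to_db: values below amin are floored, then everything more than top_db below the peak is clamped. The in-graph layer may use different constants or skip the clamp, so treat this as a reference for checking outputs, not as the layer's implementation. PWL and PCEN are not shown.

```python
import math
from typing import List, Optional


def power_to_db(power: List[float], ref: float = 1.0, amin: float = 1e-10,
                top_db: Optional[float] = 80.0) -> List[float]:
    """dB conversion with the same formula as librosa.power_to_db."""
    ref_db = 10.0 * math.log10(max(amin, ref))
    db = [10.0 * math.log10(max(amin, p)) - ref_db for p in power]
    if top_db is not None:
        # Clamp everything more than top_db below the peak
        floor = max(db) - top_db
        db = [max(v, floor) for v in db]
    return db


print(power_to_db([1.0, 100.0, 1e-12]))  # [0.0, 20.0, -60.0]
```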

5. Update the data pipeline

In birdnet_stm32/data/dataset.py, add the preprocessing logic for your frontend. The data generator must produce inputs in the shape your frontend expects.
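
Because the frontend expects a fixed input size, the generator has to crop or pad every example to the same width. A plain-Python sketch for a [bins, frames] feature map follows. `to_fixed_width`, cropping from the start and zero padding are assumptions; the real pipeline in dataset.py may use NumPy, random crops or a different padding value.

```python
from typing import List

Matrix = List[List[float]]  # [bins][frames]


def to_fixed_width(features: Matrix, spec_width: int) -> List[List[List[float]]]:
    """Crop or zero-pad each row to spec_width frames, then add a channel axis.

    Returns nested lists shaped [bins, spec_width, 1].
    """
    fixed = []
    for row in features:
        row = row[:spec_width] + [0.0] * max(0, spec_width - len(row))
        fixed.append([[v] for v in row])
    return fixed


example = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 bins x 3 frames
print(to_fixed_width(example, 2))  # [[[1.0], [2.0]], [[4.0], [5.0]]]
```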

6. N6 compatibility checklist

Before merging, verify:

[ ] All ops are in the N6 NPU supported set
[ ] Channel counts are multiples of 8
[ ] Input size does not exceed the 16-bit activation limit (65,536 elements)
[ ] Run stedgeai analyze on the exported TFLite model
[ ] Cosine similarity between float and post-training-quantized (PTQ) model outputs > 0.95
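
Some of these checks are easy to script before running stedgeai. The sketch below checks per-layer output shapes against the 65,536-element limit and the multiple-of-8 channel rule, and computes the cosine similarity between flattened float and quantized outputs. The helper names are hypothetical, and the sketch assumes you have already collected the shapes and outputs (for example from the Keras model and a TFLite interpreter run on the same input); stedgeai analyze remains the authoritative check.

```python
import math
from typing import Dict, List, Sequence, Tuple

MAX_ACTIVATION_ELEMENTS = 65_536  # 16-bit activation limit from the checklist


def check_activations(shapes: Dict[str, Tuple[int, ...]]) -> List[str]:
    """List violations for per-layer output shapes (batch dim excluded, channels last)."""
    problems = []
    for name, shape in shapes.items():
        size = math.prod(shape)
        if size > MAX_ACTIVATION_ELEMENTS:
            problems.append(f"{name}: {size} elements > {MAX_ACTIVATION_ELEMENTS}")
        if shape[-1] % 8 != 0:
            problems.append(f"{name}: {shape[-1]} channels not a multiple of 8")
    return problems


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two flattened output vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(check_activations({"conv1": (32, 64, 16), "conv2": (64, 128, 12)}))
```

Here conv1 passes, while conv2 is flagged twice: 98,304 elements exceeds the limit, and 12 channels is not a multiple of 8.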

7. Add tests

Create tests/test_frontend_your_frontend.py with:

- An output shape test for known input dimensions
- A magnitude scaling integration test
- A round-trip test: build model → export TFLite → run inference

8. Update documentation

- Add a section to docs/dev/audio-frontends.md
- Update the frontend count in docs/index.md
- Update the Mermaid diagram in docs/dev/architecture.md

Example: the `mfcc` frontend

The mfcc frontend is a good reference for a simple precomputed frontend:

- Input: [B, num_mfcc, spec_width, 1], precomputed offline
- In-graph ops: magnitude scaling only
- No trainable parameters in the frontend itself
- The data pipeline computes MFCCs with librosa.feature.mfcc()
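
With its defaults, librosa.feature.mfcc converts a mel power spectrogram to dB and then applies a type-II DCT with orthonormal scaling along the mel axis, keeping the first n_mfcc coefficients. The per-frame DCT step is sketched in plain Python below to show what the num_mfcc axis contains. In the actual pipeline you would call librosa directly; the helper names here are illustrative.

```python
import math
from typing import List


def dct2_ortho(x: List[float]) -> List[float]:
    """Orthonormal DCT-II (same convention as scipy's dct(type=2, norm="ortho"))."""
    n = len(x)
    coeffs = []
    for k in range(n):
        total = sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def mfcc_from_log_mel(log_mel_frame: List[float], num_mfcc: int) -> List[float]:
    """MFCCs for one frame: DCT over the mel bins, keep the first num_mfcc terms."""
    return dct2_ortho(log_mel_frame)[:num_mfcc]


# A flat log-mel frame puts all of its energy in coefficient 0.
print(mfcc_from_log_mel([1.0, 1.0, 1.0, 1.0], num_mfcc=2))
```

Because the DCT is orthonormal, the sum of squared coefficients over all bins equals the sum of squared log-mel values; truncating to num_mfcc keeps only the smooth spectral envelope.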

For an in-graph frontend (like hybrid or raw), the implementation is more involved because the feature extraction layers must be NPU-compatible.
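
One common way to keep in-graph feature extraction NPU-friendly is to express the STFT as a strided convolution whose fixed weights are windowed cosine and sine kernels, so the accelerator only sees an ordinary convolution. The pure-Python sketch below shows that arithmetic. Whether hybrid or raw use this exact scheme is not stated here, and the periodic Hann window and helper names are assumptions.

```python
import math
from typing import List, Tuple


def dft_kernels(n_fft: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Cosine/sine kernels for bins 0..n_fft//2 with a periodic Hann window folded in."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    cos_k, sin_k = [], []
    for k in range(n_fft // 2 + 1):
        cos_k.append([window[i] * math.cos(2 * math.pi * k * i / n_fft) for i in range(n_fft)])
        # Negative sign matches the DFT convention X_k = sum x_i * exp(-j*2*pi*k*i/N)
        sin_k.append([-window[i] * math.sin(2 * math.pi * k * i / n_fft) for i in range(n_fft)])
    return cos_k, sin_k


def stft_magnitude(signal: List[float], n_fft: int, hop: int) -> List[List[float]]:
    """'Valid' strided convolution of the signal with each kernel -> [frames][bins]."""
    cos_k, sin_k = dft_kernels(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = signal[start:start + n_fft]
        mags = []
        for ck, sk in zip(cos_k, sin_k):
            re = sum(a * b for a, b in zip(seg, ck))
            im = sum(a * b for a, b in zip(seg, sk))
            mags.append(math.hypot(re, im))
        frames.append(mags)
    return frames


# A cosine that completes exactly 8 cycles per 64-sample frame peaks in bin 8.
signal = [math.cos(2 * math.pi * 8 * i / 64) for i in range(256)]
spec = stft_magnitude(signal, n_fft=64, hop=32)
print(len(spec), max(range(len(spec[0])), key=spec[0].__getitem__))  # 7 8
```

In Keras, cos_k and sin_k would become the frozen weights of a convolution with stride equal to the hop length; the magnitude step (squares and a square root) also has to be checked against the N6 supported op set.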