Quantization
Strategy
BirdNET-STM32 uses post-training quantization (PTQ) to convert trained Keras models to INT8 TFLite for the STM32N6 NPU.
| Aspect | Choice | Rationale |
|---|---|---|
| Weight precision | INT8 | Required by N6 NPU |
| Activation precision | INT8 | Required by N6 NPU |
| I/O precision | Float32 | Audio inputs are continuous-valued; INT8 I/O destroys precision |
| Calibration | Representative dataset | 1024 samples from training data |
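The choices in the table map fairly directly onto TensorFlow Lite converter settings. The following is a minimal sketch, assuming the conversion goes through `tf.lite.TFLiteConverter`; the helper names `make_representative_dataset` and `convert_to_int8_tflite` are hypothetical, and the project's own `convert` command may configure things differently.

```python
def make_representative_dataset(samples):
    """Wrap calibration samples in the generator-function form the converter expects.

    Assumption: each sample is already a float32 array with a leading
    batch dimension of 1, matching the model's input shape.
    """
    def generator():
        for sample in samples:
            yield [sample]  # one list entry per model input
    return generator


def convert_to_int8_tflite(keras_model, samples):
    """Post-training quantization: INT8 weights/activations, float32 I/O."""
    # Imported lazily so the helper above can be used without TensorFlow installed.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = make_representative_dataset(samples)
    # Restrict internal ops to INT8 kernels.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    # Keep the model's input and output tensors in float32.
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32
    return converter.convert()  # serialized .tflite model as bytes
```

With float32 I/O, the converter places quantize and dequantize ops at the model boundaries, so callers keep feeding continuous-valued features.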
QAT (quantization-aware training)
BirdNET-STM32 also supports quantization-aware training (QAT) as an
optional fine-tuning step (--qat). QAT injects INT8 quantization noise into
weights during training so the model learns to tolerate quantization error.
The implementation uses shadow-weight fake-quantization
(birdnet_stm32/training/qat.py):
- Freeze BatchNorm layers (running statistics are kept).
- Before each forward pass, fake-quantize Conv2D / DepthwiseConv2D / Dense kernel weights to INT8 range (per-channel).
- Train with a low learning rate (typically 1e-4) for a few epochs.
- Save the model with original float32 weights — no FakeQuant ops remain.
Because no FakeQuant nodes are saved, the resulting .keras model is fully
compatible with the STM32N6 NPU after standard PTQ conversion.
python -m birdnet_stm32 train --data_path_train data/train \
--qat --checkpoint_path checkpoints/best_model.keras \
--epochs 10 --learning_rate 0.0001
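The per-channel fake-quantization step listed above can be illustrated with a small sketch. It is written in plain Python so it runs without dependencies, and it assumes a symmetric signed range of [-127, 127] with one scale per output channel; the function names are hypothetical and the code in `birdnet_stm32/training/qat.py` may differ in these details.

```python
def fake_quantize_channel(weights, num_bits=8):
    """Snap one output channel's weights to a symmetric integer grid, then map back to float."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8 (assumed symmetric range)
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)  # all-zero channel: nothing to quantize
    scale = max_abs / qmax  # float value of one integer step
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def fake_quantize_per_channel(kernel):
    """Apply fake-quantization independently to each output channel.

    `kernel` is a list of per-output-channel weight lists, so every
    channel gets its own scale.
    """
    return [fake_quantize_channel(channel) for channel in kernel]


kernel = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.03]]
quantized = fake_quantize_per_channel(kernel)
# Each value moves by at most half a quantization step of its own channel.
```

Because each channel has its own scale, the small-magnitude second channel keeps its resolution instead of being rounded against the first channel's much larger range.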
When to use QAT
Use QAT when PTQ cosine similarity remains below 0.95 after you have tried PWL magnitude scaling and adjusted the representative dataset. QAT typically recovers 1–3% accuracy lost during quantization.
Representative dataset
The calibration dataset is critical for PTQ quality:
- Source: randomly sampled training files, center-cropped to chunk duration.
- Size: 1024 samples (default). More is not necessarily better.
- Diversity: moderate diversity is ideal. Overly diverse datasets widen INT8 quantization ranges, reducing precision.
- Target: cosine similarity > 0.95 between Keras and TFLite outputs.
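The sampling and cropping described in the list can be sketched as follows. This is an illustration, not the project's loader: `select_calibration_files` and `center_crop` are hypothetical names, audio is represented as a plain list of samples, and zero-padding clips shorter than the chunk is an assumption.

```python
import random


def select_calibration_files(files, num_samples=1024, seed=0):
    """Pick a random subset of training files (without replacement) for calibration."""
    files = list(files)
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    return rng.sample(files, k=min(num_samples, len(files)))


def center_crop(samples, chunk_len):
    """Return the middle `chunk_len` samples of a clip."""
    n = len(samples)
    if n >= chunk_len:
        start = (n - chunk_len) // 2
        return list(samples[start:start + chunk_len])
    # Assumption: clips shorter than one chunk are zero-padded at the end.
    return list(samples) + [0.0] * (chunk_len - n)
```

Seeding the selection is one way to keep calibration ranges, and therefore cosine similarity, stable between conversion runs.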
Cosine similarity troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Cosine sim < 0.90 | db magnitude scaling | Switch to pwl |
| Cosine sim 0.90–0.95 | Too-diverse representative set | Reduce --num_samples or filter by SNR |
| Cosine sim varies across runs | Non-deterministic data order | Set --deterministic (when available) |
| stedgeai analyze fails | Unsupported op in model | Identify the unsupported operator, then simplify the model or remove the op |
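For reference, the cosine similarity used in the table compares the Keras and TFLite output vectors for the same input. A minimal sketch, assuming both outputs have been flattened to equal-length sequences of floats (the function name is illustrative, not the project's):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length output vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero outputs
    return dot / (norm_a * norm_b)
```

The metric ignores overall scale, so it measures whether the quantized model preserves the shape of the output distribution rather than its absolute magnitude.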
Channel alignment
The N6 NPU vectorizes computation in groups of 8 channels. Misaligned channel counts either:
- Waste compute cycles (padding to next multiple of 8)
- Fail compilation entirely
The model builder enforces alignment via _make_divisible(channels, 8). When
adding new layers or architectures, always maintain this constraint.
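To illustrate what such a helper does, here is a sketch modeled on the rounding function commonly found in MobileNet-style model builders. It is an assumption that the project's `_make_divisible` behaves this way; the name `make_divisible_sketch` and the 10% rule are used here for illustration only.

```python
def make_divisible_sketch(channels, divisor=8, min_value=None):
    """Round a channel count to the nearest multiple of `divisor`.

    Never returns less than `min_value` (defaults to `divisor`), and bumps
    the result up one step if rounding down would lose more than 10% of
    the requested channels.
    """
    if min_value is None:
        min_value = divisor
    rounded = max(min_value, int(channels + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * channels:
        rounded += divisor
    return rounded


# Example: width-multiplied channel counts snapped to NPU-friendly sizes.
aligned = [make_divisible_sketch(c) for c in (3, 10, 30, 64)]
```

Every value returned is a multiple of 8, so layers built from these counts need no padding on the NPU.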
Validation workflow
After conversion, always follow this sequence:
flowchart LR
A[".keras model"] --> B["birdnet_stm32 convert\nPTQ → .tflite"]
B --> C{"Cosine sim\n> 0.95?"}
C -->|Yes| D["stedgeai analyze\nN6 compatibility"]
C -->|No| E["Adjust rep dataset\nor mag scaling"]
E --> B
D --> F{"All ops\nsupported?"}
F -->|Yes| G["stedgeai validate\non-device"]
F -->|No| H["Simplify model\nor remove op"]
H --> B
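The decision logic in the flowchart can also be written as a small function, which is convenient when scripting the loop. This sketch only encodes the branching; the function name is hypothetical and it does not invoke `birdnet_stm32` or `stedgeai` itself.

```python
def next_validation_step(cosine_sim, all_ops_supported=None, threshold=0.95):
    """Return the next action in the validation workflow.

    `all_ops_supported` is None until `stedgeai analyze` has been run.
    """
    if cosine_sim <= threshold:
        return "adjust representative dataset or magnitude scaling, then reconvert"
    if all_ops_supported is None:
        return "run stedgeai analyze"
    if not all_ops_supported:
        return "simplify model or remove op, then reconvert"
    return "run stedgeai validate on-device"
```

Both failure branches lead back to conversion, so cosine similarity is rechecked after every model or calibration change.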