Model Conversion¶
Convert a trained Keras model to a quantized TFLite model using post-training quantization (PTQ) with INT8 internals and float32 I/O.
Basic usage¶
python -m birdnet_stm32 convert \
--checkpoint_path checkpoints/my_model.keras \
--model_config checkpoints/my_model_model_config.json \
--data_path_train data/train
This produces:
my_model_quantized.tflite— quantized TFLite modelmy_model_quantized_validation_data.npz— validation inputs/outputs for on-device comparison
How it works¶
flowchart TD
A[".keras model"] --> B["Load model\n+ model_config.json"]
B --> C["Build representative\ndataset (1024 samples)"]
C --> D["TFLite PTQ\nfloat32 I/O, INT8 internals"]
D --> E["Validate: Keras vs. TFLite\ncosine sim, MSE, Pearson r"]
E --> F[".tflite + .npz\nfor on-device validation"]
Validation metrics¶
After conversion, the script reports:
| Metric | Target | Description |
|---|---|---|
| Cosine similarity | > 0.95 | Directional agreement of output vectors |
| MSE | Low | Mean squared error |
| MAE | Low | Mean absolute error |
| Pearson r | > 0.95 | Linear correlation |
Cosine similarity < 0.95
If cosine similarity drops below 0.95, the quantized model may behave significantly differently from the float model. Common causes:
- Overly diverse representative dataset widens INT8 ranges.
- Using
dbmagnitude scaling (poor quantization behavior). - Very wide channel counts without proper alignment.
Arguments¶
| Argument | Default | Description |
|---|---|---|
--checkpoint_path |
(required) | Path to trained .keras model |
--model_config |
(inferred) | Path to _model_config.json |
--output_path |
(inferred) | Output .tflite path |
--data_path_train |
None | Training data for representative dataset |
--num_samples |
1024 | Number of representative samples |
--validate_samples |
256 | Samples for Keras vs. TFLite validation |
--min_cosine_sim |
0.95 | Fail conversion if cosine similarity is below this |
--quantization |
ptq |
ptq (full INT8 with calibration) or dynamic (dynamic range, no calibration data) |
--per_tensor |
off | Use per-tensor quantization instead of per-channel |
--batch_validate |
0 | Run validation N times with different seeds, report worst-case |
--export_onnx |
off | Also export ONNX model (requires tf2onnx) |
--report_json |
None | Save structured JSON conversion report |
Quantization details¶
- Scheme: full integer quantization (INT8 weights + INT8 activations)
- I/O: float32 — audio inputs are continuous-valued and lose meaningful precision at INT8
- Calibration: representative dataset drawn from training data with stratified class sampling and SNR filtering (near-silent chunks skipped)
- Target hardware: STM32N6 NPU (requires channel counts in multiples of 8)
- Per-channel (default): quantizes each output channel separately — better accuracy
- Per-tensor: single scale per tensor — use only if per-channel causes N6 issues
- Dynamic range: INT8 weights, runtime float activations — no calibration data needed, less compression
Quantization modes
Use --quantization ptq (default) for best on-device performance.
Use --quantization dynamic when no training data is available.
Use --per_tensor only if stedgeai rejects a per-channel model.
No INT8 I/O
Audio spectrograms are continuous-valued signals. Quantizing model inputs to INT8 would destroy meaningful precision. The pipeline enforces float32 I/O with INT8 internals only.