
Model Conversion

Convert a trained Keras model to a quantized TFLite model using post-training quantization (PTQ) with INT8 internals and float32 I/O.

Basic usage

python -m birdnet_stm32 convert \
  --checkpoint_path checkpoints/my_model.keras \
  --model_config checkpoints/my_model_model_config.json \
  --data_path_train data/train

This produces:

  • my_model_quantized.tflite — quantized TFLite model
  • my_model_quantized_validation_data.npz — validation inputs/outputs for on-device comparison
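A minimal sketch of how the saved .npz might be consumed for on-device comparison. The array key names ("inputs", "outputs") are assumptions — inspect the real file with np.load(...).files; the file here is synthetic so the snippet is self-contained.

```python
import numpy as np

# Stand-in for my_model_quantized_validation_data.npz (synthetic so the
# snippet runs without a trained model; key names are assumed).
np.savez("validation_data.npz",
         inputs=np.random.rand(4, 64, 96, 1).astype(np.float32),
         outputs=np.random.rand(4, 10).astype(np.float32))

data = np.load("validation_data.npz")
print(data.files)            # check which arrays are actually stored
ref = data["outputs"]        # reference outputs from the host-side model

# device_out would come from running data["inputs"] on the STM32;
# here we perturb the reference to simulate quantization noise.
device_out = ref + np.random.normal(0, 1e-3, ref.shape).astype(np.float32)
max_abs_err = np.max(np.abs(device_out - ref))
print(f"max |device - reference| = {max_abs_err:.4f}")
```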

How it works

flowchart TD
    A[".keras model"] --> B["Load model\n+ model_config.json"]
    B --> C["Build representative\ndataset (1024 samples)"]
    C --> D["TFLite PTQ\nfloat32 I/O, INT8 internals"]
    D --> E["Validate: Keras vs. TFLite\ncosine sim, MSE, Pearson r"]
    E --> F[".tflite + .npz\nfor on-device validation"]
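The "build representative dataset" step above can be sketched as follows. Function names, the energy threshold, and the toy data are illustrative assumptions, not the package's actual API; the point is the combination of a silence gate with stratified per-class sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_representative_dataset(chunks, labels, num_samples=1024,
                                 energy_db_floor=-60.0):
    """Pick up to num_samples chunks, balanced over classes,
    dropping chunks whose mean energy falls below the floor."""
    energy_db = 10 * np.log10(np.mean(chunks ** 2, axis=(1, 2)) + 1e-12)
    keep = energy_db > energy_db_floor           # skip near-silent chunks
    chunks, labels = chunks[keep], labels[keep]

    picked = []
    classes = np.unique(labels)
    per_class = max(1, num_samples // len(classes))
    for c in classes:                            # stratified class sampling
        idx = np.flatnonzero(labels == c)
        take = rng.choice(idx, size=min(per_class, len(idx)), replace=False)
        picked.extend(take)
    return chunks[np.array(picked)[:num_samples]]

# Toy data: 200 spectrogram chunks, 5 classes, the first 20 near-silent.
chunks = rng.random((200, 64, 96)).astype(np.float32)
chunks[:20] *= 1e-6
labels = rng.integers(0, 5, size=200)
rep = build_representative_dataset(chunks, labels, num_samples=100)
print(rep.shape)
```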

Validation metrics

After conversion, the script reports:

| Metric | Target | Description |
| --- | --- | --- |
| Cosine similarity | > 0.95 | Directional agreement of output vectors |
| MSE | Low | Mean squared error |
| MAE | Low | Mean absolute error |
| Pearson r | > 0.95 | Linear correlation |
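The four metrics are straightforward to reproduce with NumPy; ref and quant below are synthetic stand-ins for the float and quantized model outputs, with small Gaussian noise simulating quantization error.

```python
import numpy as np

def agreement_metrics(y_float, y_quant):
    """Cosine similarity, MSE, MAE, and Pearson r between two output sets."""
    a, b = y_float.ravel(), y_quant.ravel()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    mse = np.mean((a - b) ** 2)
    mae = np.mean(np.abs(a - b))
    r = np.corrcoef(a, b)[0, 1]
    return {"cosine": cos, "mse": mse, "mae": mae, "pearson_r": r}

rng = np.random.default_rng(1)
ref = rng.random(512).astype(np.float32)
quant = ref + rng.normal(0, 0.01, 512).astype(np.float32)  # simulated PTQ noise
m = agreement_metrics(ref, quant)
print(m)
assert m["cosine"] > 0.95 and m["pearson_r"] > 0.95  # conversion would pass
```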

Cosine similarity < 0.95

If cosine similarity drops below 0.95, the quantized model may behave significantly differently from the float model. Common causes:

  • Overly diverse representative dataset widens INT8 ranges.
  • Using dB magnitude scaling (poor quantization behavior).
  • Very wide channel counts without proper alignment.

Arguments

| Argument | Default | Description |
| --- | --- | --- |
| --checkpoint_path | (required) | Path to trained .keras model |
| --model_config | (inferred) | Path to _model_config.json |
| --output_path | (inferred) | Output .tflite path |
| --data_path_train | None | Training data for representative dataset |
| --num_samples | 1024 | Number of representative samples |
| --validate_samples | 256 | Samples for Keras vs. TFLite validation |
| --min_cosine_sim | 0.95 | Fail conversion if cosine similarity is below this |
| --quantization | ptq | ptq (full INT8 with calibration) or dynamic (dynamic range, no calibration data) |
| --per_tensor | off | Use per-tensor quantization instead of per-channel |
| --batch_validate | 0 | Run validation N times with different seeds, report worst-case |
| --export_onnx | off | Also export ONNX model (requires tf2onnx) |
| --report_json | None | Save structured JSON conversion report |

Quantization details

  • Scheme: full integer quantization (INT8 weights + INT8 activations)
  • I/O: float32 — audio inputs are continuous-valued and lose meaningful precision at INT8
  • Calibration: representative dataset drawn from training data with stratified class sampling and SNR filtering (near-silent chunks skipped)
  • Target hardware: STM32N6 NPU (requires channel counts in multiples of 8)
  • Per-channel (default): quantizes each output channel separately — better accuracy
  • Per-tensor: single scale per tensor — use only if per-channel causes N6 issues
  • Dynamic range: INT8 weights, runtime float activations — no calibration data needed, less compression
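A toy NumPy illustration of why per-channel is the more accurate default: quantize a weight matrix whose output channels span very different ranges with a single tensor-wide scale versus one scale per channel, and compare reconstruction error. Shapes and data are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
# 8 output channels with magnitudes spanning two orders of magnitude.
W = rng.normal(0, 1, (8, 64)) * np.logspace(-2, 0, 8)[:, None]

def quantize_roundtrip(w, scale):
    """Symmetric INT8 quantize then dequantize."""
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale

per_tensor_scale = np.max(np.abs(W)) / 127.0                   # one scale
per_channel_scale = np.max(np.abs(W), axis=1, keepdims=True) / 127.0

err_tensor = np.mean((W - quantize_roundtrip(W, per_tensor_scale)) ** 2)
err_channel = np.mean((W - quantize_roundtrip(W, per_channel_scale)) ** 2)
print(f"per-tensor MSE:  {err_tensor:.2e}")
print(f"per-channel MSE: {err_channel:.2e}")
```

The small-magnitude channels are crushed to a handful of integer levels under the tensor-wide scale, which is exactly the accuracy loss per-channel quantization avoids.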

Quantization modes

Use --quantization ptq (default) for best on-device performance. Use --quantization dynamic when no training data is available. Use --per_tensor only if stedgeai rejects a per-channel model.

No INT8 I/O

Audio spectrograms are continuous-valued signals. Quantizing model inputs to INT8 would destroy meaningful precision. The pipeline enforces float32 I/O with INT8 internals only.
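A quick numerical sketch of that rationale (synthetic input, standing in for a spectrogram): an INT8 round-trip of a continuous-valued input caps its signal-to-noise ratio at roughly the 8-bit resolution before the model ever runs.

```python
import numpy as np

rng = np.random.default_rng(3)
spec = rng.normal(0, 1, (64, 96)).astype(np.float32)  # stand-in spectrogram

# Symmetric INT8 quantization of the input, then dequantization.
scale = np.max(np.abs(spec)) / 127.0
spec_int8 = np.clip(np.round(spec / scale), -127, 127).astype(np.int8)
spec_roundtrip = spec_int8.astype(np.float32) * scale

# SNR of the round-tripped input, bounded by 8-bit resolution.
snr_db = 10 * np.log10(np.mean(spec ** 2) /
                       np.mean((spec - spec_roundtrip) ** 2))
print(f"input round-trip SNR: {snr_db:.1f} dB")
```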