Deployment¶
Deploy a quantized TFLite model to the STM32N6570-DK development board using ST's X-CUBE-AI toolchain.
Prerequisites¶
| Tool | Version | Download |
|---|---|---|
| X-CUBE-AI | 10.2.0+ | ST website |
| STM32CubeProgrammer | 2.20+ | ST website |
| STM32CubeIDE | 1.19+ | ST website |
| ARM GNU Toolchain | 14.3+ | ARM Developer |
Overview¶
The deployment pipeline has three stages (generate, flash, validate):
flowchart LR
A[".tflite\nquantized model"] --> B["stedgeai generate\nN6-optimized binary"]
B --> C["n6_loader.py\nserial flash to board"]
C --> D["stedgeai validate\non-device inference"]
D --> E["Validation report\ncosine sim + latency"]
(Image source: STM32ai)
Step 1: Install X-CUBE-AI¶
unzip x-cube-ai-linux-v10.2.0.zip -d X-CUBE-AI.10.2.0
cd X-CUBE-AI.10.2.0
unzip stedgeai-linux-10.2.0.zip
Directory structure after extraction:
X-CUBE-AI.10.2.0/
├── Utilities/
│   └── linux/
│       └── stedgeai    # CLI tool
├── Middlewares/
└── Projects/
Step 2: Install ARM GNU Toolchain¶
wget https://developer.arm.com/-/media/Files/downloads/gnu/14.3.rel1/binrel/arm-gnu-toolchain-14.3.rel1-x86_64-arm-none-eabi.tar.xz
tar xf arm-gnu-toolchain-14.3.rel1-x86_64-arm-none-eabi.tar.xz
export PATH=$PWD/arm-gnu-toolchain-14.3.rel1-x86_64-arm-none-eabi/bin:$PATH
Verify:
Step 3: Install STM32CubeProgrammer¶
Download and run the installer:
Add to PATH and configure permissions:
export PATH=$PATH:/path/to/STM32Cube/STM32CubeProgrammer/bin
sudo usermod -aG plugdev $USER
sudo usermod -aG dialout $USER
# Install udev rules
sudo cp /path/to/STM32CubeProgrammer/Drivers/rules/*.* /etc/udev/rules.d/
sudo udevadm control --reload-rules && sudo udevadm trigger
Unplug and replug the board (or reboot) to apply the new rules. Log out and back in for the new group memberships to take effect.
Verify:
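One way to confirm that the installed tools are reachable is a small PATH check. The sketch below is illustrative only: it assumes the toolchain exposes `arm-none-eabi-gcc` and that STM32CubeProgrammer exposes `STM32_Programmer_CLI`, and the helper name `find_missing_tools` is hypothetical.

```python
import shutil

# Executable names assumed for each prerequisite; adjust to your install.
REQUIRED_TOOLS = {
    "ARM GNU Toolchain": "arm-none-eabi-gcc",
    "STM32CubeProgrammer": "STM32_Programmer_CLI",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```

If a tool is reported missing, re-check the `export PATH=...` lines from the installation steps in the current shell.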
Step 4: Generate model files¶
Navigate to the X-CUBE-AI utilities directory and run:
./stedgeai generate \
--model /path/to/checkpoints/my_model_quantized.tflite \
--target stm32n6 \
--st-neural-art \
--output /path/to/birdnet-stm32/validation/st_ai_output \
--workspace /path/to/birdnet-stm32/validation/st_ai_ws \
--verbose
Analyze first
Run stedgeai analyze instead of generate to get detailed model metrics
(size, memory, per-layer info) without generating output files. Always
analyze new model architectures to verify N6 NPU operator compatibility.
The output directory of stedgeai generate includes network_generate_report.txt,
which summarizes model size and compute requirements.
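If you script this step, the command can be assembled from Python. The sketch below uses only the flags shown in the command above. The helper names are hypothetical, and reusing the same flags for `analyze` is an assumption based on the note about running analyze instead of generate.

```python
import subprocess


def build_stedgeai_command(stedgeai, model, output, workspace, analyze=False):
    """Assemble the stedgeai argument list from the flags shown above."""
    # Assumption: analyze accepts the same flags as generate.
    subcommand = "analyze" if analyze else "generate"
    return [
        str(stedgeai), subcommand,
        "--model", str(model),
        "--target", "stm32n6",
        "--st-neural-art",
        "--output", str(output),
        "--workspace", str(workspace),
        "--verbose",
    ]


def run_stedgeai(stedgeai, model, output, workspace, analyze=False):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    cmd = build_stedgeai_command(stedgeai, model, output, workspace, analyze)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.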
Step 5: Configure and flash the board¶
Set board to DEV mode¶
- Disconnect the board from USB.
- Set BOOT0 to right.
- Set BOOT1 to left.
- Set JP2 to position 1-2.
- Reconnect the board.
(Image source: ST Community)
Create configuration files¶
Copy the example config and fill in your local paths:
Edit config.json with your machine-local paths:
{
  "compiler_type": "gcc",
  "cubeide_path": "/path/to/stm32cubeide",
  "x_cube_ai_path": "/path/to/X-CUBE-AI.10.2.0",
  "model_path": "checkpoints/best_model_quantized.tflite",
  "output_dir": "validation/st_ai_output",
  "workspace_dir": "validation/st_ai_ws",
  "n6_loader_config": "config_n6l.json"
}
Create config_n6l.json in the project root (required by ST's n6_loader):
{
  "network.c": "/path/to/birdnet-stm32/validation/st_ai_output/network.c",
  "project_path": "/path/to/X-CUBE-AI.10.2.0/Projects/STM32N6570-DK/Applications/NPU_Validation",
  "project_build_conf": "N6-DK",
  "skip_external_flash_programming": false,
  "skip_ram_data_programming": false,
  "objcopy_binary_path": "/usr/bin/arm-none-eabi-objcopy"
}
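Because several entries in config_n6l.json are derived from the same base directories, the file can also be produced by a script. This is a minimal sketch that reuses the keys and values from the example above. The function name `build_n6_loader_config` is hypothetical and not part of ST's tooling.

```python
import json
from pathlib import PurePosixPath


def build_n6_loader_config(x_cube_ai_path, output_dir, objcopy_path):
    """Return a dict with the same keys as the config_n6l.json example."""
    x_cube_ai = PurePosixPath(x_cube_ai_path)
    return {
        "network.c": str(PurePosixPath(output_dir) / "network.c"),
        "project_path": str(
            x_cube_ai / "Projects" / "STM32N6570-DK" / "Applications" / "NPU_Validation"
        ),
        "project_build_conf": "N6-DK",
        "skip_external_flash_programming": False,
        "skip_ram_data_programming": False,
        "objcopy_binary_path": str(objcopy_path),
    }


def to_json(config):
    """Serialize the config with two-space indentation."""
    return json.dumps(config, indent=2)
```

Write the result of `to_json(...)` to config_n6l.json in the project root.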
Warning
Both config files contain machine-local paths. They are listed in
.gitignore — do not commit them. Use config.example.json as a
reference template.
Run the full deploy pipeline¶
The CLI reads all paths from config.json and runs generate → flash → validate:
You can override any path via CLI arguments:
Or via environment variables:
Priority order: CLI arguments > environment variables > config.json values.
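The priority order can be pictured as a three-level lookup. The sketch below is illustrative and is not the project's actual resolver: the function name and the `BIRDNET_` environment variable prefix are assumptions chosen for the example.

```python
import os


def resolve_setting(key, cli_args, config, env=None, env_prefix="BIRDNET_"):
    """Resolve one setting: CLI argument, then environment, then config.json."""
    env = os.environ if env is None else env

    # 1. An explicitly passed CLI argument wins.
    cli_value = cli_args.get(key)
    if cli_value is not None:
        return cli_value

    # 2. Otherwise fall back to an environment variable (hypothetical naming).
    env_value = env.get(env_prefix + key.upper())
    if env_value is not None:
        return env_value

    # 3. Finally use the value from config.json, or None if absent.
    return config.get(key)
```

For example, with `model_path` set in all three places, the CLI value is returned. Removing it from the CLI arguments yields the environment value.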
Verify the board is connected:
You may need serial port permissions:
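A quick way to tell a missing device from a permissions problem is to test the port from Python. This sketch assumes the board enumerates as `/dev/ttyACM0`, the default used elsewhere on this page. The function name is hypothetical.

```python
import os


def serial_port_status(port="/dev/ttyACM0"):
    """Classify a serial port as 'missing', 'no-permission', or 'ok'."""
    if not os.path.exists(port):
        # Board not connected, or enumerated under a different device name.
        return "missing"
    if not os.access(port, os.R_OK | os.W_OK):
        # Typically fixed by the dialout group membership set up earlier.
        return "no-permission"
    return "ok"


if __name__ == "__main__":
    print(serial_port_status())
```

A result of `no-permission` usually means the group changes have not yet taken effect in the current login session.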
Step 6: Validate on-device¶
The deploy command runs validation automatically. To run validation separately
with additional options (e.g., --valinput for specific test data):
/path/to/X-CUBE-AI.10.2.0/Utilities/linux/stedgeai validate \
--model checkpoints/my_model_quantized.tflite \
--target stm32n6 \
--mode target \
--desc serial:921600 \
--output /path/to/birdnet-stm32/validation/st_ai_output \
--workspace /path/to/birdnet-stm32/validation/st_ai_ws \
--valinput /path/to/checkpoints/my_model_quantized_validation_data.npz \
--classifier \
--verbose
The validation runs inference on the physical board and compares results to the
reference model. Results are saved to network_validate_report.txt in the
output directory.
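The pipeline overview mentions cosine similarity as one of the reported metrics. For reference, the sketch below shows how that metric is computed between a reference output vector and an on-device output vector. It is a generic definition, not ST's implementation, and the exact metrics in the report may differ.

```python
import math


def cosine_similarity(reference, measured):
    """Cosine of the angle between two equal-length output vectors."""
    if len(reference) != len(measured):
        raise ValueError("vectors must have the same length")
    dot = sum(r * m for r, m in zip(reference, measured))
    norm_ref = math.sqrt(sum(r * r for r in reference))
    norm_meas = math.sqrt(sum(m * m for m in measured))
    if norm_ref == 0.0 or norm_meas == 0.0:
        # Undefined for a zero vector; report 0.0 rather than divide by zero.
        return 0.0
    return dot / (norm_ref * norm_meas)
```

A value close to 1.0 means the quantized on-device outputs point in nearly the same direction as the reference outputs.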
Demo application¶
The demo application is under development. The planned pipeline:
- Record audio using the on-board microphone.
- Run FFT on 512-sample frames, accumulating into a ring buffer.
- Run inference every second on the last 3 seconds of audio.
- Map prediction scores to labels using labels.txt.
- Log top-5 predictions to the serial console.
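The buffering and ranking steps of the planned pipeline can be prototyped on the host. This is a sketch only: the demo itself is still under development, the class and function names are hypothetical, and the sample rate is a parameter because this page does not specify it.

```python
from collections import deque


class AudioRingBuffer:
    """Keeps only the most recent window of audio samples."""

    def __init__(self, sample_rate, window_seconds=3):
        self.capacity = sample_rate * window_seconds
        # A deque with maxlen silently drops the oldest samples when full.
        self._samples = deque(maxlen=self.capacity)

    def push(self, frame):
        """Append one frame (any iterable of samples)."""
        self._samples.extend(frame)

    def is_full(self):
        return len(self._samples) == self.capacity

    def window(self):
        """Return the buffered samples, oldest first."""
        return list(self._samples)


def top_k(scores, labels, k=5):
    """Pair each score with its label and return the k highest-scoring pairs."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

In the planned flow, inference would run once per second on `window()` once `is_full()` is true, and `top_k` would select the predictions to log.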
Board test¶
The board-test command runs a standalone inference test on the STM32N6570-DK.
The firmware reads WAV files from the SD card, computes the STFT on the
Cortex-M55, runs the model on the NPU, and streams results over UART. This
verifies the entire on-device pipeline end-to-end.
SD card preparation¶
- Format a micro-SD card as FAT32.
- Create an audio/ directory at the root.
- Copy mono or stereo 16-bit PCM .wav files into audio/. Each file should be at least as long as the model's chunk duration (default 3 s). The sample rate must match the model's (given in _model_config.json).
- Insert the card into the STM32N6570-DK slot.
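Before copying files to the card, the requirements above can be checked on the host with Python's standard `wave` module. The sketch below is a hypothetical helper, not part of the project. Pass the expected sample rate from your model's _model_config.json.

```python
import wave


def check_wav(source, expected_rate, min_seconds=3.0):
    """Return a list of problems found in a WAV file (empty if it qualifies).

    source is a path string or a binary file object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            problems.append("sample width is not 16-bit")
        if wav.getnchannels() not in (1, 2):
            problems.append("file is neither mono nor stereo")
        rate = wav.getframerate()
        if rate != expected_rate:
            problems.append(f"sample rate {rate} Hz, expected {expected_rate} Hz")
        duration = wav.getnframes() / rate
        if duration < min_seconds:
            problems.append(f"duration {duration:.2f} s is below {min_seconds} s")
    return problems
```

Note that `wave` reads uncompressed PCM only, so compressed WAV variants raise `wave.Error` instead of returning a problem list.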
Board-test arguments¶
| Argument | Default | Description |
|---|---|---|
| --model_path | (from config) | Path to quantized .tflite model |
| --model_config | (inferred) | Path to _model_config.json |
| --labels | (inferred) | Path to _labels.txt |
| --serial_port | /dev/ttyACM0 | Serial port for UART capture |
| --top_k | 5 | Top-K predictions per file |
| --score_threshold | 0.01 | Minimum score to display |
| --config | config.json | Deploy configuration JSON |
| --timeout | 300 | Max seconds to wait for firmware response |
| --save_results | None | Save results summary to a CSV file |
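For readers who wrap the board test in their own scripts, the argument table maps directly onto an `argparse` parser. The sketch below reconstructs the names and defaults from the table. It is not the project's actual parser, and the argument types (for example, an integer timeout) are assumptions.

```python
import argparse


def build_board_test_parser():
    """Build a parser mirroring the board-test arguments and defaults."""
    parser = argparse.ArgumentParser(prog="board-test")
    # None means "take from config" or "infer from the model path".
    parser.add_argument("--model_path", default=None,
                        help="Path to quantized .tflite model")
    parser.add_argument("--model_config", default=None,
                        help="Path to _model_config.json")
    parser.add_argument("--labels", default=None, help="Path to _labels.txt")
    parser.add_argument("--serial_port", default="/dev/ttyACM0",
                        help="Serial port for UART capture")
    parser.add_argument("--top_k", type=int, default=5,
                        help="Top-K predictions per file")
    parser.add_argument("--score_threshold", type=float, default=0.01,
                        help="Minimum score to display")
    parser.add_argument("--config", default="config.json",
                        help="Deploy configuration JSON")
    parser.add_argument("--timeout", type=int, default=300,
                        help="Max seconds to wait for firmware response")
    parser.add_argument("--save_results", default=None,
                        help="Save results summary to a CSV file")
    return parser
```

Calling `build_board_test_parser().parse_args([])` returns the defaults listed in the table.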
Board test is standalone
The board-test command deploys real firmware that does all processing on the board: read WAV from SD card → compute STFT on Cortex-M55 → run NPU inference → write results to SD card + serial. Do NOT precompute spectrograms on the host — that defeats the purpose of an integration test.
Firmware documentation¶
For detailed documentation on the board firmware — hardware specs, build system, configuration, source module reference, UART protocol, and troubleshooting — see the Firmware section.