Hardware
STM32N6570-DK Board
The STM32N6570-DK is ST's first discovery kit with a hardware Neural Processing Unit (NPU).
| Feature | Detail |
|---|---|
| MCU | STM32N657X0H3QU — Arm Cortex-M55 @ 800 MHz |
| NPU | ST Neural-ART accelerator, 1.2 TOPS (INT8) |
| Internal SRAM | 4.2 MB total (cpuRAM1/2/3, npuRAM1–6, flexRAM) |
| External RAM | 256 Mbit octal HyperRAM (XSPI port 1) |
| External Flash | 1 Gbit octal NOR (XSPI port 2) — model weights go here |
| SD card | microSD via SDMMC2, 4-bit bus, up to 208 MHz |
| Debug | ST-LINK V3 (SWD + VCP UART on USB) |
| UART | USART1 via ST-LINK VCP at 921,600 baud |
Cortex-M55 CPU
The M55 is an Armv8.1-M core with:
- Helium (MVE): SIMD extensions for DSP. Not used in our plain-C FFT, but available for future optimization (could roughly halve STFT time).
- Dual DCache / ICache: coherency with the NPU requires explicit cache management via SCB_CleanDCache_by_Addr() and SCB_InvalidateDCache_by_Addr(). This is the single most common source of bugs on this platform; see Troubleshooting.
- TrustZone: the N6 boots in secure mode. The NPU_Validation project handles the secure-to-nonsecure transition. Our firmware runs in privileged secure mode throughout.
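Cache maintenance by address works on whole cache lines, so it helps to know exactly which lines a buffer range touches before cleaning or invalidating it. The following is a minimal sketch in C, assuming a 32-byte DCache line (the alignment quoted for the application buffers). The names cache_range_t and cache_align_range are hypothetical helpers for illustration, not part of CMSIS or of this project, and the on-target calls in the trailing comment assume the CMSIS signatures that take a pointer and an int32_t size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte DCache line, matching the buffer alignment in this doc. */
#define DCACHE_LINE_SIZE 32u

/* Hypothetical helper type: a cache-line-aligned address range. */
typedef struct {
    uintptr_t addr; /* start, rounded down to a line boundary */
    size_t    size; /* length in bytes, a multiple of the line size */
} cache_range_t;

/* Widen [addr, addr + size) so that it covers only whole cache lines. */
static inline cache_range_t cache_align_range(uintptr_t addr, size_t size)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    assert(size > 0u);
    uintptr_t start = addr & ~mask;                 /* round start down */
    uintptr_t end   = (addr + size + mask) & ~mask; /* round end up */
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}

/*
 * On target, with the CMSIS core header included (not compiled here):
 *
 *   cache_range_t r = cache_align_range((uintptr_t)buf, len);
 *   SCB_CleanDCache_by_Addr((void *)r.addr, (int32_t)r.size);      // CPU wrote, NPU reads next
 *   ... run inference ...
 *   SCB_InvalidateDCache_by_Addr((void *)r.addr, (int32_t)r.size); // NPU wrote, CPU reads next
 */
```

If a buffer is not line-aligned, the widened range also covers neighbouring data, and invalidating it can discard unrelated CPU writes. That is one reason the shared buffers are aligned to 32 bytes.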
Neural-ART NPU
The NPU is a hardware accelerator purpose-built for INT8 convolutional neural networks:
- Supported operators: Conv2D, DepthwiseConv2D, Dense, Pool (avg/max), Add, ReLU/ReLU6, Softmax, Reshape, and more. Run stedgeai analyze on your model to check operator compatibility.
- Memory access: operates on its own SRAM banks (npuRAM1–6) with DMA-like data movement. The CPU communicates with the NPU via the LL_ATON runtime API.
- Weights: stored in external NOR flash, memory-mapped via XSPI, and streamed to the NPU during inference. This means the model size is limited by the NOR capacity (128 MB), not internal SRAM.
- Activations: live in npuRAM (internal SRAM), not external memory. The
activation scratch size depends on the model topology and is reported by
stedgeai analyze.
NPU compatibility is the absolute priority
Every model, layer, and quantization decision must be verified against the
STM32N6 NPU operator set. Always run stedgeai analyze before committing
model changes. See the troubleshooting page for
common pitfalls.
Memory Map
Address Range Region Typical Usage
─────────────────────────────────────────────────────────────────
0x2400_0000 .. 0x2440_0000 cpuRAM1 (256 KB) Stack, small globals
0x2440_0000 .. 0x2480_0000 cpuRAM2 (256 KB) audio_buf, spec_buf
0x2480_0000 .. 0x24C0_0000 cpuRAM3 (256 KB) Heap, FatFs work area
0x3400_0000 .. 0x3460_0000 npuRAM1–6 (~1.5 MB) NPU I/O + activations
0x7000_0000 .. 0x7200_0000 HyperRAM (32 MB) External RAM (memory-mapped)
0x7200_0000 .. 0x7A00_0000 NOR flash (128 MB) NPU weights (memory-mapped)
The linker script (STM32N657xx.ld) places .text and .rodata in internal
SRAM, and model weights are flashed to external NOR by the n6_loader tool at
deployment time. The NOR memory-mapping is configured during the board init
sequence (see Building & Flashing — Init Sequence).
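The two external windows in the memory map can be cross-checked against the device capacities in the board table (256 Mbit HyperRAM, 1 Gbit NOR). The sketch below does that arithmetic in C. It assumes the end address of each range is exclusive, and the helper and macro names (mbit_to_bytes, window_bytes, HYPERRAM_BASE, and so on) are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a device capacity in Mbit to bytes (1 Mbit = 2^20 bits). */
static inline uint64_t mbit_to_bytes(uint64_t mbit)
{
    return mbit * 1024u * 1024u / 8u;
}

/* Size of a memory-mapped window; 'end' is assumed to be exclusive. */
static inline uint64_t window_bytes(uint64_t start, uint64_t end)
{
    assert(end > start);
    return end - start;
}

/* Addresses copied from the memory map above. */
#define HYPERRAM_BASE 0x70000000u
#define HYPERRAM_END  0x72000000u
#define NOR_BASE      0x72000000u
#define NOR_END       0x7A000000u
```

With these values, the HyperRAM window spans 32 MB (256 Mbit) and the NOR window spans 128 MB (1 Gbit), which matches the capacities listed for the two external devices.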
Application Buffers
| Buffer | Size | Alignment | Notes |
|---|---|---|---|
| audio_buf | CHUNK_SAMPLES × 4 bytes | 32 B (DCache line) | e.g. 72,000 × 4 = 288 KB @ 24 kHz × 3 s |
| spec_buf | FFT_BINS × SPEC_WIDTH × 4 bytes | 32 B | e.g. 257 × 256 × 4 = 263 KB |
| file_list | SD_MAX_FILES × SD_MAX_PATH bytes | .bss | 512 × 256 = 128 KB |
| scores | NUM_CLASSES × 4 bytes | stack | Tiny (40 B for 10 classes) |
| Total | ~680 KB | | Fits comfortably in the 4.2 MB internal SRAM |
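The sizes in the table follow directly from the configuration constants. The sketch below shows one way the buffers could be declared so that the arithmetic is checked at build time. It is an illustration under stated assumptions, not the project's actual source: the 4-byte element type is assumed to be float, the alignment attribute is GCC/Clang syntax, and SAMPLE_RATE_HZ, CHUNK_SECONDS, DCACHE_LINE, and app_buffer_bytes are hypothetical names. The constant values are the example values from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Example values from the table; the real ones live in the firmware config. */
#define SAMPLE_RATE_HZ 24000u
#define CHUNK_SECONDS  3u
#define CHUNK_SAMPLES  (SAMPLE_RATE_HZ * CHUNK_SECONDS) /* 72,000 samples */
#define FFT_BINS       257u
#define SPEC_WIDTH     256u
#define SD_MAX_FILES   512u
#define SD_MAX_PATH    256u
#define NUM_CLASSES    10u

#define DCACHE_LINE    32u /* alignment for buffers shared with the NPU */

/* Assumption: 4-byte elements are float; GCC/Clang alignment attribute. */
float audio_buf[CHUNK_SAMPLES] __attribute__((aligned(DCACHE_LINE)));
float spec_buf[FFT_BINS * SPEC_WIDTH] __attribute__((aligned(DCACHE_LINE)));
char  file_list[SD_MAX_FILES][SD_MAX_PATH]; /* zero-initialised, lands in .bss */

/* Total footprint in bytes; scores lives on the stack, so only its size is added. */
static inline size_t app_buffer_bytes(void)
{
    return sizeof(audio_buf) + sizeof(spec_buf) + sizeof(file_list)
         + NUM_CLASSES * sizeof(float);
}
```

With the example values this comes to 288,000 + 263,168 + 131,072 + 40 = 682,280 bytes, in line with the ~680 KB total quoted in the table.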
NPU Memory
| Region | Typical Size | Location |
|---|---|---|
| NPU input | ~263 KB | npuRAM (auto-placed by LL_ATON) |
| NPU output | 40 bytes | npuRAM |
| NPU activations | ~320 KB | npuRAM |
| NPU weights | ~200–300 KB | External NOR flash (read-only) |
The exact sizes depend on the model and are reported by stedgeai analyze: