Hardware

STM32N6570-DK Board

The STM32N6570-DK is ST's first discovery kit with a hardware Neural Processing Unit (NPU).

Feature	Detail
MCU	STM32N657X0H3QU — Arm Cortex-M55 @ 800 MHz
NPU	ST Neural-ART accelerator, up to 600 GOPS (INT8)
Internal SRAM	4.2 MB total (cpuRAM1/2/3, npuRAM1–6, flexRAM)
External RAM	256 Mbit octal HyperRAM (XSPI port 1)
External Flash	1 Gbit octal NOR (XSPI port 2) — model weights go here
SD card	microSD via SDMMC2, 4-bit bus, up to 208 MHz
Debug	ST-LINK V3 (SWD + VCP UART on USB)
UART	USART1 via ST-LINK VCP at 921,600 baud

Cortex-M55 CPU

The M55 is an Armv8.1-M core with:

Helium (MVE) — the M-Profile Vector Extension, SIMD instructions for DSP and ML workloads. Our plain-C FFT does not use it. A Helium-optimized FFT is a candidate future optimization that could roughly halve STFT time.
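Separate I-cache and D-cache — the NPU does not see the CPU's data cache, so every buffer the two share needs explicit maintenance. Call SCB_CleanDCache_by_Addr() after the CPU writes data the NPU will read, and call SCB_InvalidateDCache_by_Addr() before the CPU reads data the NPU has written. This is the single most common source of bugs on this platform; see Troubleshooting.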
TrustZone — the N6 boots in secure mode. The NPU_Validation project handles the secure-to-nonsecure transition. Our firmware runs in privileged secure mode throughout.
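CMSIS cache maintenance operates on whole 32-byte lines (the Cortex-M55 L1 D-cache line size), which is why the application buffers are 32-byte aligned. If a buffer shares a line with other data, invalidating that line also throws away any dirty, not-yet-written-back changes to the neighbouring data.

The minimal, host-testable sketch below does two things. It widens a buffer to whole lines, and it checks whether the buffer is safe to invalidate. `dcache_span_t`, `dcache_span()` and `dcache_safe_to_invalidate()` are hypothetical names, not part of CMSIS or this project.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cortex-M55 L1 D-cache line size in bytes. */
#define DCACHE_LINE_SIZE 32u
static_assert((DCACHE_LINE_SIZE & (DCACHE_LINE_SIZE - 1u)) == 0u,
              "line size must be a power of two");

/* A byte range widened to whole D-cache lines (hypothetical type). */
typedef struct {
    uintptr_t start; /* address of the first line touched */
    size_t    len;   /* multiple of DCACHE_LINE_SIZE */
} dcache_span_t;

/* Widen [addr, addr + len) outward to cache-line boundaries. Safe for
 * clean operations: extra bytes are only written back, never lost. */
static dcache_span_t dcache_span(const void *addr, size_t len)
{
    const uintptr_t mask  = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    const uintptr_t first = (uintptr_t)addr & ~mask;
    const uintptr_t end   = ((uintptr_t)addr + len + mask) & ~mask;
    dcache_span_t span = { first, (size_t)(end - first) };
    return span;
}

/* True if the buffer starts and ends on line boundaries, so invalidating
 * its lines cannot discard dirty data that belongs to a neighbour. */
static bool dcache_safe_to_invalidate(const void *addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    return ((uintptr_t)addr & mask) == 0u && (len & mask) == 0u;
}
```

On target, the span would be passed to CMSIS-Core in two places:

- After the CPU fills an NPU input buffer: `SCB_CleanDCache_by_Addr((volatile void *)s.start, (int32_t)s.len)`.
- Before the CPU reads an NPU output buffer: `SCB_InvalidateDCache_by_Addr` with the same arguments.

Recent CMSIS-Core versions declare both functions as taking `volatile void *addr, int32_t dsize`.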

Neural-ART NPU

The NPU is a hardware accelerator purpose-built for INT8 convolutional neural networks:

Supported operators: Conv2D, DepthwiseConv2D, Dense, Pool (avg/max), Add, ReLU/ReLU6, Softmax, Reshape, and more. Run stedgeai analyze on your model to check operator compatibility.
Memory access: the NPU works out of its own SRAM banks (npuRAM1–6). It moves data with dedicated DMA-like stream engines rather than through CPU loads and stores. The CPU controls the NPU via the LL_ATON runtime API.
Weights: stored in external NOR flash, memory-mapped via XSPI, and streamed to the NPU during inference. The weight footprint is therefore bounded by the NOR capacity (1 Gbit = 128 MB), not by internal SRAM.
Activations: live in npuRAM (internal SRAM), not external memory. The activation scratch size depends on the model topology and is reported by stedgeai analyze.
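Because the NPU computes in INT8, float data such as the spectrogram has to be mapped to int8 at some point. That happens either in a quantize step inside the compiled model or on the CPU beforehand.

The sketch below shows the standard TFLite int8 affine scheme, q = round(x / scale) + zero_point, clamped to [-128, 127], with ties rounded away from zero (like C's `roundf`). `quantize_input()` is a hypothetical helper. The `scale` and `zero_point` arguments are assumptions: the real values belong to the quantized model's input tensor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round to nearest with ties away from zero (same as roundf) for
 * |v| <= 256. Done in double, where v +/- 0.5 is exact, so no libm. */
static int32_t round_half_away(float v)
{
    const double d = (double)v;
    return (int32_t)(d >= 0.0 ? d + 0.5 : d - 0.5);
}

/* TFLite-style int8 affine quantization (hypothetical helper):
 *   q = clamp(round(x / scale) + zero_point, -128, 127)
 * scale and zero_point come from the model's input tensor; src must be finite. */
static void quantize_input(const float *src, int8_t *dst, size_t n,
                           float scale, int32_t zero_point)
{
    assert(scale > 0.0f);
    assert(zero_point >= -128 && zero_point <= 127);
    for (size_t i = 0; i < n; ++i) {
        float v = src[i] / scale;
        /* Pre-clamp so the integer conversion stays in range; with
         * |zero_point| <= 128, any |v| > 256 saturates anyway. */
        if (v > 256.0f)  v = 256.0f;
        if (v < -256.0f) v = -256.0f;
        int32_t q = round_half_away(v) + zero_point;
        if (q > 127)  q = 127;
        if (q < -128) q = -128;
        dst[i] = (int8_t)q;
    }
}
```

For example, with scale = 0.5 and zero_point = -3, the inputs {0.0, 0.5, 100.0} map to {-3, -2, 127}; the last value saturates. Dequantizing the NPU output uses the inverse relation, x = scale × (q - zero_point), with the output tensor's own parameters.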

NPU compatibility is the absolute priority

Every model, layer, and quantization decision must be verified against the STM32N6 NPU operator set. Always run stedgeai analyze before committing model changes. See the troubleshooting page for common pitfalls.

Memory Map

Address Range                 Region              Typical Usage
─────────────────────────────────────────────────────────────────
0x2400_0000 .. 0x2440_0000    cpuRAM1 (256 KB)    Stack, small globals
0x2440_0000 .. 0x2480_0000    cpuRAM2 (256 KB)    audio_buf, spec_buf
0x2480_0000 .. 0x24C0_0000    cpuRAM3 (256 KB)    Heap, FatFs work area
0x3400_0000 .. 0x3460_0000    npuRAM1–6 (~1.5 MB) NPU I/O + activations
0x7000_0000 .. 0x7200_0000    HyperRAM (32 MB)    External RAM (memory-mapped)
0x7200_0000 .. 0x7A00_0000    NOR flash (128 MB)  NPU weights (memory-mapped)

The linker script (STM32N657xx.ld) places .text and .rodata in internal SRAM. Model weights are written separately to external NOR by the n6_loader tool at deployment time. NOR memory-mapped mode is configured during the board init sequence (see Building & Flashing — Init Sequence).

Application Buffers

Buffer	Size	Alignment / placement	Notes
`audio_buf`	`CHUNK_SAMPLES × 4` bytes	32 B (DCache line)	e.g. 24 kHz × 3 s = 72,000 samples; 72,000 × 4 = 288 KB
`spec_buf`	`FFT_BINS × SPEC_WIDTH × 4` bytes	32 B	e.g. 257 × 256 × 4 = 263 KB
`file_list`	`SD_MAX_FILES × SD_MAX_PATH` bytes	.bss	512 × 256 = 128 KB
`scores`	`NUM_CLASSES × 4` bytes	stack	Tiny (40 B for 10 classes)
Total	~680 KB		Fits comfortably in the 4.2 MB internal SRAM
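The sizes in the table can be pinned down at compile time, so that a change to one of the size macros cannot silently break the alignment or the arithmetic above. This minimal sketch assumes `float` samples and the table's example values. `SAMPLE_RATE_HZ`, `CHUNK_SECONDS`, `APP_BUFFER_BUDGET` and `is_line_aligned()` are hypothetical, and the budget figure is illustrative rather than a project limit.

```c
#include <assert.h>   /* static_assert */
#include <stdalign.h> /* alignas */
#include <stddef.h>
#include <stdint.h>   /* uintptr_t */

/* Example configuration from the table; values are assumptions. */
#define SAMPLE_RATE_HZ   24000u                           /* hypothetical */
#define CHUNK_SECONDS    3u                               /* hypothetical */
#define CHUNK_SAMPLES    (SAMPLE_RATE_HZ * CHUNK_SECONDS) /* 72,000 */
#define FFT_BINS         257u  /* one-sided bins of a 512-point real FFT */
#define SPEC_WIDTH       256u
#define SD_MAX_FILES     512u
#define SD_MAX_PATH      256u
#define NUM_CLASSES      10u   /* float scores[NUM_CLASSES] lives on the stack */
#define DCACHE_LINE_SIZE 32u

/* Line-aligned so cache maintenance on one buffer never touches another. */
alignas(DCACHE_LINE_SIZE) float audio_buf[CHUNK_SAMPLES];       /* assumed float */
alignas(DCACHE_LINE_SIZE) float spec_buf[FFT_BINS * SPEC_WIDTH];
char file_list[SD_MAX_FILES][SD_MAX_PATH];                      /* zero-init: .bss */

#define APP_BUFFER_BYTES  (sizeof audio_buf + sizeof spec_buf + sizeof file_list)
#define APP_BUFFER_BUDGET (700u * 1000u) /* hypothetical budget, bytes */

static_assert(sizeof audio_buf == 288000u, "72,000 x 4 bytes");
static_assert(sizeof spec_buf == 263168u, "257 x 256 x 4 bytes");
static_assert(sizeof file_list == 131072u, "512 x 256 bytes");
static_assert(APP_BUFFER_BYTES <= APP_BUFFER_BUDGET, "buffers exceed budget");

/* True if p starts on a D-cache line boundary. */
static inline int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % DCACHE_LINE_SIZE) == 0u;
}
```

With these values the three buffers total 682,240 bytes, which matches the table's ~680 KB. A failed `static_assert` turns a mis-sized or over-budget buffer into a build error instead of a runtime overflow.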

NPU Memory

Region	Typical Size	Location
NPU input	~263 KB	npuRAM (auto-placed by LL_ATON)
NPU output	40 bytes	npuRAM
NPU activations	~320 KB	npuRAM
NPU weights	~200–300 KB	External NOR flash (read-only)

The exact sizes depend on the model and are reported by stedgeai analyze:

stedgeai analyze --model checkpoints/best_model_quantized.tflite --target stm32n6