blocks
birdnet_stm32.models.blocks
Additional model building blocks for audio classification.
Provides N6 NPU-compatible building blocks:

- Squeeze-and-Excite (SE) channel attention
- MobileNetV2-style inverted residual blocks
- Lightweight attention pooling
AttentionPooling
Bases: Layer
Lightweight attention pooling over spatial dimensions.
Replaces GlobalAveragePooling2D with a learned weighted average. Uses only Dense + Softmax + Multiply + ReduceSum — all NPU-compatible.
Source code in birdnet_stm32/models/blocks.py
se_block(x, reduction=4, name='se')
Squeeze-and-Excite channel attention block.
NPU-compatible: uses GlobalAveragePooling2D, Dense, Sigmoid, Multiply.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor [B, H, W, C]. | required |
| `reduction` | `int` | Channel reduction factor for the bottleneck. | `4` |
| `name` | `str` | Base name for layers. | `'se'` |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Channel-reweighted tensor, same shape as input. |
Source code in birdnet_stm32/models/blocks.py
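The squeeze, excite, and scale steps can be sketched in plain NumPy to make the arithmetic concrete. This is a minimal reference under stated assumptions, not the Keras code in `blocks.py`: the function name `se_reweight`, the bottleneck width `C // reduction`, and the ReLU between the two dense layers follow the usual SE formulation and are not confirmed by this page.

```python
import numpy as np

def se_reweight(x, w1, b1, w2, b2):
    """NumPy sketch of squeeze-and-excite on a [B, H, W, C] array (hypothetical helper)."""
    # Squeeze: global average over the spatial axes -> [B, C]
    s = x.mean(axis=(1, 2))
    # Excite: bottleneck dense + ReLU (assumed), then dense + sigmoid -> [B, C]
    h = np.maximum(s @ w1 + b1, 0.0)
    g = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
    # Scale: broadcast the per-channel gates over H and W
    return x * g[:, None, None, :]

rng = np.random.default_rng(0)
B, H, W, C, reduction = 2, 4, 5, 8, 4
x = rng.normal(size=(B, H, W, C))
w1 = rng.normal(size=(C, C // reduction))   # assumed bottleneck width C // reduction
b1 = np.zeros(C // reduction)
w2 = rng.normal(size=(C // reduction, C))
b2 = np.zeros(C)
y = se_reweight(x, w1, b1, w2, b2)
```

Because every gate lies between 0 and 1, the output keeps the input shape and each value is scaled down by its channel's gate.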
inverted_residual_block(x, out_ch, expansion=6, stride_f=1, stride_t=1, use_se=False, se_reduction=4, weight_decay=0.0001, drop_rate=0.1, name='ir')
MobileNetV2-style inverted residual block with optional SE attention.
Structure: 1x1 expand -> BN -> ReLU6 -> 3x3 DW -> BN -> ReLU6 -> [SE] -> 1x1 project -> BN

Residual connection when stride=1 and channels match.
All ops are NPU-compatible (Conv2D, DepthwiseConv2D, Dense, Sigmoid, Multiply, Add).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor [B, H, W, C]. | required |
| `out_ch` | `int` | Output channels. | required |
| `expansion` | `int` | Expansion factor for the hidden dimension. | `6` |
| `stride_f` | `int` | Stride along frequency axis. | `1` |
| `stride_t` | `int` | Stride along time axis. | `1` |
| `use_se` | `bool` | Whether to apply squeeze-and-excite attention. | `False` |
| `se_reduction` | `int` | SE channel reduction factor. | `4` |
| `weight_decay` | `float` | L2 regularization weight. | `0.0001` |
| `drop_rate` | `float` | Spatial dropout rate. | `0.1` |
| `name` | `str` | Base name for layers. | `'ir'` |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor [B, H', W', out_ch]. |
Source code in birdnet_stm32/models/blocks.py
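The channel bookkeeping and the residual rule described above can be illustrated with a small pure-Python helper. This is only a sketch: `inverted_residual_plan` is a hypothetical name, the hidden width `in_ch * expansion` follows the MobileNetV2 convention, and reading "stride=1" as "both `stride_f` and `stride_t` equal 1" is an assumption that this page does not spell out.

```python
def inverted_residual_plan(in_ch, out_ch, expansion=6, stride_f=1, stride_t=1):
    """Hypothetical helper: hidden width and residual decision for one block."""
    # 1x1 expand widens the input by the expansion factor (MobileNetV2 convention)
    hidden_ch = in_ch * expansion
    # Assumed reading of "stride=1 and channels match": no downsampling on
    # either axis and identical input/output channel counts
    use_residual = stride_f == 1 and stride_t == 1 and in_ch == out_ch
    return {"hidden_ch": hidden_ch, "use_residual": use_residual}

same = inverted_residual_plan(32, 32)               # shape-preserving block
down = inverted_residual_plan(32, 64, stride_f=2)   # downsampling block
```

Under these assumptions the first call keeps the skip connection and the second drops it, since the input and output tensors no longer have the same shape.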
attention_pooling(x, name='attn_pool')
Lightweight attention pooling over spatial dimensions.
Replaces GlobalAveragePooling2D with a learned weighted average. Uses only Dense + Softmax + Multiply + ReduceSum — all NPU-compatible.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor [B, H, W, C]. | required |
| `name` | `str` | Base name for layers. | `'attn_pool'` |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Pooled tensor [B, C]. |
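
To show how a learned weighted average generalizes global average pooling, here is a NumPy sketch of the Dense + Softmax + Multiply + ReduceSum sequence. It assumes a single-unit scoring layer (weights `w`, bias `b`) applied at each spatial position and a softmax over the flattened H*W positions; the function name `attention_pool` and these details are illustrative and may differ from the implementation in `blocks.py`.

```python
import numpy as np

def attention_pool(x, w, b=0.0):
    """NumPy sketch: x is [B, H, W, C], w is [C]; returns [B, C]."""
    B, H, W, C = x.shape
    flat = x.reshape(B, H * W, C)
    # Dense (one unit, assumed): one score per spatial position -> [B, H*W]
    scores = flat @ w + b
    # Softmax over positions, shifted by the max for numerical stability
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Multiply + ReduceSum: weighted average of the features -> [B, C]
    return (flat * weights[:, :, None]).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 5, 8))
pooled = attention_pool(x, rng.normal(size=8))
# With zero scoring weights every position gets equal weight,
# so the result reduces to global average pooling.
uniform = attention_pool(x, np.zeros(8))
```

The zero-weight case is a useful sanity check: a freshly initialized scorer that outputs constant scores behaves like `GlobalAveragePooling2D`, and training only moves the weighting away from uniform.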