model.model
Multi-task neural network model for species occurrence prediction.
Architecture uses residual blocks with pre-norm (LayerNorm → GELU) for stable training and strong gradient flow. The shared encoder maps raw (lat, lon, week) inputs through circular encoding into a rich embedding that feeds two task heads:
- Species prediction (multi-label classification)
- Environmental reconstruction (auxiliary regression)
Input encoding is handled inside the model so that inference only requires raw latitude, longitude, and week number — no external preprocessing needed.
Classes
CircularEncoding
Bases: Module
Multi-harmonic circular encoding for periodic/angular values.
Given a scalar angle θ (in radians), produces: [sin(θ), cos(θ), sin(2θ), cos(2θ), …, sin(nθ), cos(nθ)]
Output dimension = 2 * n_harmonics per input scalar.
Reference: Tancik et al., "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains" (NeurIPS 2020).
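The encoding described above can be sketched in plain Python for a single scalar angle. This is an illustration of the formula only, not the module's tensor implementation; the function name `circular_encode` is hypothetical.

```python
import math

def circular_encode(theta, n_harmonics=1):
    """Return [sin(θ), cos(θ), sin(2θ), cos(2θ), ..., sin(nθ), cos(nθ)]."""
    features = []
    for k in range(1, n_harmonics + 1):
        features.append(math.sin(k * theta))
        features.append(math.cos(k * theta))
    return features

# The encoding is periodic: θ and θ + 2π give the same features
# (up to floating-point error), which is why it suits angular inputs.
a = circular_encode(0.3, n_harmonics=3)
b = circular_encode(0.3 + 2 * math.pi, n_harmonics=3)
```

Each input scalar yields 2 * n_harmonics features, so `a` and `b` both have six entries here.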
Functions
__init__(n_harmonics=1)
Initialize circular encoding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_harmonics | int | Number of harmonic frequencies to use. | 1 |
forward(angles)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| angles | Tensor | (batch,) or (batch, 1): angles in radians | required |
Returns:
| Type | Description |
|---|---|
| Tensor | (batch, 2 * n_harmonics): [sin(θ), cos(θ), sin(2θ), cos(2θ), …] |
ResidualBlock
Bases: Module
Pre-norm residual block: LayerNorm → GELU → Linear → LayerNorm → GELU → Dropout → Linear.
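The layer order above can be traced with a minimal single-sample sketch in plain Python. Several details are assumptions not stated in this page: the LayerNorm here has no learned scale or shift, the skip connection adds the block input to the output of the second Linear, and Dropout is treated as the identity (inference mode). All function names are hypothetical.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize one feature vector to zero mean and unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def gelu(v):
    # Exact GELU using the Gaussian error function.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def linear(x, weight, bias):
    # weight is a list of rows, one row per output feature.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def residual_block(x, w1, b1, w2, b2):
    h = linear([gelu(v) for v in layer_norm(x)], w1, b1)  # LayerNorm → GELU → Linear
    h = [gelu(v) for v in layer_norm(h)]                  # LayerNorm → GELU
    # Dropout would act here during training; it is the identity at inference.
    h = linear(h, w2, b2)                                 # Linear
    return [xi + hi for xi, hi in zip(x, h)]              # assumed skip connection

dim = 4
zeros = [[0.0] * dim for _ in range(dim)]
x = [0.5, -1.0, 2.0, 0.0]
out = residual_block(x, zeros, [0.0] * dim, zeros, [0.0] * dim)
```

With all weights and biases at zero the residual branch contributes nothing, so `out` equals `x`. This identity path is what lets gradients pass through deep stacks of such blocks.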
SpatioTemporalEncoder
Bases: Module
Shared encoder that maps raw (lat, lon, week) to an embedding via multi-harmonic circular encoding and FiLM temporal conditioning.
Spatial features (lat, lon) are projected into embed_dim and processed by residual blocks. Temporal features (week) are encoded separately and used to generate per-block FiLM (Feature-wise Linear Modulation) scale and shift parameters that modulate the spatial representation. This forces the network to actively use temporal information rather than relying on a weak concatenated signal.
Reference: Perez et al., "FiLM: Visual Reasoning with a General Conditioning Layer" (AAAI 2018).
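FiLM itself is a per-feature scale and shift. The sketch below shows only that operation on one feature vector; how the encoder generates γ and β from the week encoding, and where inside each residual block the modulation is applied, are not specified in this page, so the values here are purely illustrative.

```python
def film(h, gamma, beta):
    """Feature-wise linear modulation: scale and shift each feature."""
    return [g * v + b for v, g, b in zip(h, gamma, beta)]

h = [1.0, -2.0, 0.5]      # spatial features inside one residual block
gamma = [1.0, 0.0, 2.0]   # hypothetical scale derived from the week encoding
beta = [0.0, 3.0, -1.0]   # hypothetical shift derived from the week encoding
modulated = film(h, gamma, beta)
```

A scale of 0 (second feature) discards the spatial value entirely and replaces it with the shift, which illustrates how strongly the temporal signal can reshape the spatial representation.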
Inputs (all per-sample):
- lat: float in [-90, 90]
- lon: float in [-180, 180]
- week: int in {1, …, 48}
Internal encoding produces:
- lat → CircularEncoding(lat_rad) → 2 * coord_harmonics features
- lon → CircularEncoding(lon_rad) → 2 * coord_harmonics features
- week → CircularEncoding(week_rad) → 2 * week_harmonics features → FiLM generators produce (γ, β) per residual block
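The feature counts can be checked with a small plain-Python sketch. The degree-to-radian and week-to-angle conversions below are assumptions chosen for illustration (this page does not show how lat_rad, lon_rad, and week_rad are computed), and `encode_inputs` is a hypothetical helper.

```python
import math

def circular_encode(theta, n_harmonics):
    out = []
    for k in range(1, n_harmonics + 1):
        out += [math.sin(k * theta), math.cos(k * theta)]
    return out

def encode_inputs(lat, lon, week, coord_harmonics=4, week_harmonics=4, n_weeks=48):
    # Assumed angle mappings; the module's exact conversions may differ.
    lat_rad = math.radians(lat)
    lon_rad = math.radians(lon)
    week_rad = 2.0 * math.pi * (week - 1) / n_weeks
    spatial = circular_encode(lat_rad, coord_harmonics) + circular_encode(lon_rad, coord_harmonics)
    temporal = circular_encode(week_rad, week_harmonics)
    return spatial, temporal

spatial, temporal = encode_inputs(lat=50.8, lon=12.9, week=24)
```

With the default harmonics, the spatial branch receives 16 features (8 for lat, 8 for lon) and the temporal branch receives 8. Under the assumed week mapping, the annual cycle wraps: a week value one full cycle later produces the same temporal features.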
Functions
__init__(coord_harmonics=4, week_harmonics=4, embed_dim=512, n_blocks=4, dropout=0.1, n_weeks=48)
Initialize the spatio-temporal encoder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| coord_harmonics | int | Harmonics for lat/lon circular encoding. | 4 |
| week_harmonics | int | Harmonics for week circular encoding. | 4 |
| embed_dim | int | Output embedding dimension. | 512 |
| n_blocks | int | Number of residual blocks. | 4 |
| dropout | float | Dropout probability. | 0.1 |
| n_weeks | int | Total weeks in the annual cycle. | 48 |
forward(lat, lon, week)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lat | Tensor | (batch,) raw latitude in degrees [-90, 90] | required |
| lon | Tensor | (batch,) raw longitude in degrees [-180, 180] | required |
| week | Tensor | (batch,) week number (1–48) | required |
SpeciesPredictionHead
Bases: Module
Multi-label classification head with residual blocks and low-rank output.
The final projection uses a low-rank factorization
hidden_dim → bottleneck → n_species
instead of a single Linear(hidden_dim, n_species). This reduces parameters dramatically when n_species is large (10K+) while learning a compact species-embedding space whose dimensions can be interpreted as latent ecological niches.
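The size of the saving can be estimated with simple arithmetic. The sketch below uses the default hidden_dim and bottleneck and the 12 K species figure quoted elsewhere on this page; it assumes both factor layers carry a bias term, which this page does not confirm. `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out, bias=True):
    """Parameter count of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

hidden_dim, bottleneck, n_species = 512, 128, 12_000

# Single projection: Linear(hidden_dim, n_species)
full = dense_params(hidden_dim, n_species)

# Low-rank path: hidden_dim → bottleneck → n_species (biases assumed on both layers)
low_rank = dense_params(hidden_dim, bottleneck) + dense_params(bottleneck, n_species)

print(full, low_rank, round(full / low_rank, 2))  # 6156000 1613664 3.81
```

Under these assumptions the factorized output layer is roughly 3.8 times smaller, and the ratio grows as n_species increases relative to the bottleneck.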
Functions
__init__(input_dim, n_species, hidden_dim=512, n_blocks=2, dropout=0.2, bottleneck=128)
Initialize the species prediction head.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_dim | int | Dimension of the encoder output. | required |
| n_species | int | Number of target species (output logits). | required |
| hidden_dim | int | Hidden dimension of residual blocks. | 512 |
| n_blocks | int | Number of residual blocks. | 2 |
| dropout | float | Dropout probability. | 0.2 |
| bottleneck | int | Low-rank bottleneck dimension before the output layer. | 128 |
forward(x)
Project encoder output to species logits.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Encoder output of shape | required |
Returns:
| Type | Description |
|---|---|
| Tensor | Logits of shape |
EnvironmentalPredictionHead
Bases: Module
Regression head for environmental feature reconstruction (auxiliary task).
Functions
__init__(input_dim, n_env_features, hidden_dim=256, n_blocks=1, dropout=0.1)
Initialize the environmental prediction head.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_dim | int | Dimension of the encoder output. | required |
| n_env_features | int | Number of environmental features to predict. | required |
| hidden_dim | int | Hidden dimension of residual blocks. | 256 |
| n_blocks | int | Number of residual blocks. | 1 |
| dropout | float | Dropout probability. | 0.1 |
forward(x)
Predict environmental features from encoder output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Encoder output of shape | required |
Returns:
| Type | Description |
|---|---|
| Tensor | Predicted features of shape |
HabitatSpeciesHead
Bases: Module
Predict species from predicted environmental features.
Creates an explicit pathway from environmental conditions to species occurrence, making the habitat→species relationship directly learnable rather than implicit in the shared encoder. Combined with the direct SpeciesPredictionHead via a learned per-species gate, the model can leverage both spatial-embedding patterns and environmental feature associations.
Architecture mirrors SpeciesPredictionHead (residual blocks + low-rank bottleneck) but takes predicted environmental features as input instead of the encoder embedding.
During training, gradients flow back through the environmental head, reinforcing it to produce representations that are useful for both regression accuracy and species prediction.
Functions
__init__(n_env_features, n_species, hidden_dim=256, n_blocks=1, dropout=0.1, bottleneck=128)
Initialize the habitat-species head.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_env_features | int | Number of input environmental features (output dim of the environmental head). | required |
| n_species | int | Number of target species (output logits). | required |
| hidden_dim | int | Hidden dimension of residual blocks. | 256 |
| n_blocks | int | Number of residual blocks. | 1 |
| dropout | float | Dropout probability. | 0.1 |
| bottleneck | int | Low-rank bottleneck dimension before the output layer. | 128 |
forward(env_pred)
Predict species logits from environmental features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| env_pred | Tensor | Predicted environmental features of shape | required |
Returns:
| Type | Description |
|---|---|
| Tensor | Logits of shape |
BirdNETGeoModel
Bases: Module
Multi-task model for bird species occurrence prediction.
Accepts raw (lat, lon, week) inputs — encoding is handled internally.
Training: (lat, lon, week) → encoder → species logits + env predictions
Inference: (lat, lon, week) → encoder → species logits only
Functions
__init__(n_species, n_env_features, coord_harmonics=4, week_harmonics=4, embed_dim=512, encoder_blocks=4, species_head_dim=512, species_head_blocks=2, species_bottleneck=128, env_head_dim=256, env_head_blocks=1, dropout=0.1, species_dropout=0.2, env_dropout=0.1, habitat_head=False, habitat_head_dim=256, habitat_head_blocks=1, habitat_bottleneck=128)
Initialize the full multi-task model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_species | int | Number of target species. | required |
| n_env_features | int | Number of environmental features (auxiliary task). | required |
| coord_harmonics | int | Harmonics for lat/lon encoding. | 4 |
| week_harmonics | int | Harmonics for week encoding. | 4 |
| embed_dim | int | Encoder embedding dimension. | 512 |
| encoder_blocks | int | Number of residual blocks in the encoder. | 4 |
| species_head_dim | int | Hidden dim for species head. | 512 |
| species_head_blocks | int | Residual blocks in species head. | 2 |
| species_bottleneck | int | Low-rank bottleneck size. | 128 |
| env_head_dim | int | Hidden dim for environmental head. | 256 |
| env_head_blocks | int | Residual blocks in environmental head. | 1 |
| dropout | float | Encoder dropout. | 0.1 |
| species_dropout | float | Species head dropout. | 0.2 |
| env_dropout | float | Environmental head dropout. | 0.1 |
| habitat_head | bool | Enable habitat-species association head. When True, predicted environmental features are fed through a secondary species head whose logits are combined with the direct species head via a learned per-species gate. | False |
| habitat_head_dim | int | Hidden dim for habitat-species head. | 256 |
| habitat_head_blocks | int | Residual blocks in habitat-species head. | 1 |
| habitat_bottleneck | int | Low-rank bottleneck for habitat-species head. | 128 |
forward(lat, lon, week, return_env=True)
Run the full model forward pass.
When the habitat-species head is enabled, the environmental head always runs (even during inference) and its output feeds into the habitat head. The final species logits are a learned gate-weighted combination of the direct and habitat predictions.
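One plausible form of such a gate-weighted combination is a per-species convex mix of the two logit vectors. The sketch below is an assumption for illustration: this page does not give the exact formula, and both the sigmoid parameterization and the name `combine_logits` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def combine_logits(direct, habitat, gate_params):
    """Mix direct and habitat logits with one learned gate per species."""
    combined = []
    for d, h, p in zip(direct, habitat, gate_params):
        g = sigmoid(p)  # gate in (0, 1); assumed parameterization
        combined.append(g * d + (1.0 - g) * h)
    return combined

direct = [2.0, -1.0, 0.5]      # logits from the direct species head
habitat = [0.0, 3.0, 0.5]      # logits from the habitat-species head
gate_params = [0.0, 0.0, 4.0]  # hypothetical learned values, one per species
combined = combine_logits(direct, habitat, gate_params)
```

A gate parameter of 0 weights both heads equally, while large positive values favor the direct head. A per-species gate lets the model rely on habitat associations for some species and on spatial patterns for others.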
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lat | Tensor | Raw latitude in degrees, shape | required |
| lon | Tensor | Raw longitude in degrees, shape | required |
| week | Tensor | Week number (0–48), shape | required |
| return_env | bool | If True, also return environmental predictions. | True |
Returns:
| Type | Description |
|---|---|
| Dict[str, Tensor] | Dict with |
predict_species(lat, lon, week, threshold=0.5)
Return binary species predictions at the given threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lat | Tensor | Latitude tensor. | required |
| lon | Tensor | Longitude tensor. | required |
| week | Tensor | Week tensor. | required |
| threshold | float | Probability threshold for a positive prediction. | 0.5 |
Returns:
| Type | Description |
|---|---|
| Tensor | Binary tensor of shape |
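The thresholding step amounts to applying a sigmoid to each logit and comparing the result with the threshold. The plain-Python sketch below works on a list of logits for one location; whether the method uses a strict or inclusive comparison is not stated here, so the inclusive `>=` is an assumption, and both function names are hypothetical.

```python
import math

def species_probabilities(logits):
    """Independent per-species probabilities (multi-label, so no softmax)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def predict_binary(logits, threshold=0.5):
    # Assumption: a species counts as present when probability >= threshold.
    return [1 if p >= threshold else 0 for p in species_probabilities(logits)]

logits = [2.0, -0.5, 0.1]
default_preds = predict_binary(logits)                # threshold 0.5
strict_preds = predict_binary(logits, threshold=0.6)  # fewer positives
```

Raising the threshold trades recall for precision: the third species (probability about 0.52) is predicted present at 0.5 but not at 0.6.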
get_species_probabilities(lat, lon, week)
Return sigmoid probabilities for all species.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lat | Tensor | Latitude tensor. | required |
| lon | Tensor | Longitude tensor. | required |
| week | Tensor | Week tensor. | required |
Returns:
| Type | Description |
|---|---|
| Tensor | Probability tensor of shape |
Functions
create_model(n_species, n_env_features, model_scale=1.0, coord_harmonics=4, week_harmonics=8, habitat_head=False)
Create model with a continuous size scaling factor.
model_scale=1.0 matches the former medium preset (embed_dim=512, 4 encoder blocks, ~7 M params with 12 K species). Dimensions scale linearly; block counts are rounded to the nearest integer (minimum 1).
Rough parameter-count landmarks (with 12 K species):
- 0.5 → ~1.5 M (≈ former small)
- 1.0 → ~7.2 M (≈ former medium)
- 2.0 → ~47 M (≈ former large)
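The scaling rule can be sketched as follows. The base values come from the BirdNETGeoModel defaults documented on this page; which settings are actually scaled, and how ties are broken when rounding, are assumptions. `scaled_config` is a hypothetical helper, not part of the module.

```python
def scaled_config(model_scale=1.0):
    """Derive layer sizes from a continuous scale factor (illustrative only)."""
    def scale_dim(base):
        # Dimensions scale linearly with model_scale.
        return int(round(base * model_scale))

    def scale_blocks(base):
        # Block counts are rounded to the nearest integer, never below 1.
        # Python's round() sends exact halves to the nearest even integer.
        return max(1, int(round(base * model_scale)))

    return {
        "embed_dim": scale_dim(512),
        "encoder_blocks": scale_blocks(4),
        "species_head_blocks": scale_blocks(2),
        "env_head_blocks": scale_blocks(1),
    }

small = scaled_config(0.5)
medium = scaled_config(1.0)
large = scaled_config(2.0)
```

Under these assumptions, a scale of 0.5 halves embed_dim to 256 and the encoder to 2 blocks, while the single-block environmental head stays at 1 because of the minimum.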
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_species | int | Number of target species. | required |
| n_env_features | int | Number of environmental features. | required |
| model_scale | float | Continuous scaling factor (default 1.0). | 1.0 |
| coord_harmonics | int | Harmonics for lat/lon encoding. | 4 |
| week_harmonics | int | Harmonics for week encoding. | 8 |
| habitat_head | bool | Enable habitat-species association head. | False |