
model.model

Multi-task neural network architecture for species occurrence prediction.

The architecture uses residual blocks with pre-norm (LayerNorm → GELU) for stable training and strong gradient flow. The shared encoder maps raw (lat, lon, week) inputs through circular encoding into an embedding that feeds two task heads:

  • Species prediction (multi-label classification)
  • Environmental reconstruction (auxiliary regression)

Input encoding is handled inside the model so that inference only requires raw latitude, longitude, and week number — no external preprocessing needed.

Classes

CircularEncoding

Bases: Module

Multi-harmonic circular encoding for periodic/angular values.

Given a scalar angle θ (in radians), produces: [sin(θ), cos(θ), sin(2θ), cos(2θ), …, sin(nθ), cos(nθ)]

Output dimension = 2 * n_harmonics per input scalar.

Reference: Tancik et al., "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains" (NeurIPS 2020).

Functions
__init__(n_harmonics=1)

Initialize circular encoding.

Parameters:

Name Type Description Default
n_harmonics int

Number of harmonic frequencies to use.

1
forward(angles)

Parameters:

Name Type Description Default
angles Tensor

(batch,) or (batch, 1) — angles in radians

required

Returns:

Type Description
Tensor

(batch, 2 * n_harmonics) — [sin(θ), cos(θ), sin(2θ), cos(2θ), …]
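
The encoding described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the module's actual source: the class name `CircularEncodingSketch` and the buffer name `k` are hypothetical, and only the interleaved output layout is taken from the documentation.

```python
import math

import torch
from torch import nn


class CircularEncodingSketch(nn.Module):
    """Hypothetical re-implementation of multi-harmonic circular encoding."""

    def __init__(self, n_harmonics: int = 1):
        super().__init__()
        # Harmonic multipliers 1, 2, ..., n kept as a non-trainable buffer.
        self.register_buffer(
            "k", torch.arange(1, n_harmonics + 1, dtype=torch.float32)
        )

    def forward(self, angles: torch.Tensor) -> torch.Tensor:
        theta = angles.reshape(-1, 1)   # accepts (batch,) or (batch, 1)
        scaled = theta * self.k         # (batch, n_harmonics)
        # Stack sin/cos on a trailing axis, then flatten so the layout is
        # [sin(θ), cos(θ), sin(2θ), cos(2θ), ...].
        pairs = torch.stack([torch.sin(scaled), torch.cos(scaled)], dim=-1)
        return pairs.flatten(start_dim=1)   # (batch, 2 * n_harmonics)


enc = CircularEncodingSketch(n_harmonics=3)
features = enc(torch.tensor([0.0, math.pi / 2]))
print(features.shape)
```

Because every feature is a sine or cosine of an integer multiple of θ, angles that differ by a full turn (for example the last and first week of the year) map to the same point, which is the reason for using this encoding on periodic inputs.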

ResidualBlock

Bases: Module

Pre-norm residual block: LayerNorm → GELU → Linear → LayerNorm → GELU → Dropout → Linear.

Functions
__init__(dim, dropout=0.1)

Initialize a residual block.

Parameters:

Name Type Description Default
dim int

Hidden dimension (input and output are both dim).

required
dropout float

Dropout probability.

0.1
forward(x)

Apply residual connection: x + block(x).
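
As an illustration of the layer order listed above, a pre-norm residual block could look like the following. The class name `ResidualBlockSketch` is hypothetical, and details such as weight initialisation are not documented here, so treat this as a sketch under those assumptions.

```python
import torch
from torch import nn


class ResidualBlockSketch(nn.Module):
    """Hypothetical block: LayerNorm → GELU → Linear → LayerNorm → GELU → Dropout → Linear."""

    def __init__(self, dim: int, dropout: float = 0.1):
        super().__init__()
        self.block = nn.Sequential(
            nn.LayerNorm(dim),
            nn.GELU(),
            nn.Linear(dim, dim),
            nn.LayerNorm(dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip path carries x unchanged; the block only learns a correction.
        return x + self.block(x)


block = ResidualBlockSketch(dim=16).eval()  # eval() disables dropout
x = torch.randn(4, 16)
y = block(x)
print(y.shape)
```

Since input and output share the same dimension, blocks of this kind can be stacked freely without projection layers between them.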

SpatioTemporalEncoder

Bases: Module

Shared encoder that maps raw (lat, lon, week) to an embedding via multi-harmonic circular encoding and FiLM temporal conditioning.

Spatial features (lat, lon) are projected into embed_dim and processed by residual blocks. Temporal features (week) are encoded separately and used to generate per-block FiLM (Feature-wise Linear Modulation) scale and shift parameters that modulate the spatial representation. This forces the network to actively use temporal information rather than relying on a weak concatenated signal.

Reference: Perez et al., "FiLM: Visual Reasoning with a General Conditioning Layer" (AAAI 2018).
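
A minimal sketch of FiLM conditioning as described by Perez et al.: a generator maps the conditioning vector (here, the encoded week) to a per-feature scale γ and shift β, which are applied to the spatial hidden state. The class name `FiLMSketch`, the use of a single linear generator, and the example dimensions are assumptions for illustration; the encoder's actual generators may differ.

```python
import torch
from torch import nn


class FiLMSketch(nn.Module):
    """Hypothetical FiLM layer: out = γ(cond) * h + β(cond)."""

    def __init__(self, cond_dim: int, embed_dim: int):
        super().__init__()
        # One linear layer emits both γ and β (assumed generator shape).
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * embed_dim)

    def forward(self, h: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        return gamma * h + beta


film = FiLMSketch(cond_dim=8, embed_dim=32)
h = torch.randn(4, 32)          # spatial representation, (batch, embed_dim)
week_feat = torch.randn(4, 8)   # encoded week, (batch, 2 * week_harmonics)
out = film(h, week_feat)
print(out.shape)
```

In the encoder described here, one such (γ, β) pair would be generated per residual block, so the temporal signal modulates the spatial features at every depth.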

Inputs (all per-sample):

  • lat: float in [-90, 90]
  • lon: float in [-180, 180]
  • week: int in {1, …, 48}

Internal encoding produces:

  • lat → CircularEncoding(lat_rad) → 2 * coord_harmonics features
  • lon → CircularEncoding(lon_rad) → 2 * coord_harmonics features
  • week → CircularEncoding(week_rad) → 2 * week_harmonics features → FiLM generators produce (γ, β) per residual block
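
The conversion from raw inputs to angles, and the resulting feature counts, can be worked through as follows. The helper `to_angles` is hypothetical, and the week mapping 2π · (week − 1) / n_weeks is an assumption: the documentation only states that the week is converted to radians over a 48-week cycle.

```python
import math

import torch


def to_angles(lat, lon, week, n_weeks=48):
    """Hypothetical helper: degrees and week index to radians."""
    lat_rad = torch.deg2rad(lat)
    lon_rad = torch.deg2rad(lon)
    # Assumed mapping: week 1 -> 0 rad, one full turn per annual cycle.
    week_rad = 2.0 * math.pi * (week.float() - 1.0) / n_weeks
    return lat_rad, lon_rad, week_rad


lat = torch.tensor([52.5, -33.9])
lon = torch.tensor([13.4, 151.2])
week = torch.tensor([1, 25])
lat_rad, lon_rad, week_rad = to_angles(lat, lon, week)

# Feature counts with the documented defaults (4 harmonics each).
coord_harmonics, week_harmonics = 4, 4
spatial_features = 2 * (2 * coord_harmonics)   # lat + lon
temporal_features = 2 * week_harmonics         # week
print(spatial_features, temporal_features)
```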

Functions
__init__(coord_harmonics=4, week_harmonics=4, embed_dim=512, n_blocks=4, dropout=0.1, n_weeks=48)

Initialize the spatio-temporal encoder.

Parameters:

Name Type Description Default
coord_harmonics int

Harmonics for lat/lon circular encoding.

4
week_harmonics int

Harmonics for week circular encoding.

4
embed_dim int

Output embedding dimension.

512
n_blocks int

Number of residual blocks.

4
dropout float

Dropout probability.

0.1
n_weeks int

Total weeks in the annual cycle.

48
forward(lat, lon, week)

Parameters:

Name Type Description Default
lat Tensor

(batch,) raw latitude in degrees [-90, 90]

required
lon Tensor

(batch,) raw longitude in degrees [-180, 180]

required
week Tensor

(batch,) week number (1–48)

required

SpeciesPredictionHead

Bases: Module

Multi-label classification head with residual blocks and low-rank output.

The final projection uses a low-rank factorization

hidden_dim → bottleneck → n_species

instead of a single Linear(hidden_dim, n_species). This reduces parameters dramatically when n_species is large (10K+) while learning a compact species-embedding space whose dimensions can be interpreted as latent ecological niches.
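
To make the parameter saving concrete, the following sketch counts parameters for both variants using the documented defaults (hidden_dim=512, bottleneck=128) and 12,000 species. It assumes two plain `nn.Linear` layers with biases and no activation in between; the head's real output layers may be arranged differently.

```python
from torch import nn


def n_params(module: nn.Module) -> int:
    """Total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())


hidden_dim, bottleneck, n_species = 512, 128, 12_000

# Single full-rank projection.
full = nn.Linear(hidden_dim, n_species)

# Low-rank factorization: hidden_dim → bottleneck → n_species.
low_rank = nn.Sequential(
    nn.Linear(hidden_dim, bottleneck),
    nn.Linear(bottleneck, n_species),
)

full_count = n_params(full)
low_rank_count = n_params(low_rank)
print(full_count, low_rank_count)
```

Under these assumptions the factorized output uses roughly a quarter of the parameters of the single projection, and the rows of the second layer act as 128-dimensional species embeddings.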

Functions
__init__(input_dim, n_species, hidden_dim=512, n_blocks=2, dropout=0.2, bottleneck=128)

Initialize the species prediction head.

Parameters:

Name Type Description Default
input_dim int

Dimension of the encoder output.

required
n_species int

Number of target species (output logits).

required
hidden_dim int

Hidden dimension of residual blocks.

512
n_blocks int

Number of residual blocks.

2
dropout float

Dropout probability.

0.2
bottleneck int

Low-rank bottleneck dimension before the output layer.

128
forward(x)

Project encoder output to species logits.

Parameters:

Name Type Description Default
x Tensor

Encoder output of shape (batch, input_dim).

required

Returns:

Type Description
Tensor

Logits of shape (batch, n_species).

EnvironmentalPredictionHead

Bases: Module

Regression head for environmental feature reconstruction (auxiliary task).

Functions
__init__(input_dim, n_env_features, hidden_dim=256, n_blocks=1, dropout=0.1)

Initialize the environmental prediction head.

Parameters:

Name Type Description Default
input_dim int

Dimension of the encoder output.

required
n_env_features int

Number of environmental features to predict.

required
hidden_dim int

Hidden dimension of residual blocks.

256
n_blocks int

Number of residual blocks.

1
dropout float

Dropout probability.

0.1
forward(x)

Predict environmental features from encoder output.

Parameters:

Name Type Description Default
x Tensor

Encoder output of shape (batch, input_dim).

required

Returns:

Type Description
Tensor

Predicted features of shape (batch, n_env_features).

HabitatSpeciesHead

Bases: Module

Predict species from predicted environmental features.

Creates an explicit pathway from environmental conditions to species occurrence, making the habitat→species relationship directly learnable rather than implicit in the shared encoder. Combined with the direct SpeciesPredictionHead via a learned per-species gate, the model can leverage both spatial-embedding patterns and environmental feature associations.

Architecture mirrors SpeciesPredictionHead (residual blocks + low-rank bottleneck) but takes predicted environmental features as input instead of the encoder embedding.

During training, gradients flow back through the environmental head, reinforcing it to produce representations that are useful for both regression accuracy and species prediction.
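
One way a learned per-species gate could combine the two sets of logits is sketched below. The exact gating formula is not documented here, so the convex combination with a sigmoid gate, the class name `GatedCombinerSketch`, and the zero initialisation are all assumptions.

```python
import torch
from torch import nn


class GatedCombinerSketch(nn.Module):
    """Hypothetical gate: g * direct + (1 - g) * habitat, one g per species."""

    def __init__(self, n_species: int):
        super().__init__()
        # Zero init gives g = 0.5, i.e. an equal blend at the start (assumed).
        self.gate_logit = nn.Parameter(torch.zeros(n_species))

    def forward(self, direct_logits, habitat_logits):
        g = torch.sigmoid(self.gate_logit)   # (n_species,), broadcast over batch
        return g * direct_logits + (1.0 - g) * habitat_logits


combiner = GatedCombinerSketch(n_species=5)
direct = torch.randn(3, 5)    # logits from the direct species head
habitat = torch.randn(3, 5)   # logits from the habitat-species head
combined = combiner(direct, habitat)
print(combined.shape)
```

With a gate of this form, each species can learn how much to rely on the spatial embedding versus the predicted environmental features.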

Functions
__init__(n_env_features, n_species, hidden_dim=256, n_blocks=1, dropout=0.1, bottleneck=128)

Initialize the habitat-species head.

Parameters:

Name Type Description Default
n_env_features int

Number of input environmental features (output dim of the environmental head).

required
n_species int

Number of target species (output logits).

required
hidden_dim int

Hidden dimension of residual blocks.

256
n_blocks int

Number of residual blocks.

1
dropout float

Dropout probability.

0.1
bottleneck int

Low-rank bottleneck dimension before the output layer.

128
forward(env_pred)

Predict species logits from environmental features.

Parameters:

Name Type Description Default
env_pred Tensor

Predicted environmental features of shape (batch, n_env_features).

required

Returns:

Type Description
Tensor

Logits of shape (batch, n_species).

BirdNETGeoModel

Bases: Module

Multi-task model for bird species occurrence prediction.

Accepts raw (lat, lon, week) inputs — encoding is handled internally.

Training: (lat, lon, week) → encoder → species logits + env predictions
Inference: (lat, lon, week) → encoder → species logits only

Functions
__init__(n_species, n_env_features, coord_harmonics=4, week_harmonics=4, embed_dim=512, encoder_blocks=4, species_head_dim=512, species_head_blocks=2, species_bottleneck=128, env_head_dim=256, env_head_blocks=1, dropout=0.1, species_dropout=0.2, env_dropout=0.1, habitat_head=False, habitat_head_dim=256, habitat_head_blocks=1, habitat_bottleneck=128)

Initialize the full multi-task model.

Parameters:

Name Type Description Default
n_species int

Number of target species.

required
n_env_features int

Number of environmental features (auxiliary task).

required
coord_harmonics int

Harmonics for lat/lon encoding.

4
week_harmonics int

Harmonics for week encoding.

4
embed_dim int

Encoder embedding dimension.

512
encoder_blocks int

Number of residual blocks in the encoder.

4
species_head_dim int

Hidden dim for species head.

512
species_head_blocks int

Residual blocks in species head.

2
species_bottleneck int

Low-rank bottleneck size.

128
env_head_dim int

Hidden dim for environmental head.

256
env_head_blocks int

Residual blocks in environmental head.

1
dropout float

Encoder dropout.

0.1
species_dropout float

Species head dropout.

0.2
env_dropout float

Environmental head dropout.

0.1
habitat_head bool

Enable habitat-species association head. When True, predicted environmental features are fed through a secondary species head whose logits are combined with the direct species head via a learned per-species gate.

False
habitat_head_dim int

Hidden dim for habitat-species head.

256
habitat_head_blocks int

Residual blocks in habitat-species head.

1
habitat_bottleneck int

Low-rank bottleneck for habitat-species head.

128
forward(lat, lon, week, return_env=True)

Run the full model forward pass.

When the habitat-species head is enabled, the environmental head always runs (even during inference) and its output feeds into the habitat head. The final species logits are a learned gate-weighted combination of the direct and habitat predictions.

Parameters:

Name Type Description Default
lat Tensor

Raw latitude in degrees, shape (batch,).

required
lon Tensor

Raw longitude in degrees, shape (batch,).

required
week Tensor

Week number (0–48), shape (batch,).

required
return_env bool

If True, also return environmental predictions.

True

Returns:

Type Description
Dict[str, Tensor]

Dict with 'species_logits' and optionally 'env_pred'.
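
The returned dict can be consumed as in the sketch below, which combines a multi-label classification loss with an auxiliary regression loss. The function `multitask_loss`, the choice of binary cross-entropy and mean squared error, and the weight `env_weight=0.1` are assumptions for illustration; this page does not document the training objective. The `outputs` dict is a stand-in with the documented keys, not the result of a real forward pass.

```python
import torch
import torch.nn.functional as F


def multitask_loss(outputs, species_target, env_target, env_weight=0.1):
    """Hypothetical objective: BCE on species logits plus weighted MSE on env_pred."""
    loss = F.binary_cross_entropy_with_logits(
        outputs["species_logits"], species_target
    )
    # 'env_pred' is only present when the model was called with return_env=True.
    if "env_pred" in outputs:
        loss = loss + env_weight * F.mse_loss(outputs["env_pred"], env_target)
    return loss


batch, n_species, n_env = 4, 10, 5
outputs = {
    "species_logits": torch.zeros(batch, n_species),
    "env_pred": torch.zeros(batch, n_env),
}
species_target = torch.randint(0, 2, (batch, n_species)).float()
env_target = torch.ones(batch, n_env)
loss = multitask_loss(outputs, species_target, env_target)
print(loss.item())
```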

predict_species(lat, lon, week, threshold=0.5)

Return binary species predictions at the given threshold.

Parameters:

Name Type Description Default
lat Tensor

Latitude tensor.

required
lon Tensor

Longitude tensor.

required
week Tensor

Week tensor.

required
threshold float

Probability threshold for a positive prediction.

0.5

Returns:

Type Description
Tensor

Binary tensor of shape (batch, n_species).

get_species_probabilities(lat, lon, week)

Return sigmoid probabilities for all species.

Parameters:

Name Type Description Default
lat Tensor

Latitude tensor.

required
lon Tensor

Longitude tensor.

required
week Tensor

Week tensor.

required

Returns:

Type Description
Tensor

Probability tensor of shape (batch, n_species).
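
The relationship between the two inference helpers above can be sketched directly on logits: probabilities are the sigmoid of the logits, and binary predictions compare them against the threshold. Whether the comparison is `>=` or `>`, and whether the result is returned as bool or float, is not stated here; both choices below are assumptions.

```python
import torch

# Stand-in logits for one sample and three species.
logits = torch.tensor([[-2.0, 0.0, 3.0]])

# Independent sigmoid per species (multi-label, not softmax).
probs = torch.sigmoid(logits)

threshold = 0.5
present = probs >= threshold   # assumed comparison; boolean tensor
print(probs, present)
```

Because each species gets its own sigmoid, probabilities do not sum to one across species, and any number of species can be predicted present at a location.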

Functions

create_model(n_species, n_env_features, model_scale=1.0, coord_harmonics=4, week_harmonics=8, habitat_head=False)

Create model with a continuous size scaling factor.

model_scale=1.0 matches the former medium preset (embed_dim=512, 4 encoder blocks, ~7 M params with 12 K species). Dimensions scale linearly; block counts are rounded to the nearest integer (minimum 1).

Rough parameter-count landmarks (with 12 K species):

  • 0.5 → ~1.5 M (≈ former small)
  • 1.0 → ~7.2 M (≈ former medium)
  • 2.0 → ~47 M (≈ former large)
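
The scaling rule described above (dimensions scale linearly, block counts are rounded with a minimum of 1) can be sketched as follows. The helper `scaled_config` is hypothetical, the base values are the BirdNETGeoModel defaults, and which fields are actually scaled by create_model is an assumption.

```python
def scaled_config(model_scale: float = 1.0) -> dict:
    """Hypothetical mapping from a continuous scale to layer sizes."""

    def dim(base: int) -> int:
        # Dimensions scale linearly with model_scale.
        return max(1, int(round(base * model_scale)))

    def blocks(base: int) -> int:
        # Block counts are rounded to an integer, never below 1.
        return max(1, int(round(base * model_scale)))

    return {
        "embed_dim": dim(512),
        "encoder_blocks": blocks(4),
        "species_head_dim": dim(512),
        "species_head_blocks": blocks(2),
        "env_head_dim": dim(256),
        "env_head_blocks": blocks(1),
    }


for scale in (0.5, 1.0, 2.0):
    print(scale, scaled_config(scale))
```

Note that the minimum of 1 matters at small scales: an environmental head with one block at scale 1.0 would otherwise round down to zero blocks at scale 0.5.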

Parameters:

Name Type Description Default
n_species int

Number of target species.

required
n_env_features int

Number of environmental features.

required
model_scale float

Continuous scaling factor (default 1.0).

1.0
coord_harmonics int

Harmonics for lat/lon encoding.

4
week_harmonics int

Harmonics for week encoding.

8
habitat_head bool

Enable habitat-species association head.

False