Variable Importance¶
scripts/plot_variable_importance.py generates bar charts showing the Spearman rank correlation between environmental variables and model-predicted species occurrence probability.
What It Shows¶
For each species, the script:
- Loads the training data (H3 cells with environmental features)
- Runs model inference to get per-cell species probabilities
- Computes Spearman rank correlation between each environmental variable and the predicted probability
- Plots a horizontal bar chart with variables grouped by category
This reveals which environmental factors the model associates with each species, providing interpretable insight into the model's learned spatial patterns.
Usage¶
# Specific species by name
python scripts/plot_variable_importance.py \
--data_path outputs/combined.parquet \
--species "Eurasian Blackbird" "Great Tit"
CLI Reference¶
| Flag | Default | Description |
|---|---|---|
--data_path |
required | Combined parquet file with environmental features |
--checkpoint |
checkpoints/checkpoint_best.pt |
Model checkpoint |
--species |
all species | Common or scientific names |
--max_samples |
200000 |
Max data samples (for speed) |
--outdir |
outputs/plots/variable_importance |
Output directory |
--batch_size |
4096 |
Batch size for inference |
Variable Groups¶
Variables are organized into semantic groups with visual separators:
| Group | Variables |
|---|---|
| Location | latitude, longitude |
| Climate | temperature_c, precipitation_mm |
| Terrain | elevation_m |
| Surface | water_fraction, urban_fraction, canopy_height_m |
| Land Cover | 17 one-hot MODIS IGBP classes (Evergreen Needleleaf Forest, Croplands, etc.) |
The landcover_class integer column is automatically one-hot encoded into the 17 IGBP classes for meaningful correlation analysis.
Output¶
One PNG per species (e.g., Eurasian_Blackbird.png):
- Horizontal bar chart with fixed x-axis from -1 to +1
- Blue bars: positive correlation (species more likely in higher values of that variable)
- Red bars: negative correlation
- Variables grouped with separator lines between categories
- All charts use the same axis scale for direct comparability
Interpretation¶
| Correlation | Meaning |
|---|---|
| +0.5 to +1.0 | Strong positive: species strongly associated with this variable |
| +0.2 to +0.5 | Moderate positive association |
| -0.2 to +0.2 | Weak or no association |
| -0.5 to -0.2 | Moderate negative association |
| -1.0 to -0.5 | Strong negative: species avoids areas with high values |
Example: Barn Swallow
A typical insectivorous migrant might show positive correlations with temperature, croplands, and grasslands, and negative correlations with elevation, evergreen forests, and snow/ice.