Label Propagation¶
scripts/plot_propagation.py visualizes the effect of environmental neighbor label propagation — comparing species lists before and after propagation to sanity-check the results.
Usage¶
# Global summary + per-cell weekly breakdown for a location
python scripts/plot_propagation.py --data_path outputs/combined.parquet --lat 50.83 --lon 12.92
# Detailed species diff for a specific week
python scripts/plot_propagation.py --data_path outputs/combined.parquet --lat 50.83 --lon 12.92 --week 1
# Custom propagation parameters
python scripts/plot_propagation.py --data_path outputs/combined.parquet --lat 50.83 --lon 12.92 \
--propagate_k 10 --propagate_max_radius 1000 --propagate_min_obs 5
CLI Reference¶
| Flag | Default | Description |
|---|---|---|
--data_path |
(required) | Combined training parquet file |
--lat |
— | Latitude for per-cell comparison |
--lon |
— | Longitude for per-cell comparison |
--week |
— | Show detailed species diff for this week |
--taxonomy |
auto-detected | Taxonomy CSV for species name lookup |
--propagate_k |
10 |
Number of env-space neighbors |
--propagate_max_radius |
1000 |
Max geographic radius in km |
--propagate_min_obs |
10 |
Sparsity threshold (species count below which a sample receives propagated labels) |
--propagate_max_spread |
2.0 |
Species range expansion multiplier (0 = disable range check) |
--propagate_env_dist_max |
2.0 |
Max env-space distance for neighbor eligibility (0 = disabled) |
--propagate_range_cap |
500 |
Hard km ceiling on per-species propagation distance from nearest observation (0 = disabled) |
--no_yearly |
off | Exclude yearly (week 0) samples |
--outdir |
outputs/plots/propagation |
Output directory |
Output¶
Global Summary (propagation_summary.png)¶
A two-panel figure showing:
- Left: Histogram of species added per modified sample — shows the distribution of propagation intensity across all cells.
- Right: Total propagated species by week — reveals seasonal patterns (e.g. winter weeks may receive more propagation if fewer observations exist).
Per-Cell Weekly Comparison (propagation_weekly.png)¶
Requires --lat and --lon. A grouped bar chart showing species count before (blue) and after (orange) propagation for all 48 weeks at the nearest H3 cell. Weeks where propagation added species are annotated with +N.
Includes a summary box with total species added and number of weeks modified.
Terminal Species Diff¶
When --week is specified (along with --lat and --lon), prints a detailed species list diff to the terminal:
- Original species — species present before propagation
- Propagated species — species added by the propagation algorithm
When --week is omitted, prints a compact summary of all weeks that were modified at the selected cell.
What to Look For¶
Use this script to verify that propagation behaves correctly:
- Seasonal coherence — propagated species should be plausible for the target week. Barn Swallows appearing in week 1 (January) in Europe would indicate a bug in the week-matching logic.
- Geographic plausibility — propagated species should come from ecologically similar regions within the radius cap. Tropical species appearing in arctic cells suggests the radius or env-similarity matching is too loose.
- Proportionality — cells with very few original observations should gain the most species. Well-observed cells (above
min_obs) should remain unchanged. - No over-propagation — if a cell suddenly has hundreds of propagated species,
propagate_korpropagate_max_radiusmay be too high.
Tuning propagation parameters
If propagation adds implausible species:
- Lower
--propagate_max_radius(e.g. 500–1000 km) to restrict geographic reach - Lower
--propagate_k(e.g. 3) to use fewer neighbors - Raise
--propagate_min_obsto only propagate to very sparse cells
If propagation doesn't add enough species:
- Raise
--propagate_max_radius - Raise
--propagate_k - Lower
--propagate_min_obsto include more cells as propagation targets