utils.combine

Join H3 environmental grid with GBIF species observations.

Combine H3 geodata with processed GBIF observations.

Maps each GBIF record to its H3 cell and week, producing a combined parquet with per-week species lists and an accompanying taxonomy CSV.
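The mapping step can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the record layout (latitude, longitude, ISO date string, species name), the use of ISO week numbers, and the `toy_cell` helper, which snaps coordinates to a 1-degree grid so the example runs without the `h3` package. It is not the module's actual implementation.

```python
from collections import defaultdict
from datetime import date


def toy_cell(lat, lng):
    """Hypothetical stand-in for an H3 lookup: snaps to a 1-degree grid."""
    return f"{int(lat // 1)}_{int(lng // 1)}"


def group_species_by_cell_week(records, to_cell=toy_cell):
    """Group species names by (cell, week).

    records: iterable of (lat, lng, iso_date_string, species) tuples
             (an assumed layout, not the real GBIF schema).
    """
    grouped = defaultdict(set)
    for lat, lng, day, species in records:
        # Assumption: "week" means the ISO week number (1..53).
        week = date.fromisoformat(day).isocalendar()[1]
        grouped[(to_cell(lat, lng), week)].add(species)
    # Sorted lists give a deterministic per-week species list.
    return {key: sorted(names) for key, names in grouped.items()}


records = [
    (52.52, 13.40, "2023-05-01", "Turdus merula"),
    (52.53, 13.41, "2023-05-03", "Parus major"),
    (52.52, 13.40, "2023-05-10", "Turdus merula"),
]
result = group_species_by_cell_week(records)
```

With the `h3` package (v4 API) installed, `to_cell` could instead be something like `lambda lat, lng: h3.latlng_to_cell(lat, lng, resolution)`, which returns the real H3 cell index for the chosen resolution.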

Supports multiprocessing (--workers) to parallelize the expensive per-row H3 cell computation across CPU cores.
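One common way to parallelize per-row work of this kind is to split the rows into chunks and hand each chunk to a worker process. The sketch below illustrates that pattern only; the function names and the toy grid lookup are hypothetical and do not come from the module, whose internals are not shown here.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(rows, n_chunks):
    """Split rows into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def cells_for_chunk(chunk):
    """Hypothetical per-chunk worker; a toy 1-degree grid stands in for H3."""
    return [f"{int(lat // 1)}_{int(lng // 1)}" for lat, lng in chunk]


def map_cells(rows, workers=1):
    """Compute one cell per (lat, lng) row, optionally across processes."""
    if workers <= 1:
        # Serial path: no process start-up or pickling overhead.
        return cells_for_chunk(rows)
    chunks = split_into_chunks(rows, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in chunk order, so row order is preserved.
        parts = pool.map(cells_for_chunk, chunks)
        return [cell for part in parts for cell in part]
```

When calling `map_cells(rows, workers=4)` from a script, the call should sit under an `if __name__ == "__main__":` guard, since platforms that start workers with the spawn method re-import the main module. For small inputs the serial path is usually faster because of process start-up cost.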

Functions

estimate_gzip_rows(file_path, sample_rows=10000)

Estimate total rows in a gzipped CSV by sampling compressed byte ratios.
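A rough sketch of how such an estimate could work, under stated assumptions: decompress the start of the file, count the rows found, and scale by the ratio of the total compressed size to the compressed bytes consumed. The name `estimate_gzip_rows_sketch` and the `chunk_size` parameter are hypothetical, and the sketch assumes a single-member gzip file with one header line, newline-terminated rows, and roughly uniform compressibility. The real function may differ.

```python
import os
import zlib


def estimate_gzip_rows_sketch(file_path, sample_rows=10000, chunk_size=64 * 1024):
    """Estimate data rows (header excluded) in a gzipped CSV."""
    total_bytes = os.path.getsize(file_path)
    decomp = zlib.decompressobj(31)  # wbits=31 selects the gzip container
    bytes_fed = 0   # compressed bytes consumed so far
    lines_seen = 0  # newline count in the decompressed sample
    with open(file_path, "rb") as raw:
        # +1 so the header line does not eat into the sample.
        while lines_seen < sample_rows + 1:
            chunk = raw.read(chunk_size)
            if not chunk:
                break
            bytes_fed += len(chunk)
            lines_seen += decomp.decompress(chunk).count(b"\n")
    if bytes_fed == 0:
        return 0
    if bytes_fed >= total_bytes:
        # The whole file was read, so the count is exact.
        return max(lines_seen - 1, 0)
    # Extrapolate: lines per compressed byte times total compressed size.
    return max(int(lines_seen * total_bytes / bytes_fed) - 1, 0)
```

The estimate drifts when early rows compress differently from later ones, so it is best suited to progress bars and similar uses where an approximate total is enough.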

combine_data(h3_path, gbif_path, output_path, resolution=None, workers=1, classes=None, taxonomy_path=None)

Combine H3 environmental data with GBIF occurrences.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| h3_path | str | Path to H3 environmental GeoParquet. | required |
| gbif_path | str | Path to processed GBIF CSV. | required |
| output_path | str | Path for the combined output parquet. | required |
| resolution | int \| None | H3 resolution for mapping GBIF coordinates. If None (default), auto-detected from the H3 environmental data. | None |
| workers | int | Number of parallel worker processes. | 1 |
| classes | list \| None | Taxonomic classes to include. | None |
| taxonomy_path | str \| None | Path to taxonomy CSV for species code resolution. | None |