utils.combine
Join the H3 environmental grid with processed GBIF species observations.
Maps each GBIF record to its H3 cell and week, producing a combined parquet with per-week species lists and an accompanying taxonomy CSV.
Supports multiprocessing (--workers) to parallelize the expensive
per-row H3 cell computation across CPU cores.
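As a rough illustration of the "per-week species lists" described above, the sketch below groups records by cell and week. The record layout, the placeholder cell IDs, and the use of ISO week numbers are assumptions made for this example; the module's actual column names and week convention are not documented on this page.

```python
from collections import defaultdict
from datetime import date


def group_species_by_cell_week(records):
    """Collect the distinct species seen in each (cell, week) pair.

    `records` is an iterable of (cell_id, observation_date, species) tuples.
    This layout and the ISO-week convention are assumptions for illustration.
    """
    grouped = defaultdict(set)
    for cell_id, obs_date, species in records:
        week = obs_date.isocalendar()[1]  # ISO week number, 1..53
        grouped[(cell_id, week)].add(species)
    # Sort each species set so the output is deterministic.
    return {key: sorted(names) for key, names in grouped.items()}


# Placeholder cell IDs; real H3 cell IDs are hexadecimal strings.
records = [
    ("cell_a", date(2023, 5, 1), "Turdus merula"),
    ("cell_a", date(2023, 5, 3), "Parus major"),
    ("cell_a", date(2023, 5, 3), "Turdus merula"),
    ("cell_b", date(2023, 5, 10), "Parus major"),
]
result = group_species_by_cell_week(records)
```

Here the two `cell_a` dates fall in the same ISO week, so their species are merged into one list without duplicates.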
Functions
estimate_gzip_rows(file_path, sample_rows=10000)
Estimate total rows in a gzipped CSV by sampling compressed byte ratios.
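One way such an estimate could work is sketched below; it is not taken from the module's source. The hypothetical `estimate_gzip_rows_sketch` reads up to `sample_rows` data rows, checks how many compressed bytes were consumed from the underlying file, and scales the row count by the total file size. It assumes the CSV has a single header line and roughly uniform row sizes.

```python
import gzip
import os


def estimate_gzip_rows_sketch(file_path, sample_rows=10000):
    """Estimate data rows as sampled_rows * total_bytes / compressed_bytes_read."""
    total_bytes = os.path.getsize(file_path)
    with open(file_path, "rb") as raw, gzip.open(raw, "rt", encoding="utf-8") as text:
        text.readline()  # skip the header line (assumed to exist)
        rows = 0
        for _ in text:
            rows += 1
            if rows >= sample_rows:
                break
        # Position in the compressed stream; gzip reads in buffered chunks,
        # so this slightly overshoots the bytes actually needed.
        consumed = raw.tell()
    if rows < sample_rows or consumed == 0:
        return rows  # the whole file was read, so the count is exact
    return int(rows * total_bytes / consumed)
```

Because decompression reads the file in buffered chunks, the estimate is coarse for small files and only becomes useful when the file is much larger than the read buffer.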
combine_data(h3_path, gbif_path, output_path, resolution=None, workers=1, classes=None, taxonomy_path=None)
Combine H3 environmental data with GBIF occurrences.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `h3_path` | `str` | Path to H3 environmental GeoParquet. | required |
| `gbif_path` | `str` | Path to processed GBIF CSV. | required |
| `output_path` | `str` | Path for the combined output parquet. | required |
| `resolution` | `int \| None` | H3 resolution for mapping GBIF coordinates. If | `None` |
| `workers` | `int` | Number of parallel worker processes. | `1` |
| `classes` | `list \| None` | Taxonomic classes to include. | `None` |
| `taxonomy_path` | `str \| None` | Path to taxonomy CSV for species code resolution. | `None` |
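A minimal usage sketch based on the signature above. The file paths, the `classes` value, and the `utils.combine` import path (inferred from the page title) are assumptions for illustration, not documented values.

```python
# Hypothetical paths and values, chosen only for illustration.
combine_kwargs = dict(
    h3_path="data/h3_env.parquet",
    gbif_path="data/gbif_processed.csv",
    output_path="data/combined.parquet",
    resolution=None,      # default; its behaviour is not fully documented here
    workers=4,            # number of parallel worker processes
    classes=["Aves"],     # assumed to be a list of class names
    taxonomy_path=None,
)


def run_combine():
    # Imported lazily so the settings above can be inspected without the package.
    from utils.combine import combine_data  # import path is an assumption
    return combine_data(**combine_kwargs)
```

Calling `run_combine()` would pass these settings to `combine_data`; raising `workers` should only help on machines with that many CPU cores available.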