utils.geoutils¶
H3 grid construction and Earth Engine environmental feature sampling.
Utilities for building H3 grids and reducing Earth Engine imagery.
This module builds H3 indexes for a bbox or the globe and computes a
set of environmental properties per cell by sampling Earth Engine
datasets. It exposes compute_environmental_data and
run_global_in_chunks for chunked processing.
Functions¶
initialize_ee(service_account=None, key_file=None)
¶
Initialize Google Earth Engine.
Tries a standard client-side initialization first. If service_account and
key_file are provided, the function falls back to service-account
authentication with those credentials.
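The try-default-then-fall-back control flow described above can be sketched generically. `init_with_fallback` is a hypothetical helper (not part of this module) that takes the two initialization callables as arguments so the flow can be shown without real credentials; in the actual function these would be `ee.Initialize()` and `ee.Initialize(ee.ServiceAccountCredentials(email, key_file))`.

```python
def init_with_fallback(default_init, sa_init, service_account=None, key_file=None):
    """Try a default client-side init; fall back to service-account auth.

    default_init / sa_init stand in for the ee.Initialize and
    ee.ServiceAccountCredentials paths, injected here so the control
    flow can be exercised without Earth Engine credentials.
    """
    try:
        default_init()
        return "default"
    except Exception:
        # Only fall back when both pieces of service-account info exist.
        if service_account and key_file:
            sa_init(service_account, key_file)
            return "service_account"
        raise
```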
h3_resolution_for_km(target_km)
¶
Return an H3 resolution that approximately matches the requested target cell size in kilometers.
Acceptable target_km values are 5, 10, or 25. The mapping is a pragmatic
choice that balances global coverage and dataset resolution.
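One plausible shape for this mapping is a small lookup table keyed on the three accepted sizes. The specific resolutions below (H3 average cell sizes roughly bracket the targets) are an assumption for illustration; the module's actual choices may differ.

```python
# Hypothetical mapping; the module's actual resolution choices may differ.
_KM_TO_H3_RES = {5: 6, 10: 5, 25: 4}

def h3_resolution_for_km(target_km):
    """Map a target cell size in km to an H3 resolution (5, 10, or 25 only)."""
    try:
        return _KM_TO_H3_RES[target_km]
    except KeyError:
        raise ValueError(
            f"target_km must be one of {sorted(_KM_TO_H3_RES)}, got {target_km}"
        )
```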
build_h3_grid(resolution, bounds=None)
¶
Return a list of H3 indexes covering bounds at resolution.
If bounds is None the function defaults to the global bbox
(-180, -90, 180, 90). The result can be very large at fine resolutions,
so use it with caution.
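The bbox-to-polygon step can be sketched with the stdlib; `bbox_to_geojson` is a hypothetical helper showing the polygon an H3 fill call would cover, with the global default matching the behavior documented above. The h3 call in the trailing comment uses the h3 v3 `polyfill` API.

```python
def bbox_to_geojson(bounds=None):
    """Build the GeoJSON polygon that an H3 polyfill call would cover.

    bounds is (min_lon, min_lat, max_lon, max_lat); None means the
    global bbox, matching build_h3_grid's documented default.
    """
    if bounds is None:
        bounds = (-180.0, -90.0, 180.0, 90.0)
    w, s, e, n = bounds
    ring = [[w, s], [e, s], [e, n], [w, n], [w, s]]  # closed exterior ring
    return {"type": "Polygon", "coordinates": [ring]}

# With the h3 package, the polygon would then be filled at the chosen
# resolution, e.g. (h3 v3 API):
# cells = h3.polyfill(bbox_to_geojson(bounds), resolution, geo_json=True)
```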
compute_environmental_data(h3_indexes, scale=30, fields=None, use_centroid_sampling=True, chunk_size=200, threads=1)
¶
Compute environmental summaries for a list of H3 cells.
This function reduces a set of Earth Engine images per-H3 cell and
returns a GeoDataFrame with one row per h3_index. The reduction is
done using centroid sampling (faster) and runs in chunked mode to
avoid building large server-side feature collections.
Parameters
- h3_indexes: iterable of H3 cell ids to process.
- scale: nominal reducer scale in meters.
- fields: list of datasets to reduce; defaults to all supported fields.
- use_centroid_sampling: if True (the default), reduce each cell at its centroid point rather than over the full cell polygon.
- chunk_size: number of cells reduced per server-side request.
- threads: number of worker threads used for per-chunk concurrency.
Returns a geopandas.GeoDataFrame containing h3_index, geometry,
and the requested environmental columns.
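The chunked, optionally threaded reduction pattern can be sketched without Earth Engine. `reduce_in_chunks` is a hypothetical helper in which `reduce_chunk` stands in for the server round trip that reduces one chunk of cells and returns row dicts.

```python
from concurrent.futures import ThreadPoolExecutor

def reduce_in_chunks(h3_indexes, reduce_chunk, chunk_size=200, threads=1):
    """Split cells into chunks and reduce each chunk, optionally in threads.

    reduce_chunk stands in for the Earth Engine round trip that reduces
    one chunk of cells and returns a list of row dicts.
    """
    cells = list(h3_indexes)
    chunks = [cells[i:i + chunk_size] for i in range(0, len(cells), chunk_size)]
    rows = []
    if threads > 1:
        # executor.map preserves chunk order, so rows stay aligned
        # with the input cell order.
        with ThreadPoolExecutor(max_workers=threads) as ex:
            for part in ex.map(reduce_chunk, chunks):
                rows.extend(part)
    else:
        for chunk in chunks:
            rows.extend(reduce_chunk(chunk))
    return rows
```

The real function would then assemble these rows, plus cell geometries, into the returned GeoDataFrame.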
export_geoparquet(gdf, out_path)
¶
Export GeoDataFrame to GeoParquet (Parquet with geometry).
The function writes a parquet file using PyArrow; ensure pyarrow and
a recent geopandas are installed.
fill_missing_with_nearest(gdf, columns=None)
¶
Fill missing values in gdf by copying the value from the nearest
neighbor that has a non-missing value.
columns: list of columns to process. If None, all non-geometry, non-id columns are considered.
This uses geometry centroids and a simple nearest-neighbor search implemented with NumPy; it is memory-efficient for moderately sized GeoDataFrames.
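The core of such a NumPy nearest-neighbor fill can be sketched on plain arrays; `fill_nearest` is a hypothetical helper operating on centroid coordinates and one value column, not the module's actual implementation.

```python
import numpy as np

def fill_nearest(xy, values):
    """Fill NaNs in values with the value of the nearest non-NaN point.

    xy: (n, 2) array of centroid coordinates; values: (n,) float array.
    A brute-force pairwise-distance computation, fine for moderate n.
    """
    values = values.astype(float).copy()
    missing = np.isnan(values)
    if not missing.any() or missing.all():
        return values
    have = ~missing
    # Squared distances from each missing point to each valid point.
    d2 = ((xy[missing, None, :] - xy[None, have, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    values[missing] = values[have][nearest]
    return values
```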
combine_parquet_parts(parts_dir, out_path=None, pattern=None, remove_parts=False)
¶
Combine multiple chunked GeoParquet files into a single GeoParquet.
parts_dir: directory containing the chunk parquet files.
out_path: path to write the combined parquet. If None, a default filename combined.parquet inside parts_dir is used.
pattern: glob pattern for matching part files (default: "grid_chunk.parquet").
remove_parts: if True, delete the part files after a successful combine.
Returns the out_path on success, or None if no parts found.
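The part-discovery step deserves care: lexical sorting puts chunk 10 before chunk 2. `find_parts` is a hypothetical stdlib-only helper (the wildcard pattern here is an assumption for illustration); the combine itself would then use the real `geopandas.read_parquet` / `pandas.concat` calls noted in the trailing comment.

```python
import re
from pathlib import Path

def find_parts(parts_dir, pattern="grid_chunk*.parquet"):
    """Locate chunk files and order them by their numeric suffix.

    Sorts on the first integer found in each filename so that
    chunk 10 does not sort before chunk 2.
    """
    def chunk_key(p):
        m = re.search(r"(\d+)", p.name)
        return int(m.group(1)) if m else -1
    return sorted(Path(parts_dir).glob(pattern), key=chunk_key)

# The combine itself would then be roughly:
# gdf = pandas.concat(
#     [geopandas.read_parquet(p) for p in find_parts(parts_dir)],
#     ignore_index=True,
# )
```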
run_global_in_chunks(target_km=10, out_dir=None, bounds=None, threads=1, fill_missing=False, fraction=1.0)
¶
Chunk the H3 grid and run reductions per-chunk.
Behavior:
- Always uses centroid sampling for speed and simplicity.
- Fixed chunk size: 500 H3 cells per chunk.
- Uses a single outer tqdm progress bar for chunk completion.
Returns a list of written parquet file paths (one per chunk).
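The fraction-then-chunk planning step can be sketched as follows. `plan_chunks` is a hypothetical helper mirroring the fixed 500-cell chunk size documented above; whether fraction subsamples randomly (and with what seed) is an assumption for illustration.

```python
import random

def plan_chunks(cells, fraction=1.0, chunk_size=500, seed=0):
    """Optionally subsample cells, then split into fixed-size chunks.

    chunk_size=500 mirrors run_global_in_chunks' fixed chunk size.
    Random subsampling with a fixed seed is illustrative only.
    """
    cells = list(cells)
    if fraction < 1.0:
        k = max(1, int(len(cells) * fraction))
        cells = random.Random(seed).sample(cells, k)
    return [cells[i:i + chunk_size] for i in range(0, len(cells), chunk_size)]
```

Each planned chunk would then be reduced and written to its own parquet part, giving the per-chunk file list this function returns.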