
utils.geoutils

H3 grid construction and Earth Engine environmental feature sampling.

Utilities for building H3 grids and reducing Earth Engine imagery.

This module builds H3 indexes for a bounding box or the globe and computes a set of environmental properties per cell by sampling Earth Engine datasets. It exposes compute_environmental_data and run_global_in_chunks for chunked processing.

Functions

initialize_ee(service_account=None, key_file=None)

Initialize Google Earth Engine.

Tries a standard client-side initialization first. If service_account and key_file are provided, the function attempts service-account authentication instead.

h3_resolution_for_km(target_km)

Return an H3 resolution that approximately matches the requested target cell size in kilometers.

Acceptable target_km values are 5, 10, or 25. The mapping is a pragmatic choice that balances global coverage and dataset resolution.
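A minimal sketch of such a lookup; the exact resolution choices below are an assumption (for reference, H3 average cell edge lengths are roughly 23 km at resolution 4, 8.5 km at resolution 5, and 3.2 km at resolution 6), and the module's actual mapping may differ:

```python
# Hypothetical target_km -> H3 resolution table; the module's actual
# choices may differ.
_KM_TO_H3_RES = {25: 4, 10: 5, 5: 6}

def h3_resolution_for_km(target_km):
    """Return an H3 resolution roughly matching target_km."""
    try:
        return _KM_TO_H3_RES[target_km]
    except KeyError:
        raise ValueError(
            f"target_km must be one of {sorted(_KM_TO_H3_RES)}, got {target_km!r}"
        )
```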

build_h3_grid(resolution, bounds=None)

Return a list of H3 indexes covering bounds at resolution.

If bounds is None the function defaults to the global bbox (-180, -90, 180, 90). The result can be large for fine resolutions; use cautiously.

compute_environmental_data(h3_indexes, scale=30, fields=None, use_centroid_sampling=True, chunk_size=200, threads=1)

Compute environmental summaries for a list of H3 cells.

This function reduces a set of Earth Engine images per H3 cell and returns a GeoDataFrame with one row per h3_index. By default the reduction samples at cell centroids (faster than full-polygon reductions) and runs in chunked mode to avoid building large server-side feature collections.

Parameters

  • h3_indexes: iterable of H3 cell ids to process.
  • scale: nominal reducer scale (meters).
  • fields: list of datasets to reduce; defaults to all supported fields.
  • use_centroid_sampling: if True, reduce at cell centroids rather than over full cell polygons (faster).
  • chunk_size: number of cells processed per chunk.
  • threads: number of worker threads used for per-chunk concurrency.

Returns a geopandas.GeoDataFrame containing h3_index, geometry, and the requested environmental columns.
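The chunked, threaded dispatch described above can be sketched as follows. Here reduce_chunk is a hypothetical stand-in for the per-chunk Earth Engine reduction, which is not shown:

```python
from concurrent.futures import ThreadPoolExecutor

def _chunks(seq, size):
    """Yield successive slices of seq with at most `size` items."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def reduce_in_chunks(h3_indexes, reduce_chunk, chunk_size=200, threads=1):
    """Apply reduce_chunk to chunks of h3_indexes, optionally in parallel.

    reduce_chunk is a hypothetical callable standing in for the Earth
    Engine reduction; it takes a list of cell ids and returns one result
    row per cell. pool.map preserves chunk order.
    """
    cells = list(h3_indexes)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_chunk = pool.map(reduce_chunk, _chunks(cells, chunk_size))
        return [row for chunk in per_chunk for row in chunk]
```

Keeping chunks small bounds the size of each server-side feature collection, while threads overlap the network round-trips.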

export_geoparquet(gdf, out_path)

Export GeoDataFrame to GeoParquet (Parquet with geometry).

The function writes a parquet file using PyArrow; ensure pyarrow and a recent geopandas are installed.

fill_missing_with_nearest(gdf, columns=None)

Fill missing values in gdf by copying the value from the nearest neighbor that has a non-missing value.

  • columns: list of columns to process. If None, all non-geometry, non-id columns will be considered.

This uses centroids of geometries and a simple nearest-neighbor search implemented with NumPy; it's memory-efficient for moderate-sized GeoDataFrames.
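A sketch of that nearest-neighbor fill, assuming a plain NumPy array of centroid coordinates and a single 1-D value column (the real function operates on a GeoDataFrame and may process several columns):

```python
import numpy as np

def fill_nearest(coords, values):
    """Fill NaNs in values from the nearest row (Euclidean) with data.

    coords: (n, 2) array of centroid x/y; values: (n,) float array.
    Sketch of the NumPy approach described above, not the real function.
    """
    out = np.asarray(values, dtype=float).copy()
    missing = np.isnan(out)
    if not missing.any() or missing.all():
        return out  # nothing to fill, or nothing to fill from
    known = np.flatnonzero(~missing)
    for i in np.flatnonzero(missing):
        # squared distances from the missing row to every known row;
        # one (k, 2) temporary per missing row keeps memory modest
        d2 = ((coords[known] - coords[i]) ** 2).sum(axis=1)
        out[i] = out[known[d2.argmin()]]
    return out
```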

combine_parquet_parts(parts_dir, out_path=None, pattern=None, remove_parts=False)

Combine multiple chunked GeoParquet files into a single GeoParquet.

  • parts_dir: directory containing chunk parquet files
  • out_path: path to write the combined parquet. If None, a default filename combined.parquet inside parts_dir is used.
  • pattern: glob pattern for matching part files (default: "grid_chunk.parquet").
  • remove_parts: if True, delete the part files after successful combine.

Returns the out_path on success, or None if no part files are found.

run_global_in_chunks(target_km=10, out_dir=None, bounds=None, threads=1, fill_missing=False, fraction=1.0)

Chunk the H3 grid and run reductions per-chunk.

Behavior:

  • Always uses centroid sampling for speed and simplicity.
  • Fixed chunk size: 500 H3 cells per chunk.
  • Uses a single outer tqdm progress bar for chunk completion.

Returns a list of written parquet file paths (one per chunk).
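The outer loop can be sketched as below; process_chunk and the part-file naming are hypothetical stand-ins for the per-chunk reduction and parquet export:

```python
def run_in_chunks(h3_indexes, out_dir, process_chunk, chunk_size=500):
    """Split cells into fixed-size chunks and write one file per chunk.

    process_chunk is a hypothetical callable standing in for the
    reduce-and-export step; returns the list of written paths.
    """
    cells = list(h3_indexes)
    paths = []
    for n, start in enumerate(range(0, len(cells), chunk_size)):
        chunk = cells[start:start + chunk_size]
        out_path = f"{out_dir}/part_{n:05d}.parquet"  # hypothetical naming
        process_chunk(chunk, out_path)  # stand-in for reduce + export
        paths.append(out_path)
    return paths
```

Writing one file per chunk means a failed run can be resumed, and the parts can later be merged with combine_parquet_parts.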