Using the Google Earth Engine backend

This page is the hands-on guide to the earthlens GEE backend — picking a dataset from the catalog, building a download, and the trade-offs between the export modes. For background see the Introduction; for credentials see Registering a project and Service account setup; the rendered API is on the Reference page.

Install: the backend needs the Earth Engine SDK. Run `pip install "earthlens[gee]"`, which adds earthengine-api (the quotes keep shells such as zsh from expanding the brackets). The EarthLens facade imports without the SDK; `import earthlens.gee` requires it.

1. Find a dataset and its bands

The catalog maps Earth Engine asset ids to their band and aggregation metadata. It lives in per-category src/earthlens/gee/catalog/*.yaml files, which earthlens.gee.Catalog loads and merges, and it follows Earth Engine's own model: a dataset is an image, image_collection, or table; its addressable units are bands; and each band may carry a scale/offset, units, wavelength, and value range:

```python
from earthlens.gee import Catalog

cat = Catalog()
"USGS/SRTMGL1_003" in cat.datasets          # True (a curated entry)
"COPERNICUS/S2_SR_HARMONIZED" in cat.available_datasets  # True (in the index)

ds = cat.get_dataset("UCSB-CHG/CHIRPS/DAILY")
ds.ee_type           # 'image_collection'
ds.cadence           # Cadence(interval=1, unit='day')
ds.default_reducer   # 'mean'  (how a temporal composite collapses)
list(ds.bands)       # ['precipitation']
cat.get_band("UCSB-CHG/CHIRPS/DAILY", "precipitation").units   # 'mm/d'
```

available_datasets is the full index of asset ids Earth Engine publishes (regenerated by tools/gee/refresh_gee_catalog.py); datasets is the curated subset the package models in detail. tools/gee/audit_gee_datasets.py reports which available_datasets entries are ready to be curated, and tools/gee/refresh_gee_catalog.py --with-bands <id> prints a ready-to-paste datasets: stanza for one.

2. Download

```python
from earthlens.gee import GEE

gee = GEE(
    start="2020-06-01",
    end="2020-08-31",
    temporal_resolution="monthly",       # one composite image per month
    variables={"UCSB-CHG/CHIRPS/DAILY": ["precipitation"]},
    lat_lim=[28.0, 32.0],                # [lat_min, lat_max]
    lon_lim=[30.0, 34.0],                # [lon_min, lon_max]
    path="data/gee",
    scale=5566,                          # output pixel size in metres
    service_account="my-sa@my-project.iam.gserviceaccount.com",
    service_key="/path/to/key.json",     # path, or the JSON content as a string
)
paths = gee.download()
# -> [PosixPath('data/gee/UCSB-CHG_CHIRPS_DAILY_precipitation_20200601.tif'),
#     PosixPath('data/gee/UCSB-CHG_CHIRPS_DAILY_precipitation_20200701.tif'),
#     PosixPath('data/gee/UCSB-CHG_CHIRPS_DAILY_precipitation_20200801.tif')]
```

The request is {asset_id: [band, ...]} — list every band you want from each dataset (one image carries many; ERA5-Land alone has ~150). download() returns one entry per (dataset, band-set, time-bucket): a Path for export_via="url" (below), or a destination string for the async exports.
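
Because the entries returned by download() differ by export mode, downstream code may want to tell them apart. The following is a minimal sketch rather than part of earthlens: describe_result is a hypothetical helper, and it assumes only what this page states, namely that "url" results are Path objects and that async results are strings starting with "drive://" or "gs://".

```python
from pathlib import Path


def describe_result(result):
    """Classify one download() entry by its type and prefix.

    Hypothetical helper, not part of earthlens. Assumes local results are
    pathlib.Path objects and remote ones are "drive://" or "gs://" strings.
    """
    if isinstance(result, Path):
        return "local file"
    if isinstance(result, str) and result.startswith("drive://"):
        return "Google Drive export"
    if isinstance(result, str) and result.startswith("gs://"):
        return "Cloud Storage export"
    return "unknown"


# Illustrative value shaped like the "url" output shown above.
print(describe_result(Path("data/gee/UCSB-CHG_CHIRPS_DAILY_precipitation_20200601.tif")))
# local file
```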

Authentication

service_account + service_key use a Google Cloud service-account key (the recommended, headless-friendly path; see Service account setup). The Cloud project is read from the key file's project_id; alternatively, pass project= explicitly. Without a key, pass project=<a registered project> and the backend runs the interactive ee.Authenticate() flow once. A project that isn't registered for Earth Engine, or on which the service account lacks an IAM role, raises AuthenticationError with a pointer to the fix.
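
To see which project a key would select before handing it to the backend, the project_id field can be read straight from the key. This is a minimal sketch using only the standard library, not the backend's own lookup: project_from_key is a hypothetical helper, and the sample key content and project id are made up.

```python
import json
from pathlib import Path


def project_from_key(service_key):
    """Return the project_id stored in a service-account key.

    Sketch only (not earthlens code): accepts either a path to the JSON key
    file or the JSON content itself, the two forms service_key can take.
    """
    text = service_key.strip()
    if not text.startswith("{"):
        # Not inline JSON, so treat the value as a file path.
        text = Path(service_key).read_text(encoding="utf-8")
    return json.loads(text)["project_id"]


# Inline key content with a made-up project id, for illustration.
print(project_from_key('{"type": "service_account", "project_id": "my-project"}'))
# my-project
```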

`temporal_resolution`

- "raw" (default): one image, the whole [start, end] window collapsed with the dataset's default_reducer.
- "daily" / "monthly" / "yearly": one image per day / month / year, each collapsing its own sub-window with the reducer (mean for rates and continuous fields, median for cloud-screened optical scenes, mosaic for tiled or annual maps). Override per call with reducer="median" etc.

Static image datasets (e.g. USGS/SRTMGL1_003) ignore temporal_resolution — they always yield a single image.
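
The way a window breaks into per-month sub-windows can be sketched in plain Python. This is an illustration of the bucketing idea, not the backend's actual code: monthly_buckets is a hypothetical helper, and it assumes that the window is inclusive at both ends and that a partial first or last month is simply clipped to the window.

```python
from datetime import date, timedelta


def monthly_buckets(start, end):
    """Split the inclusive window [start, end] into per-month sub-windows."""
    buckets = []
    cursor = start
    while cursor <= end:
        # First day of the month after the one containing `cursor`.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        # The bucket ends on the last day of the month, clipped to `end`.
        bucket_end = min(next_month - timedelta(days=1), end)
        buckets.append((cursor, bucket_end))
        cursor = next_month
    return buckets


for first, last in monthly_buckets(date(2020, 6, 1), date(2020, 8, 31)):
    print(first.strftime("%Y%m%d"), "to", last.strftime("%Y%m%d"))
# 20200601 to 20200630
# 20200701 to 20200731
# 20200801 to 20200831
```

The three bucket start dates match the date suffixes of the file names in the download example.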

Region

By default the clip is the lat/lon bbox (ee.Geometry.Rectangle). Pass region=<GeoDataFrame> to clip to an exact polygon set (converted via earthlens.gee.create_feature); the bbox is then used only for the "url" size estimate.
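
ee.Geometry.Rectangle accepts four numbers in the order xMin, yMin, xMax, yMax, so the latitude and longitude limits have to be interleaved. The sketch below shows that reordering in plain Python; bbox_to_rectangle_coords is a hypothetical helper, not the backend's own code, and it deliberately rejects boxes that cross the antimeridian.

```python
def bbox_to_rectangle_coords(lat_lim, lon_lim):
    """Reorder [lat_min, lat_max] and [lon_min, lon_max] into
    [west, south, east, north], the order ee.Geometry.Rectangle expects.

    Hypothetical helper for illustration only.
    """
    lat_min, lat_max = lat_lim
    lon_min, lon_max = lon_lim
    if lat_min >= lat_max or lon_min >= lon_max:
        raise ValueError("limits must be given as [min, max]")
    return [lon_min, lat_min, lon_max, lat_max]


coords = bbox_to_rectangle_coords([28.0, 32.0], [30.0, 34.0])
print(coords)
# [30.0, 28.0, 34.0, 32.0]
# With earthengine-api installed and initialised: ee.Geometry.Rectangle(coords)
```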

3. Export modes (`export_via`)

| `export_via` | How | Limits | Output |
|---|---|---|---|
| `"url"` (default) | Synchronous `ee.Image.getDownloadURL` → streamed download | ≤ 32768 px per axis (≈ `(east−west)/(scale/111320)`); roughly tens of MB | a GeoTIFF in `path/` |
| `"drive"` | Async `ee.batch.Export.image.toDrive`, polled to completion | `maxPixels` (set to 1e13); no 32768-px cap | left in the Google Drive `drive_folder` (a `"drive://…"` string is returned) |
| `"gcs"` | Async `ee.batch.Export.image.toCloudStorage`, polled to completion | as `"drive"` | left in the `gcs_bucket` (a `"gs://…"` string is returned); the service account needs `roles/storage.objectAdmin` on the bucket |

If a "url" request would exceed the 32768-px limit, download() raises a ValueError that reports the estimated width×height and suggests a coarser scale, a smaller bbox, or export_via="drive". For large AOIs use "drive" / "gcs":

```python
gee = GEE(
    start="2023-01-01", end="2023-12-31", temporal_resolution="monthly",
    variables={"COPERNICUS/S2_SR_HARMONIZED": ["B4", "B8"]},
    lat_lim=[51.0, 53.0], lon_lim=[4.0, 7.0],
    scale=10, export_via="drive", drive_folder="ee_exports",
    service_account="my-sa@my-project.iam.gserviceaccount.com",
    service_key="/path/to/key.json",
)
locations = gee.download()   # blocks while the batch tasks run; pull the files from Drive
```

"gcs" writes to Cloud Storage, which incurs normal GCP storage/egress charges; "drive" and "url" do not (see the cost notes in the Introduction).
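
The 32768-px check on "url" requests can be approximated before building a request, using the width formula from the export-mode table. This is a rough sketch under stated assumptions, not the backend's own estimator: estimate_size and fits_url_export are hypothetical helpers, the height is assumed to follow the same formula with the latitude span, and results are rounded to whole pixels.

```python
METRES_PER_DEGREE = 111_320   # approximation used in the width formula above
MAX_PIXELS_PER_AXIS = 32_768  # per-axis limit for export_via="url"


def estimate_size(lat_lim, lon_lim, scale):
    """Estimate (width, height) in pixels for a bbox at `scale` metres."""
    deg_per_pixel = scale / METRES_PER_DEGREE
    width = round((lon_lim[1] - lon_lim[0]) / deg_per_pixel)
    # Assumption: the height uses the same formula with the latitude span.
    height = round((lat_lim[1] - lat_lim[0]) / deg_per_pixel)
    return width, height


def fits_url_export(lat_lim, lon_lim, scale):
    """True if both axes stay within the per-axis pixel limit."""
    return max(estimate_size(lat_lim, lon_lim, scale)) <= MAX_PIXELS_PER_AXIS


# The CHIRPS request from section 2: small enough for "url".
print(estimate_size([28.0, 32.0], [30.0, 34.0], 5566))   # (80, 80)
# The Sentinel-2 request above: too wide, hence export_via="drive".
print(estimate_size([51.0, 53.0], [4.0, 7.0], 10))       # (33396, 22264)
print(fits_url_export([51.0, 53.0], [4.0, 7.0], 10))     # False
```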

4. Via the `EarthLens` facade

Once the GEE backend is registered in the facade you'll also be able to do EarthLens(data_source="gee", variables={...}, ...).download(); until then use earthlens.gee.GEE directly as above. (Tracking: plan task H9.)