Using the Google Earth Engine backend
This page is the hands-on guide to the earthlens GEE backend — picking
a dataset from the catalog, building a download, and the trade-offs
between the export modes. For background see the
Introduction; for credentials see
Registering a project and
Service account setup; the rendered API is
on the Reference page.
Install: the backend needs the Earth Engine SDK:
pip install earthlens[gee] (which adds earthengine-api). The EarthLens facade imports without it; import earthlens.gee requires it.
1. Find a dataset and its bands
The catalog (per-category src/earthlens/gee/catalog/*.yaml files,
loaded and merged by earthlens.gee.Catalog) maps Earth Engine asset
ids to their band and aggregation metadata. It follows Earth Engine's
own model: a dataset is an image, image_collection, or table; its
addressable units are bands; and each band may carry a scale/offset,
units, wavelength, and value range:
from earthlens.gee import Catalog
cat = Catalog()
"USGS/SRTMGL1_003" in cat.datasets # True (a curated entry)
"COPERNICUS/S2_SR_HARMONIZED" in cat.available_datasets # True (in the index)
ds = cat.get_dataset("UCSB-CHG/CHIRPS/DAILY")
ds.ee_type # 'image_collection'
ds.cadence # Cadence(interval=1, unit='day')
ds.default_reducer # 'mean' (how a temporal composite collapses)
list(ds.bands) # ['precipitation']
cat.get_band("UCSB-CHG/CHIRPS/DAILY", "precipitation").units # 'mm/d'
available_datasets is the full index of asset ids Earth Engine
publishes (regenerated by tools/gee/refresh_gee_catalog.py); datasets is
the curated subset the package models in detail. tools/gee/audit_gee_datasets.py
reports which available_datasets entries are ready to be curated, and
tools/gee/refresh_gee_catalog.py --with-bands <id> prints a ready-to-paste
datasets: stanza for one.
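To make the scale/offset metadata concrete, here is a minimal sketch of how a raw pixel value is usually converted to a physical value under Earth Engine's convention (physical = raw × scale + offset). The BandMeta class and its field names are hypothetical stand-ins for illustration, not the objects earthlens.gee.Catalog returns, and the 0.0001 reflectance scale is only an example value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandMeta:
    """Hypothetical band record; the field names are assumptions."""

    name: str
    units: str
    scale: float = 1.0   # multiplicative factor applied to the stored value
    offset: float = 0.0  # additive term applied after scaling

    def to_physical(self, raw: float) -> float:
        """Convert a stored pixel value to its physical value."""
        return raw * self.scale + self.offset


# Example: an optical band stored as scaled integers.
red = BandMeta(name="B4", units="reflectance", scale=0.0001)
reflectance = red.to_physical(1234)  # roughly 0.1234
```

A band without scale/offset metadata keeps the defaults (1.0 and 0.0), so to_physical returns the stored value unchanged.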
2. Download
from earthlens.gee import GEE
gee = GEE(
    start="2020-06-01",
    end="2020-08-31",
    temporal_resolution="monthly",  # one composite image per month
    variables={"UCSB-CHG/CHIRPS/DAILY": ["precipitation"]},
    lat_lim=[28.0, 32.0],  # [lat_min, lat_max]
    lon_lim=[30.0, 34.0],  # [lon_min, lon_max]
    path="data/gee",
    scale=5566,  # output pixel size in metres
    service_account="my-sa@my-project.iam.gserviceaccount.com",
    service_key="/path/to/key.json",  # path, or the JSON content as a string
)
paths = gee.download()
# -> [PosixPath('data/gee/UCSB-CHG_CHIRPS_DAILY_precipitation_20200601.tif'),
# PosixPath('data/gee/UCSB-CHG_CHIRPS_DAILY_precipitation_20200701.tif'),
# PosixPath('data/gee/UCSB-CHG_CHIRPS_DAILY_precipitation_20200801.tif')]
The request is {asset_id: [band, ...]} — list every band you want from
each dataset (one image carries many; ERA5-Land alone has ~150).
download() returns one entry per (dataset, band-set, time-bucket): a
Path for export_via="url" (below), or a destination string for the
async exports.
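The "one entry per time-bucket" rule can be illustrated with a small standard-library sketch that lists the monthly buckets for a date window and the file names they would produce. It assumes buckets start on the first day of each calendar month and that names follow the pattern visible in the example output above (asset id with "/" replaced by "_", then the band, then the bucket start date). Both are inferred from that output rather than taken from the package, the helper names are hypothetical, and only the single-band case is covered.

```python
from datetime import date


def month_starts(start: date, end: date):
    """Yield the first day of every calendar month that overlaps [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def expected_names(asset_id: str, band: str, start: date, end: date):
    """Build one GeoTIFF name per monthly bucket (assumed naming pattern)."""
    prefix = asset_id.replace("/", "_")
    return [f"{prefix}_{band}_{d:%Y%m%d}.tif" for d in month_starts(start, end)]


names = expected_names(
    "UCSB-CHG/CHIRPS/DAILY", "precipitation", date(2020, 6, 1), date(2020, 8, 31)
)
```

For the June to August 2020 window this gives three names, matching the three paths in the example.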
Authentication
service_account + service_key use a Google Cloud service-account key
(the recommended, headless-friendly path — see
Service account setup). The Cloud project is
read from the key file's project_id, or pass project= explicitly.
Without a key, pass project=<a registered project> and the backend
runs the interactive ee.Authenticate() once. A project that isn't
registered for Earth Engine, or on which the service account lacks an
IAM role, raises AuthenticationError with a pointer to the fix.
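The project lookup described above can be sketched with the standard library alone. Google service-account key files are JSON documents that include a project_id field. The helpers below are hypothetical and only illustrate the precedence (an explicit project wins, otherwise the key's project_id is used), not the backend's actual code. Treating any string that starts with "{" as inline JSON is an assumption made for this sketch.

```python
import json
from pathlib import Path


def load_key(service_key: str) -> dict:
    """Parse a service-account key given as a file path or as JSON text."""
    if service_key.lstrip().startswith("{"):
        # Assumption: a leading "{" means the JSON content was passed inline.
        return json.loads(service_key)
    return json.loads(Path(service_key).read_text(encoding="utf-8"))


def resolve_project(service_key: str, project=None) -> str:
    """Return the explicit project if given, else the key's project_id."""
    if project is not None:
        return project
    return load_key(service_key)["project_id"]


inline_key = '{"type": "service_account", "project_id": "my-project"}'
from_key = resolve_project(inline_key)                   # taken from the key
explicit = resolve_project(inline_key, project="other")  # explicit value wins
```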
temporal_resolution
- "raw" (default): one image, the whole [start, end] window collapsed with the dataset's default_reducer.
- "daily" / "monthly" / "yearly": one image per day / month / year, each with its sub-window collapsed by the reducer (mean for rates and continuous fields, median for cloud-screened optical scenes, mosaic for tiled or annual maps). Override per call with reducer="median" etc.
Static image datasets (e.g. USGS/SRTMGL1_003) ignore
temporal_resolution — they always yield a single image.
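As a rough illustration of what the three reducers do to one pixel across the images in a bucket, here is a plain-Python sketch. It mimics the Earth Engine semantics in simplified form: masked observations (None here) are skipped, and mosaic keeps the value from the last image that has data. The collapse function is hypothetical and operates on a list of values, not on ee.ImageCollection objects.

```python
from statistics import mean, median


def collapse(pixel_series, reducer):
    """Collapse one pixel's time series to a single value."""
    valid = [v for v in pixel_series if v is not None]  # drop masked values
    if not valid:
        return None  # every observation was masked
    if reducer == "mean":
        return mean(valid)
    if reducer == "median":
        return median(valid)
    if reducer == "mosaic":
        return valid[-1]  # the most recent unmasked image wins
    raise ValueError(f"unknown reducer: {reducer!r}")


series = [1.0, None, 4.0, 10.0]  # None marks a masked (e.g. cloudy) pixel
results = {name: collapse(series, name) for name in ("mean", "median", "mosaic")}
```

The outlier at 10.0 pulls the mean upward but leaves the median untouched, which is why a median suits cloud-screened optical scenes.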
Region
By default the clip is the lat/lon bbox (ee.Geometry.Rectangle). Pass
region=<GeoDataFrame> to clip to an exact polygon set (converted via
earthlens.gee.create_feature); the bbox is then used only for the
"url" size estimate.
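When clipping to polygons, the bbox still needs to cover them for the size estimate. The sketch below derives lat_lim and lon_lim from polygon rings given as (lon, lat) vertex pairs. It is plain Python with a hypothetical helper name and made-up coordinates, and does not go through earthlens.gee.create_feature.

```python
def bbox_limits(polygons):
    """Return ([lat_min, lat_max], [lon_min, lon_max]) covering all rings.

    polygons: iterable of rings, each a list of (lon, lat) vertex pairs.
    """
    lons = [lon for ring in polygons for lon, _ in ring]
    lats = [lat for ring in polygons for _, lat in ring]
    return [min(lats), max(lats)], [min(lons), max(lons)]


# One illustrative triangle (closed ring: first vertex repeated at the end).
rings = [[(30.2, 30.0), (32.5, 31.6), (31.0, 28.4), (30.2, 30.0)]]
lat_lim, lon_lim = bbox_limits(rings)
```

With a GeoDataFrame, the total_bounds attribute gives the same extent as (minx, miny, maxx, maxy).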
3. Export modes (export_via)
| export_via | How | Limits | Output |
|---|---|---|---|
| "url" (default) | Synchronous ee.Image.getDownloadURL → streamed download | ≤ 32768 px per axis (≈ (east−west)/(scale/111320)); roughly tens of MB | a GeoTIFF in path/ |
| "drive" | Async ee.batch.Export.image.toDrive, polled to completion | maxPixels (set to 1e13); no 32768-px cap | left in the Google Drive drive_folder (a "drive://…" string is returned) |
| "gcs" | Async ee.batch.Export.image.toCloudStorage, polled to completion | same as "drive" | left in the gcs_bucket (a "gs://…" string is returned); the service account needs roles/storage.objectAdmin on the bucket |
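The per-axis estimate quoted for "url" can be worked through in a few lines. This is a sketch of the arithmetic only: it takes 111320 m as the length of one degree, applies the same factor to both axes, and rounds up. The backend's own check may round or handle latitude differently, and the function names are hypothetical.

```python
import math

METRES_PER_DEGREE = 111_320    # approximate length of one degree
MAX_PIXELS_PER_AXIS = 32_768   # per-axis limit for export_via="url"


def estimate_pixels(lat_lim, lon_lim, scale):
    """Estimate (width, height) in pixels for a bbox at `scale` metres."""
    # Same as extent / (scale / 111320), reordered to limit rounding error.
    width = math.ceil((lon_lim[1] - lon_lim[0]) * METRES_PER_DEGREE / scale)
    height = math.ceil((lat_lim[1] - lat_lim[0]) * METRES_PER_DEGREE / scale)
    return width, height


def fits_url_export(lat_lim, lon_lim, scale):
    """True when both axes stay within the per-axis pixel limit."""
    return max(estimate_pixels(lat_lim, lon_lim, scale)) <= MAX_PIXELS_PER_AXIS


chirps = estimate_pixels([28.0, 32.0], [30.0, 34.0], 5566)   # small request
sentinel = estimate_pixels([51.0, 53.0], [4.0, 7.0], 10)     # large request
```

Under these assumptions the 4° × 4° CHIRPS request at 5566 m is about 80 × 80 px, while a 3° × 2° Sentinel-2 request at 10 m is about 33396 × 22264 px, whose width is over the limit.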
If a "url" request would exceed the 32768-px limit, download()
raises a ValueError that reports the estimated width×height and
suggests a coarser scale, a smaller bbox, or export_via="drive". For
large AOIs use "drive" / "gcs":
gee = GEE(
    start="2023-01-01", end="2023-12-31", temporal_resolution="monthly",
    variables={"COPERNICUS/S2_SR_HARMONIZED": ["B4", "B8"]},
    lat_lim=[51.0, 53.0], lon_lim=[4.0, 7.0],
    scale=10, export_via="drive", drive_folder="ee_exports",
    service_account="my-sa@my-project.iam.gserviceaccount.com",
    service_key="/path/to/key.json",
)
locations = gee.download() # blocks while the batch tasks run; pull the files from Drive
"gcs" writes to Cloud Storage, which incurs normal GCP storage/egress charges; "drive" and "url" do not (see the cost notes in the Introduction).
4. Via the EarthLens facade
Once the GEE backend is registered in the facade you'll also be able to
do EarthLens(data_source="gee", variables={...}, ...).download(); until
then use earthlens.gee.GEE directly as above. (Tracking: plan task
H9.)