Using the Climate Hazards Center (CHC) backend#
This page is the hands-on guide to the earthlens CHC backend —
picking a dataset from the catalog, building a download, and the
shape of the output. For background see the
Introduction; for catalog internals see
Catalog; the rendered API is on the
Reference page.
Install: the CHC backend has zero optional dependencies —
pip install earthlensis enough. FTP support lives in Python's stdlib (ftplib); the rest of the chain (pyramids,numpy,pandas,loguru,tqdm,joblib) is part of the core install.
1. Find a dataset and its variables#
The catalog (per-family src/earthlens/chc/catalog/*.yaml files,
loaded and merged by earthlens.chc.Catalog) maps dataset keys to
their FTP layout, spatial/temporal extent, and per-variable
metadata. Browse it from Python:
from earthlens.chc import Catalog
cat = Catalog()
len(cat.datasets) # 97 curated entries
"global-daily" in cat.datasets # True (CHIRPS-2.0 global daily)
"wbgt-monthly" in cat.datasets # True (WBGT)
# Filter by region or temporal resolution
cat.list_datasets(region="africa")
# ['africa-2-monthly', 'africa-3-monthly', 'africa-6-hourly',
# 'africa-daily', 'africa-dekad', 'africa-monthly', 'africa-pentad']
cat.list_datasets(temporal_resolution="monthly")
# ['africa-monthly', 'centennial-trends-v1-monthly',
# 'central-america-caribbean-monthly', 'chc-cmip6-...', ...]
# Inspect one row
ds = cat.get_dataset("global-daily")
ds.region # 'global'
ds.temporal_resolution # 'daily'
ds.pandas_freq # 'D'
ds.spatial_resolution # [0.05]
ds.lat_boundaries # [-50.0, 50.0]
ds.start_date # '1981-01-01'
list(ds.variables) # ['precipitation']
cat.get_variable("global-daily", "precipitation").units # 'mm/day'
cat.available_datasets is the informational walk-order list (matches
cat.datasets.keys() 1:1 after the H1 / centennial-trends fix).
cat.available_regions is a dict[str, dict[str, list[float]]]
mapping each named region to its lat_boundaries / lon_boundaries.
For a one-shot dump of a dataset's full metadata:
cat.describe("centennial-trends-v1-monthly")
# {'dataset': 'centennial-trends-v1-monthly',
# 'region': 'east-africa-centennial',
# 'temporal_resolution': 'monthly',
# 'pandas_freq': 'MS',
# 'spatial_resolution': [0.1],
# 'formats': ['netcdf'],
# 'ftp_bases': {'netcdf': 'pub/org/chc/products/CentennialTrends/'},
# 'file_patterns': None,
# 'discrete_files': {'netcdf': ['CenTrends_v1_monthly.nc']},
# 'is_discrete': True,
# 'lat_boundaries': [-12.25, 22.25],
# 'lon_boundaries': [21.25, 51.25],
# 'start_date': '1900-01-01',
# 'end_date': '2014-12-31',
# 'preliminary': False,
# 'variables': ['precipitation']}
2. Download#
Two variables= shapes are accepted:
Dict shape (any catalog dataset)#
The canonical shape, mirroring the ECMWF and GEE backends. The key is a catalog dataset key, the value is the list of variable codes to fetch from that dataset:
from earthlens.chc import CHIRPS
CHIRPS(
variables={"africa-pentad": ["precipitation"]},
start="2020-01-01", end="2020-02-01",
lat_lim=[-5.0, 5.0], lon_lim=[30.0, 40.0],
path="data/chc",
).download()
# -> writes data/chc/africa-pentad_precipitation_2020.01.{01,06,11,16,21,26}.tif
# plus data/chc/africa-pentad_precipitation_2020.02.01.tif
List shape (legacy, CHIRPS-2.0 only)#
For backwards compatibility with the original CHIRPS-2.0-global API.
The dataset key is derived from temporal_resolution via the
two-entry legacy table:
temporal_resolution |
resolved dataset key |
|---|---|
"daily" |
global-daily |
"monthly" |
global-monthly |
CHIRPS(
variables=["precipitation"],
temporal_resolution="daily",
start="2009-01-01", end="2009-01-31",
lat_lim=[4.0, 5.0], lon_lim=[-75.0, -74.0],
path="data/chc",
).download()
# -> writes data/chc/global-daily_precipitation_2009.01.{01..31}.tif
Any temporal_resolution value outside {"daily", "monthly"} paired
with the list shape raises ValueError — switch to the dict shape
for anything else.
3. Output filenames#
<root>/<dataset_key>_<variable>_<date>.tif, with the date suffix
following the dataset's cadence:
| Cadence | Date suffix |
|---|---|
annual |
YYYY |
monthly / monthly-climatology / 2-monthly / 3-monthly |
YYYY.MM |
6-hourly |
YYYY.MM.DD.HH |
| everything else (daily, dekadal, pentadal, 5-day, 10-day, 15-day, …) | YYYY.MM.DD |
Per-date GeoTIFFs are written after bbox-clipping with the actual raster geo-affine (no hardcoded 0.05° / ±50° assumption), so any CHC pixel size (0.05° CHIRPS, 0.1° CenTrends, 1° WBGT) is clipped correctly.
4. Parallel per-date downloads#
Pass cores= to use joblib for the date loop:
CHIRPS(
variables={"global-daily": ["precipitation"]},
start="2009-01-01", end="2009-12-31",
lat_lim=[4.0, 5.0], lon_lim=[-75.0, -74.0],
path="data/chc",
).download(cores=8)
cores=None (default) or cores=0 runs sequentially. Sequential
mode shares one anonymous FTP login across the whole date loop
(one login per (dataset, variable) batch instead of one per file);
parallel mode keeps per-file logins because joblib workers can't
share the unpicklable FTP socket.
5. Discrete-files datasets#
CHPclim v2 (a 12-file static climatology) and CenTrends v1 (multi-year
NetCDF archives) publish a fixed enumerated set of files, not a
per-date partition. The catalog declares them with discrete_files:
instead of file_patterns:; the backend iterates the file list once
per request rather than doing date substitution:
CHIRPS(
variables={"chpclim-v2-monthly": ["precipitation"]},
start="1981-01-01", end="1981-12-31", # ignored — CHPclim is static
lat_lim=[0.0, 30.0], lon_lim=[-20.0, 50.0],
path="data/chc",
).download()
# -> writes data/chc/chpclim-v2-monthly_CHPclim2.90-90.{01..12}.tif
# (12 files, one per calendar month; the date range is honoured
# only at the catalog-window-overlap check)
CenTrends multi-year NetCDFs are passed through unmodified — the 2-D
bbox-clip math doesn't apply to a (time, lat, lon) variable.
Read the saved file with xarray.open_dataset(...).sel(time=..., lat=..., lon=...).
6. Failure tolerance#
Per-date FTP failures (TCP reset, 550, brief DNS glitch, one bad
raster) do not abort the rest of the batch. They are accumulated
and surfaced as a single WARNING per (dataset, variable):
WARNING CHIRPS download for global-daily/precipitation ...
WARNING global-daily/precipitation: 3/365 dates failed; first 3:
2020-03-15 (FTPError), 2020-07-22 (TimeoutError), ...
INFO CHIRPS download summary: all 1 variables succeeded ...
If the broader (dataset, variable) call fails outright (catalog
resolution error, can't even open the FTP), download() logs it as
ERROR and continues with the next variable. Across a long batch the
final summary tells you what succeeded and what didn't.
7. Via the EarthLens facade#
The CHC backend is registered in the facade under two keys:
from earthlens import EarthLens
# Canonical key
EarthLens(
data_source="chc",
variables={"global-daily": ["precipitation"]},
start="2024-01-01", end="2024-01-07",
lat_lim=[0.0, 1.0], lon_lim=[0.0, 1.0],
path="data/chc",
).download()
# Back-compat alias (the package used to live at `earthlens.chirps`)
EarthLens(data_source="chirps", ...).download()
Both keys resolve to the same CHIRPS backend class. The "chirps"
alias is kept indefinitely so existing user code continues to work
after the chirps/ → chc/ rename.
8. Debugging#
- Empty output directory. Check the WARNING summary line for
per-date failures. The most common cause is a bbox that doesn't
overlap the requested dataset's region —
_clip_to_bboxraisesValueError(caught by the per-date guard and logged) naming both extents. - Every date fails with FTP 550. Either the file pattern in the
YAML is wrong (the GEFS v3 rows shipped with a known-bad pattern
and were withdrawn — see
gefs.yaml's header for the probe results) or your bbox / date is outside the dataset's actual coverage. TryCatalog().describe(ds_key)to compare the request's window against the dataset'sstart_date/end_date. - No outbound FTP allowed. Some corporate firewalls block port 21.
Run
python -c "from ftplib import FTP; FTP('data.chc.ucsb.edu').login()"from the target host to confirm reachability before reaching for the backend. - Inspect what the catalog ships: the Catalog page has the full breakdown of files, schema, and per-family contents.