Using the Copernicus Marine backend
This page is the hands-on guide to the earthlens CMEMS backend —
picking a dataset, building a download, and the trade-offs around
depth, output format, and the curated catalog. For background see the
Introduction; for credentials see
Authentication; the rendered API is on the
Reference page.
Install: the backend needs the Copernicus Marine SDK. Run
pip install earthlens[cmems] (which adds copernicusmarine). The EarthLens facade imports without it; import earthlens.cmems requires it.
1. Find a dataset and its variables
Curated rows live under earthlens.cmems.Catalog. Each entry binds a
CMEMS dataset_id to its variable short names, units, CF long-names,
cadence, domain, and temporal coverage:
from earthlens.cmems import Catalog
cat = Catalog()
"cmems_mod_glo_phy_my_0.083deg_P1D-m" in cat.datasets # True
ds = cat.get_dataset("cmems_mod_glo_phy_my_0.083deg_P1D-m")
ds.cadence # 'daily'
ds.domain # 'global'
sorted(ds.variables)
# ['so', 'thetao', 'uo', 'vo', 'zos']
cat.get_variable("cmems_mod_glo_phy_my_0.083deg_P1D-m", "thetao").units
# 'degrees_C'
Catalog().available_datasets is the live informational index of
every CMEMS dataset id the toolbox publishes (regenerate with
tools/cmems/refresh_cmems_catalog.py refresh); Catalog().datasets
is the curated map, a subset of it. Uncurated dataset ids still
work — the catalog lookup is a metadata convenience, not a gate.
Pass any id the toolbox's copernicusmarine.describe() recognises and
the backend issues the request unchanged.
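The "convenience, not a gate" behaviour can be pictured with a small standalone sketch. It does not use the earthlens Catalog class: VariableMeta, CURATED, lookup_metadata and label_for are hypothetical names invented for illustration, and the only values taken from this page are the thetao units and long name. The point is that a missing curated entry degrades to a plain label instead of blocking the request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableMeta:
    """Illustrative stand-in for a curated variable record (hypothetical type)."""
    units: str
    long_name: str


# Hypothetical miniature of a curated map: dataset id -> variable -> metadata.
CURATED = {
    "cmems_mod_glo_phy_my_0.083deg_P1D-m": {
        "thetao": VariableMeta("degrees_C", "Sea water potential temperature"),
    },
}


def lookup_metadata(dataset_id: str, variable: str) -> Optional[VariableMeta]:
    """Return curated metadata when present, otherwise None (never raises)."""
    return CURATED.get(dataset_id, {}).get(variable)


def label_for(dataset_id: str, variable: str) -> str:
    """Build a plot label, falling back to the short name for uncurated ids."""
    meta = lookup_metadata(dataset_id, variable)
    if meta is None:
        return variable
    return f"{meta.long_name} [{meta.units}]"
```

With this shape, downstream tooling gets richer labels for curated ids and still works, with less decoration, for everything else.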
2. Download
from earthlens.cmems import CMEMS
cmems = CMEMS(
    start="2020-01-01",
    end="2020-01-07",
    temporal_resolution="daily",
    variables={
        "cmems_mod_glo_phy_my_0.083deg_P1D-m": ["thetao", "so", "zos"],
    },
    lat_lim=[30.0, 36.0],   # [lat_min, lat_max]
    lon_lim=[-10.0, -4.0],  # [lon_min, lon_max]
    path="data/cmems",
    service_username="YOUR_CMEMS_USERNAME",
    service_password="YOUR_CMEMS_PASSWORD",
    minimum_depth=0.0,
    maximum_depth=200.0,
)
paths = cmems.download()
# -> [PosixPath('data/cmems/cmems_mod_glo_phy_my_0_083deg_P1D-m.nc')]
The request is {dataset_id: [variable, ...]}. One subset() call
runs per (dataset_id, variables) pair; the toolbox returns a single
NetCDF per pair covering the full requested space/time/depth window.
download() returns the absolute paths the toolbox actually wrote.
The same call works through the unified facade:
from earthlens import EarthLens
earthlens = EarthLens(
    data_source="cmems",
    start="2020-01-01",
    end="2020-01-07",
    variables={"cmems_mod_glo_phy_my_0.083deg_P1D-m": ["thetao", "so"]},
    lat_lim=[30.0, 36.0],
    lon_lim=[-10.0, -4.0],
    path="data/cmems",
    service_username="YOUR_CMEMS_USERNAME",
    service_password="YOUR_CMEMS_PASSWORD",
)
earthlens.download()
The facade forwards every extra kwarg (service_username,
service_password, credentials_file, file_format, minimum_depth,
maximum_depth, overwrite) to the backend constructor unchanged.
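As a rough picture of the one-call-per-pair behaviour described earlier in this section, the sketch below expands a {dataset_id: [variable, ...]} request into one keyword dictionary per dataset. The keyword names are those of copernicusmarine.subset(); how earthlens maps its own arguments onto them is an assumption here, and build_subset_requests is a hypothetical helper. The output file name rule (dots replaced by underscores) is inferred from the sample path shown earlier, not taken from the backend source.

```python
from pathlib import Path


def build_subset_requests(variables, lat_lim, lon_lim, start, end, path,
                          minimum_depth=None, maximum_depth=None):
    """Expand a request mapping into one subset() kwargs dict per dataset id."""
    requests = []
    for dataset_id, names in variables.items():
        kwargs = {
            "dataset_id": dataset_id,
            "variables": list(names),
            "minimum_latitude": lat_lim[0],
            "maximum_latitude": lat_lim[1],
            "minimum_longitude": lon_lim[0],
            "maximum_longitude": lon_lim[1],
            "start_datetime": start,
            "end_datetime": end,
            "output_directory": str(Path(path)),
            # Assumed naming rule, inferred from the example output path.
            "output_filename": dataset_id.replace(".", "_") + ".nc",
        }
        # Depth bounds are only sent when the caller supplied them.
        if minimum_depth is not None:
            kwargs["minimum_depth"] = minimum_depth
        if maximum_depth is not None:
            kwargs["maximum_depth"] = maximum_depth
        requests.append(kwargs)
    return requests


requests = build_subset_requests(
    {"cmems_mod_glo_phy_my_0.083deg_P1D-m": ["thetao", "so", "zos"]},
    lat_lim=[30.0, 36.0], lon_lim=[-10.0, -4.0],
    start="2020-01-01", end="2020-01-07", path="data/cmems",
    maximum_depth=200.0,
)
```

Each dictionary could then be passed as copernicusmarine.subset(**kwargs); adding a second dataset id to the mapping yields a second, independent request.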
3. Credentials
service_username + service_password use a free Copernicus Marine
portal account (register at https://marine.copernicus.eu/register).
Three other credential sources are accepted; see
Authentication for the resolution order. In
short:
- explicit service_username= / service_password= to CMEMS(...),
- COPERNICUSMARINE_SERVICE_USERNAME / COPERNICUSMARINE_SERVICE_PASSWORD environment variables (toolbox-native),
- a saved ~/.copernicusmarine/.copernicusmarine-credentials from a previous copernicusmarine login,
- an explicit credentials_file= path (CI-friendly).
If none of those resolve, CMEMS(...) raises AuthenticationError
on the first download attempt rather than blocking on the toolbox's
interactive prompt.
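When a download fails with AuthenticationError, it can help to check which of the four sources are actually present on the machine. The sketch below is a standalone diagnostic, not part of earthlens: available_credential_sources is a hypothetical helper, and it deliberately reports presence only, without asserting the resolution order (that is documented on the Authentication page). The environment variable names and the saved-login path are the ones listed above.

```python
import os
from pathlib import Path


def available_credential_sources(username=None, password=None,
                                 credentials_file=None,
                                 environ=None, home=None):
    """List which credential sources are present; does not validate them."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    found = []
    if username and password:
        found.append("explicit")
    if (environ.get("COPERNICUSMARINE_SERVICE_USERNAME")
            and environ.get("COPERNICUSMARINE_SERVICE_PASSWORD")):
        found.append("environment")
    saved = home / ".copernicusmarine" / ".copernicusmarine-credentials"
    if saved.is_file():
        found.append("saved-login")
    if credentials_file is not None and Path(credentials_file).is_file():
        found.append("credentials-file")
    return found
```

An empty list is the situation in which the backend is documented to raise instead of prompting; the environ and home parameters exist only so the check can be exercised in tests without touching the real environment.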
4. Depth axis
CMEMS physics and biogeochemistry datasets are 4-D — the toolbox
returns a (time, depth, lat, lon) NetCDF for variables like thetao
/ so / chl. The CMEMS surface-only datasets (OSTIA SST, altimetry,
sea-ice concentration) carry only (time, lat, lon).
To clip the vertical axis at request time, pass minimum_depth /
maximum_depth in metres:
CMEMS(
    ...,
    variables={"cmems_mod_glo_phy_my_0.083deg_P1D-m": ["thetao"]},
    minimum_depth=0.0,    # surface
    maximum_depth=200.0,  # to 200 m
)
For surface-only datasets the depth kwargs are silently ignored by the toolbox. For 4-D datasets, omitting them returns the full depth range — which can be 50+ levels and considerably more data than a surface-bound application needs.
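To make the cost of an unclipped depth axis concrete, here is a back-of-the-envelope estimate of how many grid values a subset holds. It is a sketch under stated assumptions: estimate_values is a hypothetical helper, the grid is taken as 12 cells per degree (0.083° is roughly 1/12°), and 50 levels is used as a stand-in for "the full depth range" mentioned above rather than the exact level count of any product.

```python
def estimate_values(lat_lim, lon_lim, n_times, n_levels,
                    n_variables=1, cells_per_degree=12):
    """Rough count of grid values in a (time, depth, lat, lon) subset."""
    n_lat = round((lat_lim[1] - lat_lim[0]) * cells_per_degree)
    n_lon = round((lon_lim[1] - lon_lim[0]) * cells_per_degree)
    return n_lat * n_lon * n_times * n_levels * n_variables


# Seven daily steps over the 6 x 6 degree box used in the download example.
surface_only = estimate_values([30.0, 36.0], [-10.0, -4.0], n_times=7, n_levels=1)
full_column = estimate_values([30.0, 36.0], [-10.0, -4.0], n_times=7, n_levels=50)

print(surface_only)                 # 36288 values (72 x 72 cells x 7 days)
print(full_column // surface_only)  # 50: volume scales linearly with levels
```

Multiplying by the bytes per value (for example 4 for float32, before compression) gives an upper bound on the uncompressed payload.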
5. File format
The toolbox supports two on-disk shapes:
- file_format="netcdf" (default): one .nc per request. Single file, opens with xarray.open_dataset / pyramids.netcdf.NetCDF / netCDF4.Dataset. Right choice for small-to-medium subsets (regional, sub-decade).
- file_format="zarr": one directory-store per request. Stores the same data as chunked Zarr, suitable for very large requests (global, multi-decade) and lazy access. Read with xarray.open_zarr or pyramids.zarr.Zarr.
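If downstream code has to accept either output shape, a small dispatcher keeps the choice of reader in one place. This is a sketch, not earthlens API: reader_name_for and open_subset are hypothetical helpers, and the rule "a .zarr suffix or a directory means a Zarr store" is an assumption about how the output is named. xarray is imported lazily so the dispatch logic can be used and tested without it installed.

```python
from pathlib import Path


def reader_name_for(path):
    """Pick the xarray reader name for a returned path (assumed naming rule)."""
    p = Path(path)
    if p.suffix == ".zarr" or p.is_dir():
        return "open_zarr"
    return "open_dataset"


def open_subset(path):
    """Open a NetCDF file or Zarr store with the matching xarray function."""
    import xarray as xr  # lazy import: only needed when data is actually opened
    return getattr(xr, reader_name_for(path))(path)
```

Calling open_subset(paths[0]) on a download result would then work for either file_format value, provided the naming assumption holds.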
6. Post-process the returned NetCDF with pyramids
The earthlens project's GIS backend is
pyramids; use it to
open the returned NetCDF rather than reaching for xarray directly:
from pathlib import Path
from pyramids.netcdf import NetCDF
paths = cmems.download()
nc = NetCDF.read_file(str(paths[0]), read_only=True)
list(nc.meta_data.variables) # ['time', 'lat', 'lon', 'depth', 'thetao', ...]
thetao_attrs = nc.meta_data.variables["thetao"]
thetao_attrs.long_name # 'Sea water potential temperature'
thetao_attrs.unit # 'degrees_C'
arr = nc.read_array("thetao") # numpy ndarray (time, depth, lat, lon)
nc.close()
Aggregating in one call
CMEMS.download(aggregate=AggregationConfig(...)) reduces every subset
through pyramids.netcdf.NetCDF.reduce: any depth axis is collapsed
to a column mean (or pinned with level=), the time axis is then
windowed by the config's freq, and one GeoTIFF per
(variable, window) is written — the same output shape the ECMWF
backend produces.
from earthlens import AggregationConfig
cmems = CMEMS(
    start="2020-01-01", end="2020-12-31",
    temporal_resolution="daily",
    variables={"cmems_mod_glo_phy_my_0.083deg_P1D-m": ["thetao"]},
    lat_lim=[30.0, 36.0], lon_lim=[-10.0, -4.0],
    path="data/cmems", minimum_depth=0.0, maximum_depth=200.0,
)
tifs = cmems.download(aggregate=AggregationConfig(freq="1MS", op="mean"))
# -> one monthly-mean GeoTIFF per (variable, month)
This requires a pyramids build that ships NetCDF.reduce (pyramids
PR #339 / the release carrying it); on older pyramids the call raises
NotImplementedError naming NetCDF.reduce, and you post-process the
returned NetCDF through pyramids.netcdf.NetCDF as shown above.
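Code that must run against both old and new pyramids builds can wrap the two paths in one helper. The sketch below relies only on the behaviour described above (download(aggregate=...) raising NotImplementedError on older builds); download_with_fallback is a hypothetical helper, and _NoReduceBackend is a stand-in used to exercise it, not the real CMEMS class.

```python
def download_with_fallback(backend, aggregate):
    """Try the aggregated download; fall back to the raw NetCDF paths.

    Returns a (kind, paths) tuple where kind is "aggregated" or "raw",
    so the caller knows whether post-processing is still required.
    """
    try:
        return "aggregated", backend.download(aggregate=aggregate)
    except NotImplementedError:
        return "raw", backend.download()


class _NoReduceBackend:
    """Stand-in mimicking the documented failure on older pyramids builds."""

    def download(self, aggregate=None):
        if aggregate is not None:
            raise NotImplementedError("NetCDF.reduce is not available")
        return ["subset.nc"]


kind, paths = download_with_fallback(_NoReduceBackend(), aggregate=object())
```

When kind comes back as "raw", the returned paths are plain NetCDF files to be opened through pyramids.netcdf.NetCDF as in the example above.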
7. Curated catalog versus uncurated ids
The curated catalog targets the highest-leverage marine datasets; ~600 toolbox-addressable ids are not curated. The two paths:
- Curated: Catalog.get_variable(dataset_id, variable) gives you units + long_name + the flux/state marker without ever calling the toolbox. Useful for autocompletion, schema validation, and downstream tooling (aggregator, plot labels).
- Uncurated: just pass the id. copernicusmarine.subset() does not care whether earthlens curates the id; the catalog lookup is a metadata convenience. The download itself succeeds.
To promote an uncurated id into the bundled catalog without hand-writing the stanza, run:
which fetches describe(), emits the YAML stanza, appends it to the
routed per-domain file under catalog/ (and adds the id to
_index.yaml), then re-parses to fail loud on malformed YAML.
8. Common error modes
- AuthenticationError: no credentials available, or the toolbox rejected them. See Authentication.
- copernicusmarine.DatasetNotFound: the dataset id does not resolve (typo, renamed). Run tools/cmems/audit_cmems_datasets.py --strict to spot catalog drift against the live toolbox.
- copernicusmarine.VariableDoesNotExistInTheDataset: one of the variable short names is wrong. The audit tool catches this too (status: partial).
- copernicusmarine.CoordinatesOutOfDatasetBounds: the requested bbox falls outside the dataset's native domain (e.g. the Mediterranean reanalysis covers ~30-46°N, 5°W-37°E; requesting the Pacific raises). Clip the bbox to the dataset domain or pick a global product.
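The "clip the bbox to the dataset domain" advice can be applied before the request is sent. The following is a minimal sketch with a hypothetical helper, clip_bbox; the Mediterranean bounds are the approximate ones quoted above, and longitude wrap-around at the antimeridian is deliberately not handled.

```python
def clip_bbox(lat_lim, lon_lim, domain_lat, domain_lon):
    """Intersect a requested bbox with a dataset domain.

    All arguments are [min, max] pairs in degrees. Raises ValueError when
    the request and the domain do not overlap at all.
    """
    lat = [max(lat_lim[0], domain_lat[0]), min(lat_lim[1], domain_lat[1])]
    lon = [max(lon_lim[0], domain_lon[0]), min(lon_lim[1], domain_lon[1])]
    if lat[0] >= lat[1] or lon[0] >= lon[1]:
        raise ValueError("requested bbox does not overlap the dataset domain")
    return lat, lon


# Approximate Mediterranean reanalysis domain: 30-46 N, 5 W - 37 E.
MED_LAT, MED_LON = [30.0, 46.0], [-5.0, 37.0]

lat_lim, lon_lim = clip_bbox([28.0, 40.0], [-10.0, 10.0], MED_LAT, MED_LON)
print(lat_lim, lon_lim)  # [30.0, 40.0] [-5.0, 10.0]
```

A request with no overlap (a Pacific box against the Mediterranean domain, say) raises ValueError locally, which is the cue to switch to a global product instead.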