STAC — live cloud read (Earth Search, anonymous)¶
This notebook talks to a live STAC API — Element 84's Earth Search, an anonymous catalog of Sentinel-2 L2A scenes stored as Cloud-Optimized GeoTIFFs on AWS. No credentials are needed.
Network required. Because it hits a real endpoint, this notebook is excluded from the CI notebook suite (
pixi run -e dev notebooks) — the same way thedask-lazy-*cookbooks are. Run it locally withpip install 'pyramids-gis[stac]'.
The pattern throughout is lazy, windowed reads: we search the catalog, then
read only a small area-of-interest (AOI) window out of each multi-gigabyte COG
over /vsicurl/ range requests — never the full 10 980 × 10 980 band.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from pyproj import Transformer
from pyramids.dataset import DatasetCollection
from pyramids.stac import (
build_vrt_from_stac,
load_asset,
read_extension_metadata,
search,
which_engine,
)
1. Search the catalog¶
search is a thin typed wrapper over pystac_client. It bounds the query
server-side (max_items, datetime, and an eo:cloud_cover filter), so
paging stops early.
AOI = (11.10, 46.05, 11.20, 46.12) # a small lon/lat box over the Dolomites, Italy
items = list(
search(
"https://earth-search.aws.element84.com/v1",
"sentinel-2-l2a",
bbox=AOI,
datetime="2023-07-01/2023-09-30",
query={"eo:cloud_cover": {"lt": 5}},
max_items=4,
)
)
[(it.id, round(it.properties["eo:cloud_cover"], 2)) for it in items]
2. Inspect an item's assets and metadata — no file I/O¶
read_extension_metadata reads the proj / raster / eo extension fields
straight out of the STAC JSON — CRS, geotransform, shape, band names — without
opening the asset. which_engine previews which reader load_asset would
dispatch to.
item = items[0]
print("asset keys:", sorted(item.assets)[:14])
print("red reader:", which_engine(item, "red"))
meta = read_extension_metadata(item, "red")
{k: meta[k] for k in ("crs", "shape", "band_names")}
3. An AOI window in the asset's native CRS¶
read_array(bbox=...) reads a window. Sentinel-2 COGs are tiled in UTM, so we
reproject our lon/lat point into the dataset's own CRS with pyproj and build a
square box around it (in metres). Reading in the native CRS keeps the window
exact — no resampling.
def utm_window(ds, lon, lat, half_m=1000):
"""A (W, S, E, N) box of side 2*half_m metres around (lon, lat), in ds's CRS."""
x, y = Transformer.from_crs(4326, ds.epsg, always_xy=True).transform(lon, lat)
return (x - half_m, y - half_m, x + half_m, y + half_m)
CENTER = (11.15, 46.08) # lon, lat
4. A true-colour thumbnail¶
We open the red / green / blue assets (three separate COGs), read the same
small window out of each, and stack them into an RGB composite. Only the window
is fetched — a few hundred kilobytes, not the whole scene.
channels = {}
for name in ("red", "green", "blue"):
band = load_asset(item, name)
channels[name] = band.read_array(bbox=utm_window(band, *CENTER, half_m=1500), epsg=band.epsg)
rgb = np.dstack([channels["red"], channels["green"], channels["blue"]]).astype("float32")
rgb = np.clip(rgb / 3000.0, 0, 1) # simple linear stretch for display
fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(rgb)
ax.set_title(f"{item.id}\n{item.properties['datetime'][:10]}", fontsize=8)
ax.axis("off")
fig.tight_layout()
print("composite shape:", rgb.shape)
5. A single-band window and its statistics¶
red = load_asset(item, "red")
arr = red.read_array(bbox=utm_window(red, *CENTER, half_m=1000), epsg=red.epsg)
print("window shape:", arr.shape)
print("min / mean / max:", int(np.nanmin(arr)), round(float(np.nanmean(arr)), 1), int(np.nanmax(arr)))
6. A lazy VRT mosaic across items — build_vrt_from_stac¶
build_vrt_from_stac stitches one asset across many items into a single lazy
GDAL VRT. GDAL reads the sources on demand, so building the VRT touches no pixel
data; the windowed read below pulls only the AOI.
vrt = build_vrt_from_stac(items, asset="red")
mosaic = vrt.read_array(bbox=utm_window(vrt, *CENTER, half_m=1000), epsg=vrt.epsg)
print("VRT epsg:", vrt.epsg, "| mosaic window:", mosaic.shape)
7. Describe a raster back out as a STAC Item — to_stac_item¶
Dataset.to_stac_item emits a STAC Item dict (a GeoJSON Feature with the proj
and raster extensions populated) — pystac not required.
item_dict = red.to_stac_item(
item.id,
asset_href=item.assets["red"].href,
datetime=item.properties["datetime"],
)
{k: item_dict["properties"][k] for k in ("datetime", "proj:code", "proj:shape")}
8. A time-stacked cube — DatasetCollection.from_stac¶
With a single asset key, from_stac builds one lazy timestep per item. Each
timestep is read on demand, so we can walk the cube and pull just the AOI window
out of every date to get a small time series.
cube = DatasetCollection.from_stac(items, asset="red")
print("timesteps:", cube.time_length)
series = []
for ds in cube.datasets:
win = ds.read_array(bbox=utm_window(ds, *CENTER, half_m=1000), epsg=ds.epsg)
mean = round(float(np.nanmean(win)), 1) if win.size else None
series.append(mean)
list(zip([it.properties["datetime"][:10] for it in items], series))
Recap¶
searchbounds the query at the API;read_extension_metadata/which_engineinspect items with zero file I/O.load_asset(...).read_array(bbox=...)streams just an AOI window from a remote COG. Reproject the AOI into the asset's native (UTM) CRS for an exact window.build_vrt_from_stacmosaics an asset lazily;from_stacstacks a single asset into a time cube.to_stac_itemwrites a raster back out as a STAC Item dict.
For credentialed catalogs see the Planetary Computer notebook; for a fully offline version see the offline STAC notebook and the STAC tutorial.