COGs with real data — an end-to-end workflow¶
This notebook exercises the full COG surface of pyramids against the real rasters that ship with the repository:
examples/data/geotiff/noah-precipitation-1979.tif— a global 4-band NOAH precipitation grid (EPSG:4326, 360×720), big enough to carry internal overviews.examples/data/dem/DEM5km_Rhine_burned_fill.tif— a Rhine-basin DEM in a projected CRS (EPSG:4647), used for the web-optimized / XYZ-tile examples.examples/data/geotiff/rhine/Qtot_*.tif— a 10-step daily runoff series, exported as a COG stack.
It is offline (no network) and runs in CI under pytest --nbval-lax.
Setup¶
import os
os.environ["MPLBACKEND"] = "Agg" # never open an interactive backend
import math
import tempfile
from pathlib import Path
from pyproj import Transformer
from pyramids.dataset import Dataset, DatasetCollection
from pyramids.dataset.cog import cog_info, validate
# Resolve the bundled example data whether the kernel runs from the notebook
# directory (CI) or the repo root.
DATA = Path("../../../examples/data")
if not DATA.exists():
DATA = Path("examples/data")
work = Path(tempfile.mkdtemp(prefix="pyramids-cog-real-"))
print("data:", DATA.resolve())
print("work:", work)
1. Write a real raster as a COG, then inspect & validate it¶
The NOAH precipitation grid is a 4-band float global raster. to_cog resolves the
predictor (float → 3) and overview resampling (continuous → average) from its dtype.
noah = Dataset.read_file(str(DATA / "geotiff" / "noah-precipitation-1979.tif"))
print("source:", noah.epsg, f"{noah.rows}x{noah.columns}", noah.band_count, "bands")
out = noah.to_cog(work / "noah_cog.tif")
info = Dataset.read_file(str(out)).cog_info()
print("compression:", info.compression, "predictor:", info.predictor)
print("blocksize: ", info.blocksize, "dtype:", info.dtype)
print("overviews: ", [o.decimation for o in info.overviews])
report = validate(str(out))
print("valid COG:", report.is_valid, "| errors:", report.errors)
2. Named compression profiles¶
zstd = noah.to_cog(work / "noah_zstd.tif", profile="zstd")
print("profile=zstd ->", Dataset.read_file(str(zstd)).cog_info().compression)
3. Partial / overview-decimated reads¶
Coordinates are derived from the raster's own bounds, so this works for any real extent. Requesting a smaller output size makes GDAL serve the data from the nearest overview.
minx, miny, maxx, maxy = noah.bbox
w, h = maxx - minx, maxy - miny
# Whole-image thumbnail (long edge <= 128 px):
thumb = noah.preview(max_size=128, band=0)
print("thumbnail:", thumb.shape)
# Central quarter window, decimated to 256x256:
sub = (minx + 0.25 * w, miny + 0.25 * h, minx + 0.75 * w, miny + 0.75 * h)
part = noah.read_part(sub, dst_width=256, dst_height=256, bbox_crs=noah.epsg, band=0)
print("window: ", part.shape)
# Sample the centre coordinate:
value = noah.point(minx + 0.5 * w, miny + 0.5 * h, point_crs=noah.epsg, band=0)
print("centre value:", float(value))
4. Encode to in-memory bytes (object-store upload)¶
blob = noah.to_cog_bytes(profile="zstd")
print("bytes:", len(blob), "| TIFF marker:", blob[:2])
# e.g. boto3: s3.put_object(Bucket=..., Key="noah.tif", Body=blob)
5. Band subset, dtype cast, NoData, tags & metadata¶
Take the first three of NOAH's four bands, cast to int16, set NoData, and attach tags — all on an in-memory copy, so the source dataset is never mutated.
rgb = noah.to_cog(
work / "noah_rgb.tif",
indexes=[0, 1, 2], # select + reorder bands (0-based)
out_dtype="int16", # predictor re-resolves to 2 for the cast
nodata=-9999,
band_tags={0: {"name": "precip_band_1"}},
metadata={"source": "NOAH-1979", "pipeline": "cog-real-data-notebook"},
)
rgb_info = Dataset.read_file(str(rgb)).cog_info()
print("bands:", rgb_info.band_count, "dtype:", rgb_info.dtype, "predictor:", rgb_info.predictor)
print("band-1 tag:", rgb_info.band_tags[1].get("name"))
6. Web-optimized COG (reproject to Web-Mercator)¶
The Rhine DEM is in a projected CRS (EPSG:4647). tiling_scheme="GoogleMapsCompatible"
reprojects to EPSG:3857 and aligns overviews to the Web-Mercator zoom grid, ready for a
tile server.
dem = Dataset.read_file(str(DATA / "dem" / "DEM5km_Rhine_burned_fill.tif"))
print("DEM source CRS:", dem.epsg)
web = dem.to_cog(work / "dem_web.tif", tiling_scheme="GoogleMapsCompatible")
print("web-optimized CRS:", Dataset.read_file(str(web)).epsg)
7. Read a Web-Mercator XYZ tile¶
Compute the slippy-map tile that contains the DEM's centre, then read it. read_tile
reprojects the tile's EPSG:3857 bounds back to the dataset CRS — no morecantile needed.
cx, cy = (dem.bbox[0] + dem.bbox[2]) / 2, (dem.bbox[1] + dem.bbox[3]) / 2
lon, lat = Transformer.from_crs(dem.epsg, 4326, always_xy=True).transform(cx, cy)
z = 6
n = 2 ** z
xt = int((lon + 180.0) / 360.0 * n)
yt = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
print(f"DEM centre lon/lat = {lon:.2f}, {lat:.2f} -> tile z{z}/{xt}/{yt}")
tile = dem.read_tile(z, xt, yt, tilesize=256)
print("tile shape:", tile.shape)
8. Export a real time series as a COG stack¶
The rhine/ folder holds a 10-step daily runoff series. DatasetCollection.to_cog_stack
writes one COG per time slice.
dc = DatasetCollection.read_multiple_files(
str(DATA / "geotiff" / "rhine"), with_order=False
)
dc.open_multi_dataset(band=0)
paths = dc.to_cog_stack(
work / "rhine_cogs",
pattern="Qtot_{i:02d}.tif",
name="Qtot",
compress="DEFLATE",
)
print("slices written:", len(paths), "| time_length:", dc.time_length)
print("first slice is a valid COG:", validate(str(paths[0])).is_valid)
9. Command line¶
The same workflow from the shell via the pyramids cog command group (here driven
in-process).
from pyramids.cli import main
print("validate exit code:", main(["cog", "validate", str(out)]))
print("--- info ---")
main(["cog", "info", str(out)])