Cloud Optimized GeoTIFFs with pyramids (offline)
A Cloud Optimized GeoTIFF (COG) is a tiled GeoTIFF with internal overviews laid out so an HTTP client can read just the pixels it needs via range requests. pyramids writes, inspects, validates, and partially reads COGs natively through GDAL, with no extra dependency.
This notebook is fully offline: it builds small rasters in a temp folder, so it runs anywhere with no network or cloud credentials.
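To make "read just the pixels it needs" concrete, the sketch below works out which internal tiles a pixel window touches. It is plain Python and not part of pyramids: the helper name `tiles_for_window` is hypothetical, and the 512-pixel block size is an assumption (it is the default block size of GDAL's COG driver; a given file may use another).

```python
def tiles_for_window(col_off, row_off, width, height, blocksize=512):
    """Return (tile_row, tile_col) indices of every tile a pixel window overlaps.

    Hypothetical helper for illustration; blocksize=512 is an assumed default.
    """
    first_col = col_off // blocksize
    last_col = (col_off + width - 1) // blocksize
    first_row = row_off // blocksize
    last_row = (row_off + height - 1) // blocksize
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A 200x200 window inside a 600x600 raster touches one tile out of four.
small = tiles_for_window(100, 300, 200, 200)
full = tiles_for_window(0, 0, 600, 600)
print(len(small), "of", len(full), "tiles needed")
```

A client only has to fetch the byte ranges of the listed tiles, which is why the tiled layout matters for remote reads.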
Setup
Force a non-interactive matplotlib backend (defensive — this notebook does not plot) and build a temp workspace plus a couple of small rasters.
import os
os.environ["MPLBACKEND"] = "Agg" # never open an interactive backend
import tempfile
from pathlib import Path
import numpy as np
from pyramids.dataset import Dataset
from pyramids.dataset.cog import cog_info, validate
workdir = Path(tempfile.mkdtemp(prefix="pyramids-cog-"))
# A 600x600 float raster (large enough to carry internal overviews).
rng = np.random.default_rng(42)
arr = (rng.random((600, 600)) * 100).astype("float32")
ds = Dataset.create_from_array(
    arr, top_left_corner=(0, 10), cell_size=0.01, epsg=4326
)
print(ds)
Write a COG
to_cog applies pyramids' house defaults and resolves the predictor (float → 3) and the
overview resampling (continuous float → average) from the source dtype.
out = ds.to_cog(workdir / "scene.tif")
print("wrote", out)
# A named compression profile is a one-word shortcut:
out_zstd = ds.to_cog(workdir / "scene_zstd.tif", profile="zstd")
print("wrote", out_zstd)
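The dtype-driven predictor choice described above can be sketched with NumPy alone. This is an illustration of the rule, not pyramids' implementation: the function name `resolve_predictor` is hypothetical, and the fallback to 1 (no predictor) for other dtypes is an assumption. The TIFF predictor codes themselves are standard: 1 is none, 2 is horizontal differencing, 3 is floating-point.

```python
import numpy as np


def resolve_predictor(dtype) -> int:
    """Pick a TIFF predictor code from a dtype (hypothetical helper)."""
    kind = np.dtype(dtype).kind
    if kind == "f":        # floating point -> floating-point predictor
        return 3
    if kind in ("i", "u"):  # signed / unsigned integers -> horizontal differencing
        return 2
    return 1               # assumed fallback: no predictor


for name in ("float32", "int16", "uint8"):
    print(name, "->", resolve_predictor(name))
```

Matching the predictor to the dtype generally helps DEFLATE/ZSTD-style codecs compress smoothly varying rasters.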
Inspect
cog_info() reads only headers/metadata (no pixels), so it is cheap even for a large remote
COG. It reports the compression, predictor, blocksize, dtype, CRS/bounds, and the overview
pyramid.
scene = Dataset.read_file(str(out))
info = scene.cog_info()
print("compression:", info.compression)
print("predictor: ", info.predictor)
print("blocksize: ", info.blocksize)
print("dtype: ", info.dtype, "bands:", info.band_count)
print("crs: ", info.crs_epsg)
print("overviews: ", [o.decimation for o in info.overviews])
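To get a feel for the decimation factors printed above, here is a small sketch of how an overview pyramid can be laid out. It assumes power-of-two factors and that levels are added until the longest edge fits within one 512-pixel block; that stopping rule is an assumption modelled on GDAL's COG driver defaults, and `overview_levels` is a hypothetical helper, so the list reported by cog_info() for a real file may differ.

```python
def overview_levels(width, height, blocksize=512):
    """Return [(decimation, ov_width, ov_height), ...] under the assumed rule."""
    levels = []
    factor = 1
    w, h = width, height
    while max(w, h) > blocksize:
        factor *= 2
        # Ceiling division: a partial trailing pixel still needs a cell.
        w = -(-width // factor)
        h = -(-height // factor)
        levels.append((factor, w, h))
    return levels


print(overview_levels(600, 600))
print(overview_levels(4096, 2048))
```

Under these assumptions a 600x600 raster needs a single 2x level, while a 64x64 raster would need none.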
Validate
validate (and Dataset.validate_cog) returns a ValidationReport that can also be used directly as a bool.
report = validate(str(out))
print("is_valid:", report.is_valid)
print("errors: ", report.errors)
print("is_cog (fast probe):", scene.is_cog)
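"Usable as a bool" usually means the report object defines `__bool__`. The sketch below shows that pattern with a stand-in class; `ReportSketch` is hypothetical and is not pyramids' ValidationReport. Only the `is_valid` and `errors` names come from the cell above, and the error message is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ReportSketch:
    """Stand-in for a validation report (not the real pyramids class)."""

    errors: list = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        # Valid means no errors were collected.
        return not self.errors

    def __bool__(self) -> bool:
        # Lets callers write `if report:` instead of `if report.is_valid:`.
        return self.is_valid


ok = ReportSketch()
bad = ReportSketch(errors=["example error message"])  # invented message

for label, rep in (("ok", ok), ("bad", bad)):
    print(label, "passes" if rep else "fails", rep.errors)
```

With a report that behaves this way, a guard such as `if not report: raise ...` reads naturally in a pipeline.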
Partial / overview-decimated reads
The point of a COG is reading only what you need. Asking for a smaller output size makes
GDAL serve the data from the nearest overview (over /vsicurl/, only the relevant byte
ranges are fetched). read_tile(z, x, y) reads a Web-Mercator XYZ tile the same way.
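For orientation on what a (z, x, y) address covers, the sketch below converts an XYZ tile index to its lon/lat bounds using the standard Web-Mercator tiling formula. It is independent of pyramids; `tile_bounds` is a hypothetical helper name, and how read_tile resamples pixels into the tile is not shown here.

```python
import math


def tile_bounds(z, x, y):
    """Return (west, south, east, north) in degrees for an XYZ tile."""
    n = 2 ** z  # tiles per axis at this zoom level

    def lat(row):
        # Inverse Web-Mercator: tile row edge -> latitude in degrees.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return west, lat(y + 1), east, lat(y)


print(tile_bounds(0, 0, 0))  # the whole Web-Mercator world
print(tile_bounds(1, 1, 0))  # north-east quadrant at zoom 1
```

The demo raster (lon 0..6, lat 4..10) falls inside the zoom-1 tile (x=1, y=0), which spans longitudes 0 to 180 and latitudes from the equator to about 85.05 degrees north.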
# Whole-image thumbnail (long edge <= 64 px):
thumb = scene.preview(max_size=64, band=0)
print("thumbnail shape:", thumb.shape)
# A geographic window, decimated to an explicit output size. The raster spans
# lon 0..6, lat 4..10 (top-left (0, 10), 0.01 deg cells, 600x600).
part = scene.read_part((1.0, 5.0, 3.0, 7.0), dst_width=128, dst_height=128, band=0)
print("window shape: ", part.shape)
# Sample a single coordinate (reprojected from point_crs when needed):
value = scene.point(3.0, 7.0, point_crs=4326, band=0)
print("point value: ", float(value))
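The arithmetic behind the geographic window can be checked by hand. The sketch below maps the bounding box used above to pixel offsets for a north-up raster and derives the decimation ratio implied by a 128x128 output. `window_from_bounds` is a hypothetical helper, and the (xmin, ymin, xmax, ymax) ordering of the bounds tuple is an assumption about read_part.

```python
def window_from_bounds(bounds, top_left, cell_size):
    """Map (xmin, ymin, xmax, ymax) to (col_off, row_off, width, height).

    Assumes a north-up raster with square cells; rows count down from the top.
    """
    xmin, ymin, xmax, ymax = bounds
    x0, y0 = top_left
    col_off = round((xmin - x0) / cell_size)
    row_off = round((y0 - ymax) / cell_size)
    width = round((xmax - xmin) / cell_size)
    height = round((ymax - ymin) / cell_size)
    return col_off, row_off, width, height


col_off, row_off, width, height = window_from_bounds(
    (1.0, 5.0, 3.0, 7.0), top_left=(0.0, 10.0), cell_size=0.01
)
ratio = width / 128  # source pixels per output pixel along x
print((col_off, row_off, width, height), "decimation ratio:", ratio)
```

The window is 200x200 source pixels starting at column 100, row 300, so a 128x128 output asks for roughly 1.56 source pixels per output pixel.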
Encode to bytes (in-memory)
to_cog_bytes returns the COG as a bytes buffer for direct object-store upload — no temp
file. It accepts the same keywords as to_cog.
blob = ds.to_cog_bytes(compress="DEFLATE")
print("bytes:", len(blob), "TIFF marker:", blob[:2])
# e.g. boto3: s3.put_object(Bucket=..., Key="scene.tif", Body=blob)
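The two-byte marker printed above is the TIFF byte-order mark. As a quick sanity check before uploading a buffer, the first four bytes can be decoded with the standard library alone; `tiff_flavor` is a hypothetical helper, and it only inspects the header, so it says nothing about whether the file is a valid COG.

```python
import struct


def tiff_flavor(blob: bytes):
    """Return (byte_order, flavor) read from the first 4 bytes of a TIFF buffer."""
    order = blob[:2]
    if order == b"II":
        endian, label = "<", "little-endian"
    elif order == b"MM":
        endian, label = ">", "big-endian"
    else:
        raise ValueError("not a TIFF: unknown byte-order marker")
    (magic,) = struct.unpack(endian + "H", blob[2:4])
    if magic == 42:
        return label, "classic TIFF"
    if magic == 43:
        return label, "BigTIFF"
    raise ValueError(f"not a TIFF: unexpected magic number {magic}")


# Hand-built headers stand in for a real buffer here.
print(tiff_flavor(b"II*\x00" + bytes(4)))
print(tiff_flavor(b"MM\x00+" + bytes(4)))
```

On a real buffer the call would simply be `tiff_flavor(blob)`.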
Band subset, dtype cast, NoData, tags & colour table
to_cog can pre-process the source in one call — select/reorder bands (indexes, 0-based),
cast the dtype (out_dtype), set nodata, and attach band tags / a colour table / dataset
metadata — all on an in-memory copy, so the original dataset is never mutated.
# A 4-band float source, scaled to 0..100 (as in Setup) so an integer cast
# has non-zero values to keep.
multi = Dataset.create_from_array(
    (rng.random((4, 64, 64)) * 100).astype("float32"),
    top_left_corner=(0, 10), cell_size=0.01, epsg=4326,
)
rgb = multi.to_cog(
    workdir / "rgb.tif",
    indexes=[2, 1, 0],  # select + reorder bands (0-based)
    out_dtype="int16",  # cast; predictor re-resolves to 2
    nodata=0,
    band_tags={0: {"name": "red"}},
    metadata={"source": "cog-basics-notebook"},
)
rgb_info = Dataset.read_file(str(rgb)).cog_info()
print("bands:", rgb_info.band_count, "dtype:", rgb_info.dtype, "predictor:", rgb_info.predictor)
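The "in-memory copy" behaviour can be illustrated with NumPy alone. This sketch is not how pyramids implements the pre-processing; it only shows that selecting bands with an index list and casting the dtype both produce new arrays, so the source stays untouched. It also shows one thing to watch for with integer casts: NumPy truncates toward zero, so floats in [0, 1) become 0.

```python
import numpy as np

rng = np.random.default_rng(0)
source = (rng.random((4, 8, 8)) * 100).astype("float32")  # 4 bands, 8x8
snapshot = source.copy()

subset = source[[2, 1, 0]]     # index-list selection returns a copy, reordered
cast = subset.astype("int16")  # astype returns a new array, truncating fractions

print("bands:", cast.shape[0], "dtype:", cast.dtype)
print("source untouched:", np.array_equal(source, snapshot))
print("truncation example:", np.array([0.7, 41.9], dtype="float32").astype("int16"))
```

If the source values lie in [0, 1), scaling them before an integer cast avoids collapsing the whole band to a single value.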
Command line
The pyramids cog command group exposes the same workflow from the shell
(pyramids cog create|validate|info). The entry point is also callable in-process:
from pyramids.cli import main
rc_info = main(["cog", "info", str(out)])
print("info exit code:", rc_info)
rc_val = main(["cog", "validate", str(out)])
print("validate exit code:", rc_val)