Cloud Optimized GeoTIFFs with pyramids (offline)

A Cloud Optimized GeoTIFF (COG) is a tiled GeoTIFF with internal overviews, laid out so that an HTTP client can read just the pixels it needs via range requests. pyramids writes, inspects, validates, and partially reads COGs through GDAL alone, with no extra dependency.

This notebook is fully offline: it builds small rasters in a temp folder, so it runs anywhere with no network or cloud credentials.

Setup

Enable inline matplotlib rendering for the plots later in the notebook, then create a temporary workspace and a small source raster.

In [1]:

%matplotlib inline

import tempfile
from pathlib import Path

import numpy as np

from pyramids.dataset import Dataset
from pyramids.dataset.cog import cog_info, validate

workdir = Path(tempfile.mkdtemp(prefix="pyramids-cog-"))

# A 600x600 float raster (large enough to carry internal overviews).
rng = np.random.default_rng(42)
arr = (rng.random((600, 600)) * 100).astype("float32")
ds = Dataset.create_from_array(arr, top_left_corner=(0, 10), cell_size=0.01, epsg=4326)
print(ds)

2026-07-11 14:37:07 | INFO | pyramids.base.config | Logging is configured.

            Top Left Corner: (0.0, 10.0)
            Cell size: 0.01
            Dimension: 600 * 600
            EPSG: 4326
            Number of Bands: 1
            Band names: ['Band_1']
            Band colors: {0: 'undefined'}
            Band units: ['']
            Scale: [1.0]
            Offset: [0]
            Mask: -9999.0
            Data type: float32
            File:

Preview the source raster

Before writing a COG, plot the in-memory Dataset band to see the random float field we will encode.

In [2]:

ds.plot(band=0, title="Source raster (pre-COG)")

Out[2]:

<cleopatra.array_glyph.ArrayGlyph at 0x7fa8bdc0cad0>

Write a COG

to_cog applies pyramids' house defaults and resolves the predictor (float → 3) and the overview resampling (continuous float → average) from the source dtype.

In [3]:

out = ds.to_cog(workdir / "scene.tif")
print("wrote", out)

# A named compression profile is a one-word shortcut:
out_zstd = ds.to_cog(workdir / "scene_zstd.tif", profile="zstd")
print("wrote", out_zstd)

wrote /tmp/pyramids-cog-699znh9y/scene.tif
wrote /tmp/pyramids-cog-699znh9y/scene_zstd.tif
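
The dtype-driven predictor choice can be pictured with a small standalone sketch. The helper name `resolve_predictor` is hypothetical and is not part of pyramids. It only mirrors the two cases this notebook shows (float resolves to 3, integer resolves to 2), using the standard TIFF predictor codes: 1 for none, 2 for horizontal differencing, 3 for floating point. The fallback to 1 for other dtypes is an assumption.

```python
import numpy as np


def resolve_predictor(dtype) -> int:
    """Pick a TIFF predictor code from a NumPy dtype (hypothetical helper)."""
    kind = np.dtype(dtype).kind
    if kind == "f":
        return 3  # floating-point predictor
    if kind in ("i", "u"):
        return 2  # horizontal differencing for signed/unsigned integers
    return 1  # assumption: no predictor for any other dtype


for name in ("float32", "int16", "uint8"):
    print(name, "->", resolve_predictor(name))
# float32 -> 3
# int16 -> 2
# uint8 -> 2
```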

Inspect

cog_info() reads only headers/metadata (no pixels), so it is cheap even for a large remote COG. It reports the compression, predictor, blocksize, dtype, CRS/bounds, and the overview pyramid.

In [4]:

scene = Dataset.read_file(out)
info = scene.cog_info()
print("compression:", info.compression)
print("predictor:  ", info.predictor)
print("blocksize:  ", info.blocksize)
print("dtype:      ", info.dtype, "bands:", info.band_count)
print("crs:        ", info.crs_epsg)
print("overviews:  ", [o.decimation for o in info.overviews])

compression: DEFLATE
predictor:   3
blocksize:   (512, 512)
dtype:       Float32 bands: 1
crs:         4326
overviews:   [2]
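
A single overview level is what one would expect for a 600x600 raster with 512-pixel blocks. The sketch below is a rough model, not GDAL's or pyramids' actual code: it assumes the pyramid keeps halving until both sides fit within a 512-pixel threshold, and that overview sizes round up. The function name and the exact stopping rule are assumptions made for illustration.

```python
def overview_levels(width: int, height: int, threshold: int = 512):
    """Return (decimation, width, height) per overview under a halving model.

    Assumption: halve until neither side exceeds `threshold`; the real
    stopping rule is decided by the COG writer.
    """
    levels = []
    factor = 1
    w, h = width, height
    while w > threshold or h > threshold:
        factor *= 2
        # Ceiling division, so odd sizes round up rather than lose a pixel.
        w = (width + factor - 1) // factor
        h = (height + factor - 1) // factor
        levels.append((factor, w, h))
    return levels


print(overview_levels(600, 600))
# [(2, 300, 300)]
```

Under this model a 5000x5000 raster would carry decimations 2, 4, 8 and 16, while a raster that already fits in one block would carry none.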

Plot a band of scene (the COG read back from disk) as a visual check that the pixels survive the round trip through the COG encoding.

In [5]:

scene.plot(band=0, title="COG read back")

Out[5]:

<cleopatra.array_glyph.ArrayGlyph at 0x7fa970b78cd0>

Validate

validate and Dataset.validate_cog both return a ValidationReport that can be used as a bool.

In [6]:

report = validate(out)
print("is_valid:", report.is_valid)
print("errors:  ", report.errors)
print("is_cog (fast probe):", scene.is_cog)

is_valid: True
errors:   []
is_cog (fast probe): True

Partial / overview-decimated reads

The point of a COG is reading only what you need. Asking for a smaller output size makes GDAL serve the data from the nearest overview (over /vsicurl/, only the relevant byte ranges are fetched). read_tile(z, x, y) reads a Web-Mercator XYZ tile the same way.

In [7]:

# Whole-image thumbnail (long edge <= 64 px):
thumb = scene.preview(max_size=64, band=0)
print("thumbnail shape:", thumb.shape)

# A geographic window, decimated to an explicit output size. The raster spans
# lon 0..6, lat 4..10 (top-left (0, 10), 0.01 deg cells, 600x600).
part = scene.read_part((1.0, 5.0, 3.0, 7.0), dst_width=128, dst_height=128, band=0)
print("window shape:   ", part.shape)

# Sample a single coordinate (reprojected from point_crs when needed):
value = scene.point(3.0, 7.0, point_crs=4326, band=0)
print("point value:    ", float(value))

thumbnail shape: (64, 64)
window shape:    (128, 128)
point value:     56.311004638671875
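
To see how much source data the windowed read covers, the geographic window can be converted to a pixel window by hand. This is a minimal sketch of the arithmetic only, not how pyramids implements read_part. It assumes a north-up raster and that the bounds are ordered (xmin, ymin, xmax, ymax), which matches the values used above. `geo_window_to_pixels` is a hypothetical helper.

```python
def geo_window_to_pixels(bounds, top_left, cell_size):
    """Convert (xmin, ymin, xmax, ymax) to (col_off, row_off, width, height).

    Assumes a north-up raster: columns grow eastwards from the top-left x,
    rows grow southwards from the top-left y.
    """
    xmin, ymin, xmax, ymax = bounds
    x0, y0 = top_left
    # round() absorbs floating-point noise from dividing by the cell size.
    col_off = round((xmin - x0) / cell_size)
    row_off = round((y0 - ymax) / cell_size)
    width = round((xmax - xmin) / cell_size)
    height = round((ymax - ymin) / cell_size)
    return col_off, row_off, width, height


window = geo_window_to_pixels((1.0, 5.0, 3.0, 7.0), top_left=(0.0, 10.0), cell_size=0.01)
print(window)
# (100, 300, 200, 200)
print("decimation:", window[2] / 128)
# decimation: 1.5625
```

The 2x2 degree window therefore spans 200x200 source pixels, and asking for 128x128 output downsamples it by a factor of about 1.56 on each axis.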

Encode to bytes (in-memory)

to_cog_bytes returns the COG as a bytes buffer for direct object-store upload, with no temp file. It accepts the same keywords as to_cog.

In [8]:

blob = ds.to_cog_bytes(compress="DEFLATE")
print("bytes:", len(blob), "TIFF marker:", blob[:2])
# e.g. boto3: s3.put_object(Bucket=..., Key="scene.tif", Body=blob)

bytes: 1518609 TIFF marker: b'II'
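
The b'II' marker printed above is the first field of the TIFF header: it declares little-endian byte order (b'MM' would mean big-endian). The next field is the version number, 42 for classic TIFF or 43 for BigTIFF, followed by the offset of the first image directory. The sketch below parses that header with the standard library only. `describe_tiff_header` is a hypothetical helper, and the demo runs on a hand-built header rather than on the notebook's blob.

```python
import struct


def describe_tiff_header(blob: bytes) -> dict:
    """Decode byte order, TIFF flavour and first-IFD offset from a header."""
    order = blob[:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF: unknown byte-order marker")

    (magic,) = struct.unpack(endian + "H", blob[2:4])
    if magic == 42:
        # Classic TIFF: a 4-byte offset to the first IFD follows.
        (ifd_offset,) = struct.unpack(endian + "I", blob[4:8])
        flavour = "TIFF"
    elif magic == 43:
        # BigTIFF: offset size (8), a reserved zero, then an 8-byte offset.
        _size, _reserved, ifd_offset = struct.unpack(endian + "HHQ", blob[4:16])
        flavour = "BigTIFF"
    else:
        raise ValueError(f"not a TIFF: unexpected version {magic}")

    return {"byte_order": order, "flavour": flavour, "first_ifd_offset": ifd_offset}


# Hand-built classic little-endian header: 'II', 42, first IFD at byte 8.
demo = b"II" + struct.pack("<HI", 42, 8)
print(describe_tiff_header(demo))
# {'byte_order': b'II', 'flavour': 'TIFF', 'first_ifd_offset': 8}
```

Passing the first 16 bytes of a buffer such as the one returned by to_cog_bytes should work the same way.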

Band subset, dtype cast, NoData, tags & colour table

to_cog can pre-process the source in one call: select or reorder bands (indexes, 0-based), cast the dtype (out_dtype), set nodata, and attach band tags, a colour table, or dataset metadata. All of this happens on an in-memory copy, so the original dataset is never mutated.

In [9]:

# A 4-band float source.
multi = Dataset.create_from_array(
    rng.random((4, 64, 64)).astype("float32"),
    top_left_corner=(0, 10),
    cell_size=0.01,
    epsg=4326,
)
rgb = multi.to_cog(
    workdir / "rgb.tif",
    indexes=[2, 1, 0],  # select + reorder bands (0-based)
    out_dtype="int16",  # cast; predictor re-resolves to 2
    nodata=0,
    band_tags={0: {"name": "red"}},
    metadata={"source": "cog-basics-notebook"},
)
rgb_info = Dataset.read_file(rgb).cog_info()
print(
    "bands:",
    rgb_info.band_count,
    "dtype:",
    rgb_info.dtype,
    "predictor:",
    rgb_info.predictor,
)

bands: 3 dtype: Int16 predictor: 2

Command line

The pyramids cog command group exposes the same workflow from the shell (pyramids cog create|validate|info). The entry point is also callable in-process:

In [10]:

from pyramids.cli import main

rc_info = main(["cog", "info", str(out)])
print("info exit code:", rc_info)
rc_val = main(["cog", "validate", str(out)])
print("validate exit code:", rc_val)

file:        /tmp/pyramids-cog-699znh9y/scene.tif
is_cog:      True
driver:      GTiff
size:        600 x 600 (1 band(s))
dtype:       Float32
crs:         EPSG:4326
resolution:  (0.01, 0.01)
bounds:      (0.0, 4.0, 6.0, 10.0)
compression: DEFLATE
predictor:   3
blocksize:   (512, 512)
overviews:   1
  - level 0: 300 x 300 (1/2)
info exit code: 0
/tmp/pyramids-cog-699znh9y/scene.tif: valid COG
validate exit code: 0