Cloud I/O — tile-window iteration + Cloud-Optimized GeoTIFF¶
Two helpers for processing continental-scale rasters without ever loading the full grid:
cloud_io.tile_windows(dataset, tile_rows, tile_cols, overlap)— generator yielding(row_off, col_off, n_rows, n_cols)windows in row-major order. Pass each window intoDataset.read_array(window=...)to stream the raster through any per-tile algorithm.cloud_io.write_cog(dataset, path, compress="deflate")— write a Cloud-Optimized GeoTIFF (tiled layout + internal overviews) ready fors3:/// HTTP range reads.
In [1]:
Copied!
import matplotlib.pyplot as plt
plt.rcParams["figure.dpi"] = 80
plt.rcParams["savefig.dpi"] = 80
import matplotlib.pyplot as plt
plt.rcParams["figure.dpi"] = 80
plt.rcParams["savefig.dpi"] = 80
In [2]:
Copied!
import numpy as np
from pyramids.dataset import Dataset
from digitalrivers.cloud_io import tile_windows, write_cog
# Build a 500×500 'continental' DEM (stand-in for a multi-million-cell raster).
rng = np.random.default_rng(seed=2026)
z = rng.uniform(0, 1000, size=(500, 500)).astype(np.float32)
ds = Dataset.create_from_array(
z, top_left_corner=(0.0, 0.0), cell_size=30.0,
epsg=32618, no_data_value=-9999.0,
)
print(f"Dataset: {ds.rows}×{ds.columns} cells = {ds.rows * ds.columns:,} cells")
import numpy as np
from pyramids.dataset import Dataset
from digitalrivers.cloud_io import tile_windows, write_cog
# Build a 500×500 'continental' DEM (stand-in for a multi-million-cell raster).
rng = np.random.default_rng(seed=2026)
z = rng.uniform(0, 1000, size=(500, 500)).astype(np.float32)
ds = Dataset.create_from_array(
z, top_left_corner=(0.0, 0.0), cell_size=30.0,
epsg=32618, no_data_value=-9999.0,
)
print(f"Dataset: {ds.rows}×{ds.columns} cells = {ds.rows * ds.columns:,} cells")
2026-05-17 22:36:53 | INFO | pyramids.base.config | Logging is configured.
Dataset: 500×500 cells = 250,000 cells
tile_windows — iterate without materialising the full grid¶
In [3]:
Copied!
windows = list(tile_windows(ds, tile_rows=128, tile_cols=128))
print(f"Tiles for a 500×500 grid at 128×128 tile size: {len(windows)}")
print(f"First 4 windows: {windows[:4]}")
print(f"Last window: {windows[-1]}")
total_cells = sum(w[2] * w[3] for w in windows)
assert total_cells == ds.rows * ds.columns
print(f"Sum of window cells = total raster cells: {total_cells}")
windows = list(tile_windows(ds, tile_rows=128, tile_cols=128))
print(f"Tiles for a 500×500 grid at 128×128 tile size: {len(windows)}")
print(f"First 4 windows: {windows[:4]}")
print(f"Last window: {windows[-1]}")
total_cells = sum(w[2] * w[3] for w in windows)
assert total_cells == ds.rows * ds.columns
print(f"Sum of window cells = total raster cells: {total_cells}")
Tiles for a 500×500 grid at 128×128 tile size: 16 First 4 windows: [(0, 0, 128, 128), (0, 128, 128, 128), (0, 256, 128, 128), (0, 384, 128, 116)] Last window: (384, 384, 116, 116) Sum of window cells = total raster cells: 250000
Visualise: tile grid overlaid on the dataset¶
In [4]:
Copied!
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
fig, ax = plt.subplots(figsize=(7, 6))
im = ax.imshow(z, cmap="terrain", origin="upper")
for r_off, c_off, n_r, n_c in windows:
ax.add_patch(Rectangle((c_off - 0.5, r_off - 0.5), n_c, n_r,
linewidth=1.5, edgecolor="red", facecolor="none"))
ax.set_title(f"Dataset {ds.rows}×{ds.columns} with {len(windows)} tile windows "
f"(128×128)")
ax.axis("off")
fig.colorbar(im, ax=ax, shrink=0.7)
fig.tight_layout()
plt.show()
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
fig, ax = plt.subplots(figsize=(7, 6))
im = ax.imshow(z, cmap="terrain", origin="upper")
for r_off, c_off, n_r, n_c in windows:
ax.add_patch(Rectangle((c_off - 0.5, r_off - 0.5), n_c, n_r,
linewidth=1.5, edgecolor="red", facecolor="none"))
ax.set_title(f"Dataset {ds.rows}×{ds.columns} with {len(windows)} tile windows "
f"(128×128)")
ax.axis("off")
fig.colorbar(im, ax=ax, shrink=0.7)
fig.tight_layout()
plt.show()
Per-tile computation example — global mean via streaming¶
Use the window iterator to compute the dataset-wide mean without ever reading more than one tile at a time.
In [5]:
Copied!
running_sum = 0.0
running_count = 0
for r_off, c_off, n_r, n_c in tile_windows(ds, tile_rows=128, tile_cols=128):
tile = ds.read_array()[r_off:r_off + n_r, c_off:c_off + n_c]
valid = tile != -9999.0
running_sum += float(tile[valid].sum())
running_count += int(valid.sum())
streaming_mean = running_sum / running_count
direct_mean = float(z[z != -9999.0].mean())
print(f"Streaming mean: {streaming_mean:.4f}")
print(f"Direct mean: {direct_mean:.4f}")
np.testing.assert_allclose(streaming_mean, direct_mean, rtol=1e-5)
running_sum = 0.0
running_count = 0
for r_off, c_off, n_r, n_c in tile_windows(ds, tile_rows=128, tile_cols=128):
tile = ds.read_array()[r_off:r_off + n_r, c_off:c_off + n_c]
valid = tile != -9999.0
running_sum += float(tile[valid].sum())
running_count += int(valid.sum())
streaming_mean = running_sum / running_count
direct_mean = float(z[z != -9999.0].mean())
print(f"Streaming mean: {streaming_mean:.4f}")
print(f"Direct mean: {direct_mean:.4f}")
np.testing.assert_allclose(streaming_mean, direct_mean, rtol=1e-5)
Streaming mean: 498.4214 Direct mean: 498.4214
Overlapping windows — neighbour-context streaming¶
For algorithms that need neighbour context (slopes, flow direction, dilations) pass
overlap=N so adjacent tiles share an N-cell halo.
In [6]:
Copied!
windows_overlap = list(tile_windows(ds, tile_rows=128, tile_cols=128, overlap=8))
print(f"Tiles with 8-cell overlap: {len(windows_overlap)}")
print(f"First 4 overlap windows: {windows_overlap[:4]}")
windows_overlap = list(tile_windows(ds, tile_rows=128, tile_cols=128, overlap=8))
print(f"Tiles with 8-cell overlap: {len(windows_overlap)}")
print(f"First 4 overlap windows: {windows_overlap[:4]}")
Tiles with 8-cell overlap: 25 First 4 overlap windows: [(0, 0, 128, 128), (0, 120, 128, 128), (0, 240, 128, 128), (0, 360, 128, 128)]
write_cog — Cloud-Optimized GeoTIFF¶
Tiled GeoTIFF + internal overviews. Compatible with HTTP range reads / s3:// reads from any
downstream cloud-aware GIS tool.
In [7]:
Copied!
import tempfile, os
with tempfile.TemporaryDirectory() as tmp:
cog_path = os.path.join(tmp, "continental.tif")
written = write_cog(ds, cog_path, compress="deflate")
size_mb = os.path.getsize(cog_path) / 1024 / 1024
print(f"Wrote COG: {written}")
print(f"File size: {size_mb:.2f} MB")
import tempfile, os
with tempfile.TemporaryDirectory() as tmp:
cog_path = os.path.join(tmp, "continental.tif")
written = write_cog(ds, cog_path, compress="deflate")
size_mb = os.path.getsize(cog_path) / 1024 / 1024
print(f"Wrote COG: {written}")
print(f"File size: {size_mb:.2f} MB")
Wrote COG: C:\Users\main\AppData\Local\Temp\tmpsakogkdn\continental.tif File size: 0.86 MB
Summary¶
tile_windowsis a generator — zero memory cost until you actually iterate. The streaming mean example demonstrates how a per-tile reduction can replace a full-grid load on continental rasters.overlap=Nsolves the neighbour-context problem for slopes / flow direction without changing the calling code.write_cogproduces a Cloud-Optimized GeoTIFF in one call — tiled layout + DEFLATE compression- internal overviews for HTTP-range /
s3://consumption.
- internal overviews for HTTP-range /