Cloud I/O#
Tile-window iteration and Cloud-Optimized GeoTIFF (COG) write helpers. Introduced as Phase-4 backfill P29 to give digital-rivers a Dask-style chunked-streaming story without requiring a Dask dependency outright.
Module-level functions#
Cloud-optimised raster I/O and chunked-tile streaming.
Two working halves ship today plus two umbrella stubs for the still-deferred features:
- :func:
tile_windows— chunked-iteration helper that yields GDAL-compatible(row_off, col_off, n_rows, n_cols)windows for streaming a continental DEM through any per-tile algorithm without materialising the full raster in memory. - :func:
write_cog— Cloud-Optimised GeoTIFF writer wrapping GDAL's built-in COG driver.
Deferred (umbrella raises NotImplementedError with a deferral note):
- :func:
dask_backend— full Dask-graph integration on top oftile_windows. - :func:
cloud_storage— Zarr / S3 / GCS read & write factories.
tile_windows(dataset, tile_rows=1024, tile_cols=1024, overlap=0)
#
Iterate (row_off, col_off, n_rows, n_cols) tile windows over a Dataset.
Yields one window per tile so callers can stream a continental DEM
through any per-tile algorithm without ever materialising the full
raster in memory. Each window is a GDAL-compatible
(xoff, yoff, xsize, ysize) quadruple ready to pass into
Dataset.read_array(window=...).
Tile size defaults match the COG / Cloud-Optimised GeoTIFF spec (512×512 or 1024×1024 internal tiles).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
dataset
|
A pyramids |
required | |
tile_rows
|
int
|
Tile height in cells. Defaults to 1024. |
1024
|
tile_cols
|
int
|
Tile width in cells. Defaults to 1024. |
1024
|
overlap
|
int
|
Cells of overlap between adjacent tiles. Useful for algorithms that need neighbour context (slopes, flow direction, dilations). Default 0. |
0
|
Yields:
| Type | Description |
|---|---|
|
|
|
|
order. Edge tiles are clipped to the dataset bounds. |
Examples:
-
Iterate a 5x5 dataset in 3x3 tiles with no overlap:
import numpy as np from pyramids.dataset import Dataset from digitalrivers.cloud_io import tile_windows ds = Dataset.create_from_array( ... np.zeros((5, 5), dtype=np.float32), ... top_left_corner=(0, 0), cell_size=1.0, epsg=4326, ... ) windows = list(tile_windows(ds, tile_rows=3, tile_cols=3)) [(w[0], w[1], w[2], w[3]) for w in windows][(0, 0, 3, 3), (0, 3, 3, 2), (3, 0, 2, 3), (3, 3, 2, 2)]
Source code in src/digitalrivers/cloud_io.py
dask_backend(*args, **kwargs)
#
Dask / chunked-tile backend for continental DEMs — umbrella stub.
Full Dask-graph integration remains deferred. The chunked-iteration
half ships as :func:tile_windows — callers process continental DEMs
by looping for win in tile_windows(ds): chunk = ds.read_array(window=win)
without loading the full mosaic in memory.
References
Dask documentation: https://docs.dask.org/ rioxarray chunked I/O.
Source code in src/digitalrivers/cloud_io.py
write_cog(dataset, path, compress='deflate')
#
Cloud-Optimised GeoTIFF writer.
Writes a pyramids Dataset to a COG file using GDAL's built-in COG
driver. COG is the standard cloud-native format for raster data:
internally tiled, internally overviewed, and indexable by HTTP range
requests — the foundation of every modern STAC-based pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
dataset
|
Any |
required | |
path
|
str
|
Output |
required |
compress
|
str
|
GDAL compression option ( |
'deflate'
|
Returns:
| Type | Description |
|---|---|
str
|
The output path on success. |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If GDAL's COG driver fails (older GDAL builds may
need |
Examples:
-
Write a 5x5 DEM as a COG:
import numpy as np from pyramids.dataset import Dataset from digitalrivers.cloud_io import write_cog import tempfile, os arr = np.arange(25, dtype=np.float32).reshape(5, 5) ds = Dataset.create_from_array( ... arr, top_left_corner=(0, 0), cell_size=1.0, epsg=4326, ... ) with tempfile.TemporaryDirectory() as tmpdir: ... out_path = os.path.join(tmpdir, "out.tif") ... result = write_cog(ds, out_path) ... os.path.exists(result) True
Source code in src/digitalrivers/cloud_io.py
cloud_storage(*args, **kwargs)
#
Zarr / S3 / GCS factories — umbrella stub.
The COG write half is shipped under :func:write_cog. Zarr writers
and S3 / GCS read factories remain deferred pending a follow-up PR.
Source code in src/digitalrivers/cloud_io.py
Surface map#
| Function | Purpose |
|---|---|
tile_windows(dataset, tile_size, overlap=0) |
Generator yielding (row_slice, col_slice) windows for chunked I/O |
write_cog(dataset, path, compress="deflate") |
Write a pyramids Dataset as a Cloud-Optimized GeoTIFF (overviews + tile layout) |
dask_backend(*args, **kwargs) |
Umbrella stub — raises NotImplementedError with a pointer to tile_windows for current Dask interop |
cloud_storage(*args, **kwargs) |
Umbrella stub — raises NotImplementedError for cloud-storage adapters (s3://, gs://, …) |