Lazy UGRID reads for unstructured ocean / hydro meshes¶
UGRID is the CF-compliant convention for storing unstructured meshes in NetCDF — FVCOM, SCHISM, ADCIRC, SELFE, Delft3D FM all emit it. A regional ocean run at minute cadence easily generates ~40 GB / day of face-centred state variables; a full hindcast of Gulf of Maine at NECOFS 3-km resolution is > 1 TB. Going lazy is the only way to interactively explore these.
This notebook uses a real public UGRID dataset — the NECOFS Gulf of Maine forecast, hosted by UMass Dartmouth SMAST as OPeNDAP / NetCDF — to demonstrate the dask-backed read path on UgridDataset.
What you'll see¶
- Open a UGRID NetCDF lazily —
UgridDataset.read_file(path)+.read_array(variable, chunks=...). - Reduce across time (node-wise mean temperature over a month) without materialising the full mesh.
- Interpolate the unstructured mesh to a regular grid with
mesh_to_grid— useful for cross-checking against gridded models. - Write the interpolated cube as Zarr for downstream xarray consumers.
Requirements¶
pip install 'pyramids-gis[netcdf-lazy]'
Pulls xarray, netCDF4, h5netcdf, dask[array], zarr, fsspec (cftime is a core pyramids dependency). UGRID uses GDAL's NetCDF MDIM driver, which is already a hard pyramids dep.
Scheduler + cloud defaults¶
import os
os.environ['MPLBACKEND'] = 'Agg'
import dask
dask.config.set(scheduler="processes")
from pyramids import configure
# GDAL_HTTP_MULTIRANGE + VSI_CACHE are the ones that matter most
# for OPeNDAP / HTTP NetCDF reads.
configure(cloud_defaults=True)
1. Open NECOFS as a UgridDataset¶
NECOFS (Northeast Coastal Ocean Forecast System) publishes FVCOM output at http://www.smast.umassd.edu:8080/thredds/. The daily-forecast dataset NECOFS_GOM3_FORECAST.nc is a ~200 GB running archive; the variable we'll reduce — temp (face-centred water temperature on 53k triangles × 40 vertical layers × ~720 hourly time steps) — is ~3 GB per month-slice.
Pyramids' UgridDataset parses the mesh topology (node coords, face connectivity, edge list) eagerly because it's small (~10 MB); the data variables stay lazy.
from pyramids.netcdf.ugrid import UgridDataset
# FVCOM NECOFS GOM3 hindcast, via OPeNDAP. Fragment is deterministic;
# swap to a more-recent URL if SMAST rotates the forecast archive.
NECOFS = (
"http://www.smast.umassd.edu:8080/thredds/dodsC/"
"FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc"
)
ugrid = UgridDataset.read_file(NECOFS)
ugrid.n_node, ugrid.n_face, ugrid.data_variable_names[:5]
2. Lazy read of a data variable¶
UgridDataset.read_variable(name, chunks=...) returns a dask.array.Array shaped (time, sigma, face) for face-centred vars (or (time, node) for node-centred). The block shape inherited from NetCDF is typically one time step × one sigma level per chunk — perfect for time-reductions.
# Temperature on the face centroids — lazy.
temp = ugrid.read_variable("temp", chunks={"time": 24})
type(temp).__name__, temp.shape, temp.chunksize
# Month-long mean at the surface layer (sigma=0) — reduce across time.
# Only the chunks we touch get fetched over OPeNDAP.
import numpy as np
surface = temp[:720, 0, :] # first 30 days, surface sigma
monthly_mean = surface.mean(axis=0) # still lazy
monthly_mean_np = monthly_mean.compute()
monthly_mean_np.shape, float(np.nanmean(monthly_mean_np))
3. Interpolate mesh → regular grid¶
Unstructured-to-regular resampling is the classical "make it xarray-friendly" step. mesh_to_grid does barycentric / nearest-neighbour interpolation over the mesh connectivity; the resampled cube is a plain 2-D numpy array you can hand to cleopatra / matplotlib / rioxarray.
from pyramids.netcdf.ugrid.interpolation import mesh_to_grid
# Regular 0.05° grid covering the Gulf of Maine.
grid = mesh_to_grid(
mesh=ugrid.mesh,
values=monthly_mean_np,
cell_size=0.05,
bounds=(-71.0, 41.0, -66.5, 45.0),
method="nearest",
)
grid.shape, float(np.nanmin(grid)), float(np.nanmax(grid))
4. Save the interpolated cube as Zarr¶
For a multi-month workflow, you'd loop monthly_mean → grid, stack across time, and write the stack as Zarr. The partitioned write is parallel under the process scheduler configured above.
import tempfile
from pathlib import Path
from pyramids.dataset import Dataset
# Wrap as a Dataset so we get the CF-compliant Zarr writer.
ds = Dataset.create_from_array(
grid[np.newaxis, :, :].astype(np.float32), # add band dim
top_left_corner=(-71.0, 45.0),
cell_size=0.05,
epsg=4326,
)
store = Path(tempfile.mkdtemp()) / "necofs_monthly.zarr"
ds.to_zarr(str(store))
list(store.iterdir())[:4]
Notes on UGRID + dask¶
- Mesh topology is NOT lazy. The connectivity table (faces → nodes) is always materialised eagerly — it's tiny (~10 MB for 53k-face NECOFS) and every face-space operation needs it.
- Face-space reductions parallelise naturally. Mean / std / quantile over time or sigma layers are chunk-local.
- Spatial neighbourhood ops do NOT chunk cleanly on unstructured meshes. Gradient / divergence / smoothing need the full topology and are best materialised first (
.compute()) before running. - OPeNDAP is slower than S3. If you're running a long reduction, copy the source NetCDF to a local SSD / EBS volume and read from there — GDAL / xarray will still use the same code path.
Further reading¶
- UGRID Conventions v1.0 — the format spec
- FVCOM NECOFS forecasts — the dataset used above
- Xugrid — a complementary xarray-native UGRID reader; pyramids' UgridDataset wraps a subset of similar operations