CF Conventions Module

The cf module provides shared infrastructure for reading and writing CF (Climate and Forecast) convention attributes. It is used by both the structured NetCDF class and the unstructured UgridDataset class.

Key Capabilities

Variable classification: Assign each variable a CF role (coordinate, auxiliary coordinate, data, grid mapping, bounds, cell measure, ancillary, mesh topology, or connectivity) via classify_variables()
CRS handling: Convert between CF grid_mapping attributes and OGR SpatialReference via grid_mapping_to_srs() / srs_to_grid_mapping()
Attribute writing: Write CF-compliant attributes to GDAL MDArrays and root groups, and generate standard coordinate attributes via build_coordinate_attrs()
Axis detection: Identify X/Y/Z/T axes from variable names, attributes, and units
Convention parsing: Parse Conventions attribute strings (e.g., "CF-1.8 UGRID-1.0") and cell_methods strings
Data masking: Apply valid_range, valid_min, valid_max masks
Flag decoding: Decode CF flag_values / flag_meanings / flag_masks
Validation: Check for common CF compliance issues via validate_cf()

API Reference

`pyramids.netcdf.cf.classify_variables(variables, dimensions)`

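Classify each variable's CF role by cross-referencing the attributes of all variables (bounds, cell_measures, ancillary_variables, coordinates) with the dimension names.

Must be called after all variables are collected: roles such as bounds, cell_measure, ancillary, and auxiliary_coordinate are assigned to a variable because some other variable references it.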

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `variables` | `dict[str, Any]` | Dict of `{name: VariableInfo}` from metadata. | required |
| `dimensions` | `dict[str, Any]` | Dict of `{name: DimensionInfo}` from metadata. | required |

Returns:

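| Type | Description |
| --- | --- |
| `dict[str, str]` | Dict of `{variable_name: cf_role_string}`, where each role is one of `"grid_mapping"`, `"bounds"`, `"cell_measure"`, `"ancillary"`, `"mesh_topology"`, `"connectivity"`, `"coordinate"`, `"auxiliary_coordinate"`, or `"data"`. |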

Source code in src/pyramids/netcdf/cf.py

def classify_variables(
    variables: dict[str, Any],
    dimensions: dict[str, Any],
) -> dict[str, str]:
    """Classify each variable's CF role by cross-referencing attributes.

    Must be called AFTER all variables are collected.

    Args:
        variables: Dict of ``{name: VariableInfo}`` from metadata.
        dimensions: Dict of ``{name: DimensionInfo}`` from metadata.

    Returns:
        Dict of ``{variable_name: cf_role_string}``.
    """
    dim_names: set[str] = set()
    for d in dimensions.values():
        dim_names.add(d.name)
        dim_names.add(d.full_name.lstrip("/"))

    bounds_vars: set[str] = set()
    cell_measure_vars: set[str] = set()
    ancillary_vars: set[str] = set()
    aux_coord_vars: set[str] = set()

    for var in variables.values():
        attrs = var.attributes
        bounds_ref = attrs.get("bounds")
        if isinstance(bounds_ref, str):
            bounds_vars.add(bounds_ref)
        cm = attrs.get("cell_measures")
        if isinstance(cm, str):
            for token in cm.replace(":", " ").split():
                if token not in ("area", "volume"):
                    cell_measure_vars.add(token)
        av = attrs.get("ancillary_variables")
        if isinstance(av, str):
            for token in av.split():
                ancillary_vars.add(token)
        coords = attrs.get("coordinates")
        if isinstance(coords, str):
            for token in coords.split():
                aux_coord_vars.add(token)

    roles: dict[str, str] = {}
    for name, var in variables.items():
        short_name = name.lstrip("/")
        attrs = var.attributes

        if "grid_mapping_name" in attrs:
            roles[name] = "grid_mapping"
        elif short_name in bounds_vars or name in bounds_vars:
            roles[name] = "bounds"
        elif short_name in cell_measure_vars or name in cell_measure_vars:
            roles[name] = "cell_measure"
        elif short_name in ancillary_vars or name in ancillary_vars:
            roles[name] = "ancillary"
        elif _is_mesh_topology(attrs):
            roles[name] = "mesh_topology"
        elif _is_connectivity(attrs):
            roles[name] = "connectivity"
        elif short_name in dim_names:
            roles[name] = "coordinate"
        elif short_name in aux_coord_vars or name in aux_coord_vars:
            roles[name] = "auxiliary_coordinate"
        else:
            roles[name] = "data"

    return roles
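
The `if`/`elif` chain means the first matching rule wins, and most rules depend on references held by *other* variables. The standalone sketch below mirrors the two passes on a tiny file description so the resulting roles are visible without GDAL. Assumptions: `SimpleNamespace` objects stand in for pyramids' `VariableInfo` and `DimensionInfo` (only `attributes` and `name` are read), `classify_sketch` and `make_var` are hypothetical names, and the mesh-topology and connectivity checks are left out because their helpers are not shown here.

```python
from types import SimpleNamespace


def classify_sketch(variables, dimensions):
    """Simplified classify_variables: no UGRID checks, short names only."""
    dim_names = {d.name for d in dimensions.values()}
    bounds, measures, ancillary, aux = set(), set(), set(), set()

    # Pass 1: collect every name that some variable points at.
    for var in variables.values():
        attrs = var.attributes
        if isinstance(attrs.get("bounds"), str):
            bounds.add(attrs["bounds"])
        if isinstance(attrs.get("cell_measures"), str):
            # "area: cell_area" -> ["area", "cell_area"]; drop the measure keyword
            tokens = attrs["cell_measures"].replace(":", " ").split()
            measures.update(t for t in tokens if t not in ("area", "volume"))
        if isinstance(attrs.get("ancillary_variables"), str):
            ancillary.update(attrs["ancillary_variables"].split())
        if isinstance(attrs.get("coordinates"), str):
            aux.update(attrs["coordinates"].split())

    # Pass 2: first matching rule wins, in the same order as classify_variables.
    roles = {}
    for name, var in variables.items():
        if "grid_mapping_name" in var.attributes:
            roles[name] = "grid_mapping"
        elif name in bounds:
            roles[name] = "bounds"
        elif name in measures:
            roles[name] = "cell_measure"
        elif name in ancillary:
            roles[name] = "ancillary"
        elif name in dim_names:
            roles[name] = "coordinate"
        elif name in aux:
            roles[name] = "auxiliary_coordinate"
        else:
            roles[name] = "data"
    return roles


def make_var(**attrs):
    """Stand-in for VariableInfo: only the attribute dict is needed here."""
    return SimpleNamespace(attributes=attrs)


variables = {
    "lat": make_var(units="degrees_north", bounds="lat_bnds"),
    "lon": make_var(units="degrees_east"),
    "lat_bnds": make_var(),
    "crs": make_var(grid_mapping_name="latitude_longitude"),
    "cell_area": make_var(units="m2"),
    "tas_qc": make_var(),
    "height": make_var(units="m"),
    "tas": make_var(grid_mapping="crs", cell_measures="area: cell_area",
                    ancillary_variables="tas_qc", coordinates="height"),
}
dimensions = {n: SimpleNamespace(name=n) for n in ("lat", "lon")}
roles = classify_sketch(variables, dimensions)
print(roles["tas"], roles["lat_bnds"], roles["height"])  # data bounds auxiliary_coordinate
```

Note that `tas` ends up as `data` only because nothing references it; its own `grid_mapping` attribute is not used for classification.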

`pyramids.netcdf.cf.grid_mapping_to_srs(grid_mapping_name, params)`

Convert CF grid_mapping attributes to an OGR SpatialReference.

If params contains a non-empty crs_wkt, it is imported directly (fast path). Otherwise the SRS is reconstructed from the individual CF projection parameters.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `grid_mapping_name` | `str` | CF `grid_mapping_name` attribute value. | required |
| `params` | `dict[str, Any]` | All attributes from the grid_mapping variable. | required |

Returns:

| Type | Description |
| --- | --- |
| `SpatialReference` | The reconstructed spatial reference. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `grid_mapping_name` is not supported and no `crs_wkt` is available. |

Source code in src/pyramids/netcdf/cf.py

def grid_mapping_to_srs(
    grid_mapping_name: str,
    params: dict[str, Any],
) -> osr.SpatialReference:
    """Convert CF grid_mapping attributes to an OGR SpatialReference.

    Tries ``crs_wkt`` first (fast path). Falls back to reconstructing
    the SRS from individual CF parameters.

    Args:
        grid_mapping_name: CF ``grid_mapping_name`` attribute value.
        params: All attributes from the grid_mapping variable.

    Returns:
        osr.SpatialReference: The reconstructed spatial reference.

    Raises:
        ValueError: If the grid_mapping_name is not supported and
            no ``crs_wkt`` is available.
    """
    srs = osr.SpatialReference()

    crs_wkt = params.get("crs_wkt")
    if crs_wkt:
        srs.ImportFromWkt(crs_wkt)
    else:
        srs = _build_srs_from_cf_params(grid_mapping_name, params)

    return srs

`pyramids.netcdf.cf.srs_to_grid_mapping(srs)`

Convert an OGR SpatialReference to CF grid_mapping name and params.

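Returns the CF grid_mapping_name and a dict of CF projection parameters. The dict always contains crs_wkt (for interoperability) and semi_major_axis, plus inverse_flattening when it is greater than zero. For a geographic CRS (no projection), the name is "latitude_longitude" and no projection parameters are added. If the projection has no entry in the CF grid-mapping table, a warning is logged, the name falls back to "latitude_longitude", and only crs_wkt carries the actual projection.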

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `srs` | `SpatialReference` | An OGR SpatialReference object. | required |

Returns:

| Type | Description |
| --- | --- |
| `tuple[str, dict[str, Any]]` | Tuple of `(grid_mapping_name, params_dict)`. |

Source code in src/pyramids/netcdf/cf.py

def srs_to_grid_mapping(
    srs: osr.SpatialReference,
) -> tuple[str, dict[str, Any]]:
    """Convert an OGR SpatialReference to CF grid_mapping name and params.

    Returns the CF ``grid_mapping_name`` and a dict of CF projection
    parameters (including ``crs_wkt`` for interoperability). For
    geographic CRS (no projection), returns ``"latitude_longitude"``
    with only ellipsoid parameters.

    Args:
        srs: An OGR SpatialReference object.

    Returns:
        Tuple of ``(grid_mapping_name, params_dict)``.
    """
    params: dict[str, Any] = {}
    params["crs_wkt"] = srs.ExportToWkt()
    params["semi_major_axis"] = srs.GetSemiMajor()
    inv_flat = srs.GetInvFlattening()
    if inv_flat > 0:
        params["inverse_flattening"] = inv_flat

    proj_name = srs.GetAttrValue("PROJECTION")
    if proj_name is None:
        grid_mapping_name = "latitude_longitude"
    elif proj_name in _GDAL_PROJ_TO_CF:
        grid_mapping_name = _GDAL_PROJ_TO_CF[proj_name]
        params.update(_extract_proj_params(srs, proj_name))
    else:
        logger.warning(
            f"Projection '{proj_name}' is not in the CF grid mapping table. "
            f"Only crs_wkt will be written for CRS interoperability."
        )
        grid_mapping_name = "latitude_longitude"

    return grid_mapping_name, params

`pyramids.netcdf.cf.write_attributes_to_md_array(md_arr, attrs)`

Write a dict of attributes to a GDAL MDArray.

Handles str, bool, int, float, and list values. Silently skips attributes that can't be written (GDAL limitation). Bool values are stored as int32 (1/0) since NetCDF has no boolean type.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `md_arr` | `MDArray` | The GDAL MDArray to write attributes to. | required |
| `attrs` | `dict[str, Any]` | Dict of attribute names to values. | required |

Source code in src/pyramids/netcdf/cf.py

def write_attributes_to_md_array(
    md_arr: gdal.MDArray,
    attrs: dict[str, Any],
) -> None:
    """Write a dict of attributes to a GDAL MDArray.

    Handles str, bool, int, float, and list values. Silently skips
    attributes that can't be written (GDAL limitation). Bool values
    are stored as int32 (1/0) since NetCDF has no boolean type.

    Args:
        md_arr: The GDAL MDArray to write attributes to.
        attrs: Dict of attribute names to values.
    """
    _write_attrs(md_arr, attrs)

`pyramids.netcdf.cf.write_global_attributes(rg, attrs)`

Write a dict of attributes to a GDAL root group.

Handles str, bool, int, float values. Bool values are stored as int32. Silently skips attributes that can't be written.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `rg` | `Group` | The GDAL root group to write attributes to. | required |
| `attrs` | `dict[str, Any]` | Dict of attribute names to values. | required |

Source code in src/pyramids/netcdf/cf.py

def write_global_attributes(
    rg: gdal.Group,
    attrs: dict[str, Any],
) -> None:
    """Write a dict of attributes to a GDAL root group.

    Handles str, bool, int, float values. Bool values are stored
    as int32. Silently skips attributes that can't be written.

    Args:
        rg: The GDAL root group to write attributes to.
        attrs: Dict of attribute names to values.
    """
    _write_attrs(rg, attrs)

`pyramids.netcdf.cf.build_coordinate_attrs(dim_name, is_geographic=True)`

Generate CF-compliant attributes for a coordinate variable.

Maps dimension names to the appropriate CF axis, standard_name, long_name, and units attributes based on whether the CRS is geographic or projected.

Dimension names are case-normalized (lowered) before matching, so "X", "x", and "Lon" all match the X-axis pattern.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `dim_name` | `str` | Dimension name (e.g. `"x"`, `"y"`, `"lat"`, `"lon"`, `"time"`). Case-insensitive. | required |
| `is_geographic` | `bool` | True if the CRS is geographic (lon/lat), False if projected (easting/northing in metres). | `True` |

Returns:

| Type | Description |
| --- | --- |
| `dict[str, str]` | Dict of CF attribute names to string values. Empty dict if the dimension name is not recognized. |

Source code in src/pyramids/netcdf/cf.py

def build_coordinate_attrs(
    dim_name: str,
    is_geographic: bool = True,
) -> dict[str, str]:
    """Generate CF-compliant attributes for a coordinate variable.

    Maps dimension names to the appropriate CF ``axis``,
    ``standard_name``, ``long_name``, and ``units`` attributes
    based on whether the CRS is geographic or projected.

    Dimension names are **case-normalized** (lowered) before
    matching, so ``"X"``, ``"x"``, and ``"Lon"`` all match the
    X-axis pattern.

    Args:
        dim_name: Dimension name (e.g. ``"x"``, ``"y"``, ``"lat"``,
            ``"lon"``, ``"time"``). Case-insensitive.
        is_geographic: True if the CRS is geographic (lon/lat),
            False if projected (easting/northing in metres).

    Returns:
        Dict of CF attribute names to string values. Empty dict
        if the dimension name is not recognized.
    """
    name_lower = dim_name.lower()
    attrs: dict[str, str] = {}

    if name_lower in ("x", "lon", "longitude"):
        attrs["axis"] = "X"
        if is_geographic:
            attrs["standard_name"] = "longitude"
            attrs["long_name"] = "longitude"
            attrs["units"] = "degrees_east"
        else:
            attrs["standard_name"] = "projection_x_coordinate"
            attrs["long_name"] = "x coordinate of projection"
            attrs["units"] = "m"
    elif name_lower in ("y", "lat", "latitude"):
        attrs["axis"] = "Y"
        if is_geographic:
            attrs["standard_name"] = "latitude"
            attrs["long_name"] = "latitude"
            attrs["units"] = "degrees_north"
        else:
            attrs["standard_name"] = "projection_y_coordinate"
            attrs["long_name"] = "y coordinate of projection"
            attrs["units"] = "m"
    elif name_lower in ("time", "t"):
        attrs["axis"] = "T"
        attrs["standard_name"] = "time"
        attrs["long_name"] = "time"
    elif name_lower in ("z", "lev", "level", "depth", "height"):
        attrs["axis"] = "Z"
        attrs["long_name"] = dim_name

    return attrs
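
Because matching uses the lowered name, one call covers `"Lon"`, `"LON"`, and `"x"`, and `is_geographic` only affects the X and Y branches. The table-driven sketch below reproduces just those X/Y branches for a side-by-side look at geographic versus projected output; `coordinate_attrs_sketch`, `XY_ATTRS`, and `ALIASES` are hypothetical names, and the T and Z branches are omitted, so names such as `"time"` return `{}` here even though the real function handles them.

```python
# axis -> (attributes for a geographic CRS, attributes for a projected CRS)
XY_ATTRS = {
    "X": (
        {"standard_name": "longitude", "long_name": "longitude",
         "units": "degrees_east"},
        {"standard_name": "projection_x_coordinate",
         "long_name": "x coordinate of projection", "units": "m"},
    ),
    "Y": (
        {"standard_name": "latitude", "long_name": "latitude",
         "units": "degrees_north"},
        {"standard_name": "projection_y_coordinate",
         "long_name": "y coordinate of projection", "units": "m"},
    ),
}
# Lower-cased dimension names accepted for each axis, as in build_coordinate_attrs.
ALIASES = {"x": "X", "lon": "X", "longitude": "X",
           "y": "Y", "lat": "Y", "latitude": "Y"}


def coordinate_attrs_sketch(dim_name, is_geographic=True):
    """X/Y-only, table-driven version of build_coordinate_attrs."""
    axis = ALIASES.get(dim_name.lower())
    if axis is None:
        return {}
    geographic, projected = XY_ATTRS[axis]
    # Build a fresh dict so the lookup table is never mutated by callers.
    return {"axis": axis, **(geographic if is_geographic else projected)}


print(coordinate_attrs_sketch("Lon"))
print(coordinate_attrs_sketch("y", is_geographic=False))
```

A lookup table keeps the accepted aliases in one place; the library's `if`/`elif` chain produces the same attributes for these two axes.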

`pyramids.netcdf.cf.detect_axis(name, attrs, units=None)`

Detect CF axis type from a variable's attributes.

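Applies heuristics in priority order and returns the first match:
1. Explicit axis attribute ("X", "Y", "Z", "T"; case-insensitive)
2. standard_name lookup against CF conventions
3. Unit string matching (degrees_north -> Y, degrees_east -> X, any unit containing "since" -> T); the units argument, when given, takes precedence over attrs["units"]
4. Variable name pattern matching (lat -> Y, lon -> X)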

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Variable or dimension short name. | required |
| `attrs` | `dict[str, Any]` | Variable attribute dictionary. | required |
| `units` | `str \| None` | Unit string (separate from attrs for flexibility). | `None` |

Returns:

| Type | Description |
| --- | --- |
| `str \| None` | One of `"X"`, `"Y"`, `"Z"`, `"T"`, or `None`. |

Source code in src/pyramids/netcdf/cf.py

def detect_axis(
    name: str,
    attrs: dict[str, Any],
    units: str | None = None,
) -> str | None:
    """Detect CF axis type from a variable's attributes.

    Applies heuristics in priority order:
    1. Explicit ``axis`` attribute (``"X"``, ``"Y"``, ``"Z"``, ``"T"``)
    2. ``standard_name`` lookup against CF conventions
    3. Unit string matching (``degrees_north`` -> Y, etc.)
    4. Variable name pattern matching (``lat`` -> Y, ``lon`` -> X)

    Args:
        name: Variable or dimension short name.
        attrs: Variable attribute dictionary.
        units: Unit string (separate from attrs for flexibility).

    Returns:
        One of ``"X"``, ``"Y"``, ``"Z"``, ``"T"``, or None.
    """
    result: str | None = None

    axis = attrs.get("axis")
    if isinstance(axis, str) and axis.upper() in ("X", "Y", "Z", "T"):
        result = axis.upper()
    else:
        stdname = attrs.get("standard_name")
        if isinstance(stdname, str):
            result = _STDNAME_TO_AXIS.get(stdname.lower())

        if result is None:
            unit_str = units or attrs.get("units")
            if isinstance(unit_str, str):
                unit_lower = unit_str.lower().strip()
                if unit_lower in (
                    "degrees_north", "degree_north", "degree_n", "degrees_n"
                ):
                    result = "Y"
                elif unit_lower in (
                    "degrees_east", "degree_east", "degree_e", "degrees_e"
                ):
                    result = "X"
                elif "since" in unit_lower:
                    result = "T"

        if result is None:
            result = _NAME_PATTERNS.get(name.lower().strip())

    return result
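
The priority order means an explicit `axis` attribute wins even over a misleading variable name, and a time unit such as `"days since 2000-01-01"` is enough to identify T. The sketch below reproduces the chain with early returns; `STDNAME_AXIS_SUBSET` and `NAME_AXIS_SUBSET` are hypothetical, deliberately small stand-ins for the module's `_STDNAME_TO_AXIS` and `_NAME_PATTERNS` tables (which are not shown), and `detect_axis_sketch` is a hypothetical name.

```python
# Hypothetical subsets; the module's real lookup tables are not shown here.
STDNAME_AXIS_SUBSET = {"latitude": "Y", "longitude": "X", "time": "T", "depth": "Z"}
NAME_AXIS_SUBSET = {"lat": "Y", "lon": "X", "time": "T", "lev": "Z"}

NORTH_UNITS = ("degrees_north", "degree_north", "degree_n", "degrees_n")
EAST_UNITS = ("degrees_east", "degree_east", "degree_e", "degrees_e")


def detect_axis_sketch(name, attrs, units=None):
    """Return "X", "Y", "Z", "T", or None using detect_axis' priority order."""
    # 1. An explicit axis attribute always wins (case-insensitive).
    axis = attrs.get("axis")
    if isinstance(axis, str) and axis.upper() in ("X", "Y", "Z", "T"):
        return axis.upper()
    # 2. standard_name lookup.
    stdname = attrs.get("standard_name")
    if isinstance(stdname, str) and stdname.lower() in STDNAME_AXIS_SUBSET:
        return STDNAME_AXIS_SUBSET[stdname.lower()]
    # 3. Units: the explicit argument takes precedence over attrs["units"].
    unit_str = units or attrs.get("units")
    if isinstance(unit_str, str):
        unit = unit_str.lower().strip()
        if unit in NORTH_UNITS:
            return "Y"
        if unit in EAST_UNITS:
            return "X"
        if "since" in unit:
            return "T"
    # 4. Fall back to the variable name.
    return NAME_AXIS_SUBSET.get(name.lower().strip())


print(detect_axis_sketch("lat", {"axis": "x"}))                     # X
print(detect_axis_sketch("t", {"units": "days since 2000-01-01"}))  # T
print(detect_axis_sketch("nav_lat", {}, units="degrees_north"))     # Y
print(detect_axis_sketch("band", {}))                               # None
```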

`pyramids.netcdf.cf.parse_conventions(conventions_str)`

Parse a Conventions global attribute string.

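Logs a warning if the CF version is higher than the highest tested version (1.11). Versions are compared numerically, component by component, so 1.8 counts as older than 1.11. A token without a hyphen is recorded with an empty version string, and a None or empty input returns an empty dict.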

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `conventions_str` | `str \| None` | Space-separated conventions string, e.g. `"CF-1.8 UGRID-1.0 Deltares-0.10"`. | required |

Returns:

| Type | Description |
| --- | --- |
| `dict[str, str]` | Dict of `{convention_name: version_string}`. |

Source code in src/pyramids/netcdf/cf.py

def parse_conventions(conventions_str: str | None) -> dict[str, str]:
    """Parse a Conventions global attribute string.

    Logs a warning if the CF version is higher than the highest
    tested version (``1.11``).

    Args:
        conventions_str: Space-separated conventions string, e.g.
            ``"CF-1.8 UGRID-1.0 Deltares-0.10"``.

    Returns:
        Dict of ``{convention_name: version_string}``.
    """
    result: dict[str, str] = {}
    if conventions_str:
        for token in conventions_str.split():
            if "-" in token:
                name, _, version = token.partition("-")
                result[name] = version
            else:
                result[token] = ""
        cf_version = result.get("CF")
        if cf_version is not None:
            try:
                parts = cf_version.split(".")
                tested_parts = _MAX_TESTED_CF_VERSION.split(".")
                if [int(p) for p in parts] > [int(p) for p in tested_parts]:
                    logger.warning(
                        f"CF version {cf_version} is newer than the "
                        f"highest tested version "
                        f"({_MAX_TESTED_CF_VERSION}). "
                        f"Some features may not be supported."
                    )
            except (ValueError, TypeError):
                pass
    return result
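
Comparing versions as lists of integers matters here: as plain strings, `"1.8" > "1.11"` is true, so a string comparison would wrongly flag CF-1.8 as newer than the tested 1.11. The standalone sketch below reproduces the tokenizing and the comparison; `parse_conventions_sketch` and `is_newer` are hypothetical names.

```python
def parse_conventions_sketch(conventions_str):
    """Whitespace-split tokens into {name: version}, as parse_conventions does."""
    result = {}
    for token in (conventions_str or "").split():
        # "CF-1.8" -> ("CF", "-", "1.8"); a token without "-" gets version "".
        name, _, version = token.partition("-")
        result[name] = version
    return result


def is_newer(version, tested="1.11"):
    """Component-wise integer comparison, e.g. [1, 8] < [1, 11]."""
    return [int(p) for p in version.split(".")] > [int(p) for p in tested.split(".")]


conv = parse_conventions_sketch("CF-1.8 UGRID-1.0 Deltares-0.10")
print(conv)                 # {'CF': '1.8', 'UGRID': '1.0', 'Deltares': '0.10'}
print("1.8" > "1.11")       # True: string comparison gets the order wrong
print(is_newer(conv["CF"]))  # False
```

Because only whitespace separates tokens, a comma-separated value such as `"CF-1.8, UGRID-1.0"` leaves the comma attached to the version (`"1.8,"`); the integer conversion then fails and the version check is silently skipped.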

`pyramids.netcdf.cf.parse_cell_methods(cell_methods_str)`

Parse a CF cell_methods attribute string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `cell_methods_str` | `str` | CF cell_methods string, e.g. `"time: mean area: sum where land"`. | required |

Returns:

| Type | Description |
| --- | --- |
| `list[dict[str, str]]` | List of dicts with keys `"dimensions"` and `"method"`, and optionally `"where"` and `"over"`. |

Source code in src/pyramids/netcdf/cf.py

def parse_cell_methods(cell_methods_str: str) -> list[dict[str, str]]:
    """Parse a CF ``cell_methods`` attribute string.

    Args:
        cell_methods_str: CF cell_methods string, e.g.
            ``"time: mean area: sum where land"``.

    Returns:
        List of dicts with keys ``"dimensions"``, ``"method"``,
        and optionally ``"where"`` and ``"over"``.
    """
    results: list[dict[str, str]] = []
    pattern = (
        r'(\w[\w\s]*?):\s+(\w+)'
        r'(?:\s+where\s+(\w+))?'
        r'(?:\s+over\s+(\w+))?'
    )
    for match in re.finditer(pattern, cell_methods_str):
        entry: dict[str, str] = {
            "dimensions": match.group(1).strip(),
            "method": match.group(2),
        }
        if match.group(3):
            entry["where"] = match.group(3)
        if match.group(4):
            entry["over"] = match.group(4)
        results.append(entry)
    return results
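
To see what the pattern captures, the standalone snippet below runs the same regular expression and loop on a few inputs (`parse_sketch` is a hypothetical name for this copy). The last input shows a limitation of the pattern as written: CF allows several names before one method (`"lat: lon: mean"` means a mean over both), but here the second name is captured as the method and the trailing `mean` is dropped.

```python
import re

# Same pattern as parse_cell_methods above.
PATTERN = (
    r"(\w[\w\s]*?):\s+(\w+)"
    r"(?:\s+where\s+(\w+))?"
    r"(?:\s+over\s+(\w+))?"
)


def parse_sketch(cell_methods_str):
    """Copy of parse_cell_methods' loop for experimenting with inputs."""
    results = []
    for match in re.finditer(PATTERN, cell_methods_str):
        entry = {"dimensions": match.group(1).strip(), "method": match.group(2)}
        if match.group(3):
            entry["where"] = match.group(3)
        if match.group(4):
            entry["over"] = match.group(4)
        results.append(entry)
    return results


print(parse_sketch("time: mean area: sum where land"))
# [{'dimensions': 'time', 'method': 'mean'}, {'dimensions': 'area', 'method': 'sum', 'where': 'land'}]
print(parse_sketch("area: mean where sea_ice over sea"))
# [{'dimensions': 'area', 'method': 'mean', 'where': 'sea_ice', 'over': 'sea'}]
print(parse_sketch("lat: lon: mean"))
# [{'dimensions': 'lat', 'method': 'lon'}]
```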

`pyramids.netcdf.cf.apply_valid_range_mask(arr, valid_min=None, valid_max=None, valid_range=None, fill_value=float('nan'))`

Mask values outside the CF valid range.

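The input is cast to float (so that NaN can serve as the fill value) and copied, so arr itself is not modified. Values below valid_min or above valid_max are then replaced with fill_value; values equal to a limit are kept. If valid_range is given, it overrides valid_min and valid_max.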

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `arr` | `Any` | Input NumPy array. | required |
| `valid_min` | `float \| None` | Minimum valid value (inclusive). | `None` |
| `valid_max` | `float \| None` | Maximum valid value (inclusive). | `None` |
| `valid_range` | `tuple \| list \| None` | Two-element `[min, max]`. Overrides `valid_min`/`valid_max`. | `None` |
| `fill_value` | `float` | Replacement value. Defaults to NaN. | `float('nan')` |

Returns:

| Type | Description |
| --- | --- |
| `Any` | A float copy of `arr` with out-of-range values replaced. |

Source code in src/pyramids/netcdf/cf.py

def apply_valid_range_mask(
    arr: Any,
    valid_min: float | None = None,
    valid_max: float | None = None,
    valid_range: tuple | list | None = None,
    fill_value: float = float("nan"),
) -> Any:
    """Mask values outside the CF valid range.

    Values below ``valid_min`` or above ``valid_max`` are replaced
    with ``fill_value``.

    Args:
        arr: Input numpy array.
        valid_min: Minimum valid value.
        valid_max: Maximum valid value.
        valid_range: ``[min, max]``. Overrides valid_min/max.
        fill_value: Replacement value. Defaults to NaN.

    Returns:
        A copy of ``arr`` with out-of-range values replaced.
    """
    if valid_range is not None:
        valid_min = valid_range[0]
        valid_max = valid_range[1]
    result = arr.astype(float).copy()
    if valid_min is not None:
        result[result < valid_min] = fill_value
    if valid_max is not None:
        result[result > valid_max] = fill_value
    return result
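
The masking rule keeps values equal to the limits (the comparisons are strict `<` and `>`), and `valid_range` silently replaces any `valid_min`/`valid_max` passed alongside it. The list-based sketch below mirrors that rule so it runs without NumPy; `mask_sketch` is a hypothetical name, and the real function works on NumPy arrays with boolean indexing instead of a Python loop.

```python
import math


def mask_sketch(values, valid_min=None, valid_max=None, valid_range=None,
                fill_value=float("nan")):
    """List-based sketch of apply_valid_range_mask; returns a new list."""
    if valid_range is not None:  # valid_range overrides valid_min/valid_max
        valid_min, valid_max = valid_range[0], valid_range[1]
    out = []
    for raw in values:
        x = float(raw)  # cast, as arr.astype(float) does
        if (valid_min is not None and x < valid_min) or (
            valid_max is not None and x > valid_max
        ):
            x = fill_value
        out.append(x)
    return out


raw = [-999, 0, 50, 100, 120]  # integer values with a -999 sentinel
masked = mask_sketch(raw, valid_min=-50, valid_range=[0, 100])
print(masked)  # [nan, 0.0, 50.0, 100.0, nan]
```

Here `valid_min=-50` has no effect because `valid_range=[0, 100]` overrides it, so the sentinel `-999` is masked.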

`pyramids.netcdf.cf.decode_flags(value, flag_values=None, flag_meanings=None, flag_masks=None)`

Decode a CF flag value to human-readable label(s).

Supports three CF flag modes:

Mutually exclusive (flag_values + flag_meanings): Returns the single meaning matching the value.
Boolean / bit-field (flag_masks + flag_meanings): Returns a list of meanings for active bits.
Combined (flag_masks + flag_values + flag_meanings): Returns meanings where (value & mask) == flag_value.

Parameters:

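| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `value` | `int` | The integer flag value to decode. | required |
| `flag_values` | `list \| None` | List of possible flag values, aligned 1:1 with `flag_meanings`. | `None` |
| `flag_meanings` | `list[str] \| None` | List of human-readable meaning strings. CF stores `flag_meanings` as one blank-separated string, so split it before passing it in. | `None` |
| `flag_masks` | `list[int] \| None` | List of bit masks for boolean flags. | `None` |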

Returns:

| Type | Description |
| --- | --- |
| `list[str]` | List of matching meaning strings. Returns `["unknown"]` if nothing matches or no meanings are provided. |

Source code in src/pyramids/netcdf/cf.py

def decode_flags(
    value: int,
    flag_values: list | None = None,
    flag_meanings: list[str] | None = None,
    flag_masks: list[int] | None = None,
) -> list[str]:
    """Decode a CF flag value to human-readable label(s).

    Supports three CF flag modes:

    1. **Mutually exclusive** (flag_values + flag_meanings):
       Returns the single meaning matching the value.
    2. **Boolean / bit-field** (flag_masks + flag_meanings):
       Returns a list of meanings for active bits.
    3. **Combined** (flag_masks + flag_values + flag_meanings):
       Returns meanings where ``(value & mask) == flag_value``.

    Args:
        value: The integer flag value to decode.
        flag_values: List of possible flag values (1:1 with meanings).
        flag_meanings: List of human-readable meaning strings.
        flag_masks: List of bit masks for boolean flags.

    Returns:
        list[str]: List of matching meaning strings. Returns
        ``["unknown"]`` if no match or no meanings provided.
    """
    result: list[str] = ["unknown"]

    if flag_meanings is None:
        pass
    elif flag_masks is not None and flag_values is not None:
        matched = [
            flag_meanings[i]
            for i in range(len(flag_meanings))
            if i < len(flag_masks) and i < len(flag_values)
            and (value & flag_masks[i]) == flag_values[i]
        ]
        if matched:
            result = matched
    elif flag_masks is not None:
        matched = [
            flag_meanings[i]
            for i in range(len(flag_meanings))
            if i < len(flag_masks) and (value & flag_masks[i]) != 0
        ]
        if matched:
            result = matched
    elif flag_values is not None:
        for i, fv in enumerate(flag_values):
            if fv == value and i < len(flag_meanings):
                result = [flag_meanings[i]]
                break

    return result
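
The combined mode is what lets one multi-bit field encode several exclusive states: under a mask of 12 (bits 2 and 3), the values 4, 8, and 12 are distinct states rather than independent bits. The sketch below mirrors the three branches with `zip` and decodes an illustrative sensor-status variable; `decode_sketch` and the attribute values are hypothetical examples, not taken from a real file.

```python
def decode_sketch(value, flag_values=None, flag_meanings=None, flag_masks=None):
    """Mirror of decode_flags' three modes; returns ["unknown"] on no match."""
    if flag_meanings is None:
        return ["unknown"]
    if flag_masks is not None and flag_values is not None:
        # Combined: the masked bits must equal the flag value exactly.
        hits = [m for m, mask, fv in zip(flag_meanings, flag_masks, flag_values)
                if (value & mask) == fv]
    elif flag_masks is not None:
        # Boolean / bit-field: any set bit under the mask activates the meaning.
        hits = [m for m, mask in zip(flag_meanings, flag_masks) if value & mask]
    elif flag_values is not None:
        # Mutually exclusive: first exact match on the whole value.
        hits = [m for m, fv in zip(flag_meanings, flag_values) if fv == value][:1]
    else:
        hits = []
    return hits or ["unknown"]


# CF stores flag_meanings as one blank-separated string; split it first.
meanings = "low_battery hardware_fault offline_mode calibration_mode maintenance_mode".split()
masks = [1, 2, 12, 12, 12]
values = [1, 2, 4, 8, 12]

print(decode_sketch(13, values, meanings, masks))  # ['low_battery', 'maintenance_mode']
print(decode_sketch(6, values, meanings, masks))   # ['hardware_fault', 'offline_mode']
```

For `13` (binary `1101`), bit 0 gives `low_battery`, and `13 & 12 == 12` selects `maintenance_mode` rather than the other two states that share the same mask.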

`pyramids.netcdf.cf.validate_cf(global_attrs, variables, dimensions)`

Check for common CF compliance issues.

Returns a list of warning/error messages. An empty list means the dataset passes basic CF checks. This is not a full cfchecker replacement; it covers only the most common issues.

Checks:
1. The Conventions attribute is present and contains "CF-"
2. Coordinate variables have units (either a units attribute or a unit value)
3. Time coordinates (units containing "since") have a calendar attribute

Limitation: Only checks dimension-coordinate variables (those whose name matches a dimension). Auxiliary coordinates referenced by the coordinates attribute on data variables are not validated.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `global_attrs` | `dict[str, Any]` | Root-level attributes dict. | required |
| `variables` | `dict[str, Any]` | Dict of `{name: VariableInfo}` from metadata. | required |
| `dimensions` | `dict[str, Any]` | Dict of `{name: DimensionInfo}` from metadata. | required |

Returns:

| Type | Description |
| --- | --- |
| `list[str]` | List of warning/error strings. Empty if compliant. |

Source code in src/pyramids/netcdf/cf.py

def validate_cf(
    global_attrs: dict[str, Any],
    variables: dict[str, Any],
    dimensions: dict[str, Any],
) -> list[str]:
    """Check for common CF compliance issues.

    Returns a list of warning/error messages. An empty list means
    the dataset passes basic CF checks. This is NOT a full
    cfchecker replacement — it covers the most common issues.

    Checks:
    1. ``Conventions`` attribute present and contains ``"CF-"``
    2. Coordinate variables have ``units``
    3. Time coordinates have ``calendar``

    Limitation: Only checks dimension-coordinate variables (those
    whose name matches a dimension). Auxiliary coordinates referenced
    by the ``coordinates`` attribute on data variables are not
    validated.

    Args:
        global_attrs: Root-level attributes dict.
        variables: Dict of ``{name: VariableInfo}`` from metadata.
        dimensions: Dict of ``{name: DimensionInfo}`` from metadata.

    Returns:
        List of warning/error strings. Empty if compliant.
    """
    issues: list[str] = []

    conv = global_attrs.get("Conventions", "")
    if not isinstance(conv, str) or "CF-" not in conv:
        issues.append(
            "Missing or invalid 'Conventions' attribute. "
            "Should contain 'CF-1.X'."
        )

    dim_names = {d.name for d in dimensions.values()}
    for name, var in variables.items():
        short = name.lstrip("/")
        if short in dim_names:
            if not var.attributes.get("units") and not var.unit:
                issues.append(
                    f"Coordinate variable '{short}' has no 'units' attribute."
                )
            units_val = var.attributes.get("units", "")
            if isinstance(units_val, str) and "since" in units_val:
                if "calendar" not in var.attributes:
                    issues.append(
                        f"Time coordinate '{short}' has no 'calendar' attribute."
                    )

    return issues
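
The sketch below runs the same three checks on a tiny description whose variable names carry a leading `/` (the function strips it before matching dimension names). Assumptions: `SimpleNamespace` objects stand in for `VariableInfo`/`DimensionInfo` (only `attributes`, `unit`, and `name` are read), `validate_sketch` and `make_var` are hypothetical names, and the Conventions message is shortened.

```python
from types import SimpleNamespace


def validate_sketch(global_attrs, variables, dimensions):
    """Condensed mirror of validate_cf with a shortened Conventions message."""
    issues = []
    conv = global_attrs.get("Conventions", "")
    if not isinstance(conv, str) or "CF-" not in conv:
        issues.append("Missing or invalid 'Conventions' attribute.")

    dim_names = {d.name for d in dimensions.values()}
    for name, var in variables.items():
        short = name.lstrip("/")
        if short not in dim_names:
            continue  # data and auxiliary-coordinate variables are not checked
        if not var.attributes.get("units") and not var.unit:
            issues.append(f"Coordinate variable '{short}' has no 'units' attribute.")
        units = var.attributes.get("units", "")
        if isinstance(units, str) and "since" in units:
            if "calendar" not in var.attributes:
                issues.append(f"Time coordinate '{short}' has no 'calendar' attribute.")
    return issues


def make_var(unit=None, **attrs):
    """Stand-in for VariableInfo: `unit` field plus an attribute dict."""
    return SimpleNamespace(attributes=attrs, unit=unit)


variables = {
    "/time": make_var(units="days since 2000-01-01"),  # no calendar
    "/lat": make_var(),                                # no units at all
    "/lon": make_var(units="degrees_east"),
    "/tas": make_var(),                                # not a dimension: ignored
}
dimensions = {n: SimpleNamespace(name=n) for n in ("time", "lat", "lon")}
for issue in validate_sketch({"Conventions": "CF-1.8"}, variables, dimensions):
    print(issue)
```

`tas` is never inspected, which is the documented limitation: only variables whose name matches a dimension are checked.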