CF Conventions Module

The cf module provides shared infrastructure for reading and writing CF (Climate and Forecast) convention attributes. It is used by both the structured NetCDF class and the unstructured UgridDataset class.

Key Capabilities

  • Variable classification: Detect coordinate, data, mesh topology, and connectivity variables via classify_variables()
  • CRS handling: Convert between CF grid_mapping attributes and OGR SpatialReference via grid_mapping_to_srs() / srs_to_grid_mapping()
  • Attribute writing: Write CF-compliant attributes to GDAL MDArrays and root groups
  • Axis detection: Identify X/Y/Z/T axes from variable names, attributes, and units
  • Convention parsing: Parse Conventions attribute strings (e.g., "CF-1.8 UGRID-1.0")
  • Data masking: Apply valid_range, valid_min, valid_max masks
  • Flag decoding: Decode CF flag_values / flag_meanings

API Reference

pyramids.netcdf.cf.classify_variables(variables, dimensions)

Classify each variable's CF role by cross-referencing attributes.

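Must be called after all variables are collected, because roles such as bounds and ancillary depend on attributes of other variables. Roles are assigned by first match in this order: grid_mapping, bounds, cell_measure, ancillary, mesh_topology, connectivity, coordinate, auxiliary_coordinate, data.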

Parameters:

  • variables (dict[str, Any], required): Dict of {name: VariableInfo} from metadata.
  • dimensions (dict[str, Any], required): Dict of {name: DimensionInfo} from metadata.

Returns:

  • dict[str, str]: Dict of {variable_name: cf_role_string}.

Source code in src/pyramids/netcdf/cf.py
def classify_variables(
    variables: dict[str, Any],
    dimensions: dict[str, Any],
) -> dict[str, str]:
    """Classify each variable's CF role by cross-referencing attributes.

    Must be called AFTER all variables are collected.

    Args:
        variables: Dict of ``{name: VariableInfo}`` from metadata.
        dimensions: Dict of ``{name: DimensionInfo}`` from metadata.

    Returns:
        Dict of ``{variable_name: cf_role_string}``.
    """
    dim_names: set[str] = set()
    for d in dimensions.values():
        dim_names.add(d.name)
        dim_names.add(d.full_name.lstrip("/"))

    bounds_vars: set[str] = set()
    cell_measure_vars: set[str] = set()
    ancillary_vars: set[str] = set()
    aux_coord_vars: set[str] = set()

    for var in variables.values():
        attrs = var.attributes
        bounds_ref = attrs.get("bounds")
        if isinstance(bounds_ref, str):
            bounds_vars.add(bounds_ref)
        cm = attrs.get("cell_measures")
        if isinstance(cm, str):
            for token in cm.replace(":", " ").split():
                if token not in ("area", "volume"):
                    cell_measure_vars.add(token)
        av = attrs.get("ancillary_variables")
        if isinstance(av, str):
            for token in av.split():
                ancillary_vars.add(token)
        coords = attrs.get("coordinates")
        if isinstance(coords, str):
            for token in coords.split():
                aux_coord_vars.add(token)

    roles: dict[str, str] = {}
    for name, var in variables.items():
        short_name = name.lstrip("/")
        attrs = var.attributes

        if "grid_mapping_name" in attrs:
            roles[name] = "grid_mapping"
        elif short_name in bounds_vars or name in bounds_vars:
            roles[name] = "bounds"
        elif short_name in cell_measure_vars or name in cell_measure_vars:
            roles[name] = "cell_measure"
        elif short_name in ancillary_vars or name in ancillary_vars:
            roles[name] = "ancillary"
        elif _is_mesh_topology(attrs):
            roles[name] = "mesh_topology"
        elif _is_connectivity(attrs):
            roles[name] = "connectivity"
        elif short_name in dim_names:
            roles[name] = "coordinate"
        elif short_name in aux_coord_vars or name in aux_coord_vars:
            roles[name] = "auxiliary_coordinate"
        else:
            roles[name] = "data"

    return roles
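
The first pass in the listing above collects the names that other variables point to. A minimal standalone sketch of that pass follows, using plain attribute dicts instead of VariableInfo objects. The function name `collect_references` and the sample attributes are illustrative assumptions, not part of pyramids.

```python
from __future__ import annotations


def collect_references(attrs_by_var: dict[str, dict]) -> dict[str, set[str]]:
    """Gather variable names referenced through CF linking attributes."""
    refs: dict[str, set[str]] = {
        "bounds": set(),
        "cell_measure": set(),
        "ancillary": set(),
        "auxiliary_coordinate": set(),
    }
    for attrs in attrs_by_var.values():
        bounds = attrs.get("bounds")
        if isinstance(bounds, str):
            refs["bounds"].add(bounds)
        cell_measures = attrs.get("cell_measures")
        if isinstance(cell_measures, str):
            # "area: cell_area" -> ["area", "cell_area"]; drop the measure keywords.
            for token in cell_measures.replace(":", " ").split():
                if token not in ("area", "volume"):
                    refs["cell_measure"].add(token)
        ancillary = attrs.get("ancillary_variables")
        if isinstance(ancillary, str):
            refs["ancillary"].update(ancillary.split())
        coords = attrs.get("coordinates")
        if isinstance(coords, str):
            refs["auxiliary_coordinate"].update(coords.split())
    return refs


# Hypothetical attributes for two variables.
sample = {
    "temperature": {
        "cell_measures": "area: cell_area",
        "ancillary_variables": "temperature_qc",
        "coordinates": "lon2d lat2d",
    },
    "time": {"bounds": "time_bnds"},
}
refs = collect_references(sample)
```

Because the second pass tests roles in a fixed order, a variable that is both named after a dimension and listed in a coordinates attribute is reported as "coordinate", not "auxiliary_coordinate".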

pyramids.netcdf.cf.grid_mapping_to_srs(grid_mapping_name, params)

Convert CF grid_mapping attributes to an OGR SpatialReference.

Tries crs_wkt first (fast path). Falls back to reconstructing the SRS from individual CF parameters.

Parameters:

  • grid_mapping_name (str, required): CF grid_mapping_name attribute value.
  • params (dict[str, Any], required): All attributes from the grid_mapping variable.

Returns:

  • osr.SpatialReference: The reconstructed spatial reference.

Raises:

  • ValueError: If the grid_mapping_name is not supported and no crs_wkt is available.

Source code in src/pyramids/netcdf/cf.py
def grid_mapping_to_srs(
    grid_mapping_name: str,
    params: dict[str, Any],
) -> osr.SpatialReference:
    """Convert CF grid_mapping attributes to an OGR SpatialReference.

    Tries ``crs_wkt`` first (fast path). Falls back to reconstructing
    the SRS from individual CF parameters.

    Args:
        grid_mapping_name: CF ``grid_mapping_name`` attribute value.
        params: All attributes from the grid_mapping variable.

    Returns:
        osr.SpatialReference: The reconstructed spatial reference.

    Raises:
        ValueError: If the grid_mapping_name is not supported and
            no ``crs_wkt`` is available.
    """
    srs = osr.SpatialReference()

    crs_wkt = params.get("crs_wkt")
    if crs_wkt:
        srs.ImportFromWkt(crs_wkt)
    else:
        srs = _build_srs_from_cf_params(grid_mapping_name, params)

    return srs
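
To make the two paths concrete, the sketch below builds a `params` dict of the kind a CF transverse_mercator grid-mapping variable carries and checks which branch the listing above would take. The helper `takes_wkt_fast_path`, the numeric values, and the placeholder WKT string are illustrative assumptions. Only the truthiness test on crs_wkt mirrors the source.

```python
from __future__ import annotations

from typing import Any


def takes_wkt_fast_path(params: dict[str, Any]) -> bool:
    """Mirror the `if crs_wkt:` test: a missing or empty value falls back."""
    return bool(params.get("crs_wkt"))


# Attribute names follow the CF transverse_mercator grid mapping; values are made up.
tm_params: dict[str, Any] = {
    "grid_mapping_name": "transverse_mercator",
    "latitude_of_projection_origin": 0.0,
    "longitude_of_central_meridian": 9.0,
    "scale_factor_at_central_meridian": 0.9996,
    "false_easting": 500000.0,
    "false_northing": 0.0,
    "semi_major_axis": 6378137.0,
    "inverse_flattening": 298.257223563,
}

# No crs_wkt: the SRS would be rebuilt from the individual CF parameters.
no_wkt = takes_wkt_fast_path(tm_params)
# An empty crs_wkt string is falsy, so it also falls back.
empty_wkt = takes_wkt_fast_path({**tm_params, "crs_wkt": ""})
# Placeholder text standing in for a real WKT definition.
with_wkt = takes_wkt_fast_path({**tm_params, "crs_wkt": 'PROJCS["placeholder"]'})
```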

pyramids.netcdf.cf.srs_to_grid_mapping(srs)

Convert an OGR SpatialReference to CF grid_mapping name and params.

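Returns the CF grid_mapping_name and a dict of CF projection parameters (including crs_wkt for interoperability). For a geographic CRS (no projection), returns "latitude_longitude" with crs_wkt and the ellipsoid parameters only. A projection that is missing from the CF mapping table also falls back to "latitude_longitude" with a logged warning, so in that case only crs_wkt describes the projection.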

Parameters:

  • srs (SpatialReference, required): An OGR SpatialReference object.

Returns:

  • tuple[str, dict[str, Any]]: Tuple of (grid_mapping_name, params_dict).

Source code in src/pyramids/netcdf/cf.py
def srs_to_grid_mapping(
    srs: osr.SpatialReference,
) -> tuple[str, dict[str, Any]]:
    """Convert an OGR SpatialReference to CF grid_mapping name and params.

    Returns the CF ``grid_mapping_name`` and a dict of CF projection
    parameters (including ``crs_wkt`` for interoperability). For
    geographic CRS (no projection), returns ``"latitude_longitude"``
    with only ellipsoid parameters.

    Args:
        srs: An OGR SpatialReference object.

    Returns:
        Tuple of ``(grid_mapping_name, params_dict)``.
    """
    params: dict[str, Any] = {}
    params["crs_wkt"] = srs.ExportToWkt()
    params["semi_major_axis"] = srs.GetSemiMajor()
    inv_flat = srs.GetInvFlattening()
    if inv_flat > 0:
        params["inverse_flattening"] = inv_flat

    proj_name = srs.GetAttrValue("PROJECTION")
    if proj_name is None:
        grid_mapping_name = "latitude_longitude"
    elif proj_name in _GDAL_PROJ_TO_CF:
        grid_mapping_name = _GDAL_PROJ_TO_CF[proj_name]
        params.update(_extract_proj_params(srs, proj_name))
    else:
        logger.warning(
            f"Projection '{proj_name}' is not in the CF grid mapping table. "
            f"Only crs_wkt will be written for CRS interoperability."
        )
        grid_mapping_name = "latitude_longitude"

    return grid_mapping_name, params

pyramids.netcdf.cf.write_attributes_to_md_array(md_arr, attrs)

Write a dict of attributes to a GDAL MDArray.

Handles str, bool, int, float, and list values. Silently skips attributes that can't be written (GDAL limitation). Bool values are stored as int32 (1/0) since NetCDF has no boolean type.

Parameters:

  • md_arr (MDArray, required): The GDAL MDArray to write attributes to.
  • attrs (dict[str, Any], required): Dict of attribute names to values.

Source code in src/pyramids/netcdf/cf.py
def write_attributes_to_md_array(
    md_arr: gdal.MDArray,
    attrs: dict[str, Any],
) -> None:
    """Write a dict of attributes to a GDAL MDArray.

    Handles str, bool, int, float, and list values. Silently skips
    attributes that can't be written (GDAL limitation). Bool values
    are stored as int32 (1/0) since NetCDF has no boolean type.

    Args:
        md_arr: The GDAL MDArray to write attributes to.
        attrs: Dict of attribute names to values.
    """
    _write_attrs(md_arr, attrs)
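
The private `_write_attrs` helper is not shown on this page. As a rough sketch of the kind of value normalization the description implies, and not the library's actual code, the hypothetical `normalize_attr_value` below maps Python values to (kind, value) pairs. The kind labels are invented for illustration. The one firm point is a Python fact: `bool` must be tested before `int`, because `bool` is a subclass of `int`.

```python
from __future__ import annotations

from typing import Any


def normalize_attr_value(value: Any) -> tuple[str, Any] | None:
    """Return a (kind, value) pair, or None for values that would be skipped."""
    if isinstance(value, bool):  # must precede the int check: bool subclasses int
        return ("int32", int(value))
    if isinstance(value, int):
        return ("int", value)
    if isinstance(value, float):
        return ("float", value)
    if isinstance(value, str):
        return ("str", value)
    if isinstance(value, (list, tuple)):
        return ("list", list(value))
    return None  # unsupported type: skipped without raising


flag = normalize_attr_value(True)
count = normalize_attr_value(3)
skipped = normalize_attr_value(object())
```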

pyramids.netcdf.cf.write_global_attributes(rg, attrs)

Write a dict of attributes to a GDAL root group.

Handles str, bool, int, float values. Bool values are stored as int32. Silently skips attributes that can't be written.

Parameters:

  • rg (Group, required): The GDAL root group to write attributes to.
  • attrs (dict[str, Any], required): Dict of attribute names to values.

Source code in src/pyramids/netcdf/cf.py
def write_global_attributes(
    rg: gdal.Group,
    attrs: dict[str, Any],
) -> None:
    """Write a dict of attributes to a GDAL root group.

    Handles str, bool, int, float values. Bool values are stored
    as int32. Silently skips attributes that can't be written.

    Args:
        rg: The GDAL root group to write attributes to.
        attrs: Dict of attribute names to values.
    """
    _write_attrs(rg, attrs)

pyramids.netcdf.cf.build_coordinate_attrs(dim_name, is_geographic=True)

Generate CF-compliant attributes for a coordinate variable.

Maps dimension names to the appropriate CF axis, standard_name, long_name, and units attributes based on whether the CRS is geographic or projected. Time dimensions receive no units attribute, and vertical dimensions receive only axis and long_name.

Dimension names are lowercased before matching, so "X", "x", and "Lon" all match the X-axis pattern.

Parameters:

  • dim_name (str, required): Dimension name (e.g. "x", "y", "lat", "lon", "time"). Case-insensitive.
  • is_geographic (bool, default True): True if the CRS is geographic (lon/lat), False if projected (easting/northing in metres).

Returns:

  • dict[str, str]: Dict of CF attribute names to string values. Empty dict if the dimension name is not recognized.

Source code in src/pyramids/netcdf/cf.py
def build_coordinate_attrs(
    dim_name: str,
    is_geographic: bool = True,
) -> dict[str, str]:
    """Generate CF-compliant attributes for a coordinate variable.

    Maps dimension names to the appropriate CF ``axis``,
    ``standard_name``, ``long_name``, and ``units`` attributes
    based on whether the CRS is geographic or projected.

    Dimension names are **case-normalized** (lowered) before
    matching, so ``"X"``, ``"x"``, and ``"Lon"`` all match the
    X-axis pattern.

    Args:
        dim_name: Dimension name (e.g. ``"x"``, ``"y"``, ``"lat"``,
            ``"lon"``, ``"time"``). Case-insensitive.
        is_geographic: True if the CRS is geographic (lon/lat),
            False if projected (easting/northing in metres).

    Returns:
        Dict of CF attribute names to string values. Empty dict
        if the dimension name is not recognized.
    """
    name_lower = dim_name.lower()
    attrs: dict[str, str] = {}

    if name_lower in ("x", "lon", "longitude"):
        attrs["axis"] = "X"
        if is_geographic:
            attrs["standard_name"] = "longitude"
            attrs["long_name"] = "longitude"
            attrs["units"] = "degrees_east"
        else:
            attrs["standard_name"] = "projection_x_coordinate"
            attrs["long_name"] = "x coordinate of projection"
            attrs["units"] = "m"
    elif name_lower in ("y", "lat", "latitude"):
        attrs["axis"] = "Y"
        if is_geographic:
            attrs["standard_name"] = "latitude"
            attrs["long_name"] = "latitude"
            attrs["units"] = "degrees_north"
        else:
            attrs["standard_name"] = "projection_y_coordinate"
            attrs["long_name"] = "y coordinate of projection"
            attrs["units"] = "m"
    elif name_lower in ("time", "t"):
        attrs["axis"] = "T"
        attrs["standard_name"] = "time"
        attrs["long_name"] = "time"
    elif name_lower in ("z", "lev", "level", "depth", "height"):
        attrs["axis"] = "Z"
        attrs["long_name"] = dim_name

    return attrs

pyramids.netcdf.cf.detect_axis(name, attrs, units=None)

Detect CF axis type from a variable's attributes.

Applies heuristics in priority order:

  1. Explicit axis attribute ("X", "Y", "Z", "T")
  2. standard_name lookup against CF conventions
  3. Unit string matching (degrees_north -> Y, etc.)
  4. Variable name pattern matching (lat -> Y, lon -> X)

Parameters:

  • name (str, required): Variable or dimension short name.
  • attrs (dict[str, Any], required): Variable attribute dictionary.
  • units (str | None, default None): Unit string (separate from attrs for flexibility).

Returns:

  • str | None: One of "X", "Y", "Z", "T", or None.

Source code in src/pyramids/netcdf/cf.py
def detect_axis(
    name: str,
    attrs: dict[str, Any],
    units: str | None = None,
) -> str | None:
    """Detect CF axis type from a variable's attributes.

    Applies heuristics in priority order:
    1. Explicit ``axis`` attribute (``"X"``, ``"Y"``, ``"Z"``, ``"T"``)
    2. ``standard_name`` lookup against CF conventions
    3. Unit string matching (``degrees_north`` -> Y, etc.)
    4. Variable name pattern matching (``lat`` -> Y, ``lon`` -> X)

    Args:
        name: Variable or dimension short name.
        attrs: Variable attribute dictionary.
        units: Unit string (separate from attrs for flexibility).

    Returns:
        One of ``"X"``, ``"Y"``, ``"Z"``, ``"T"``, or None.
    """
    result: str | None = None

    axis = attrs.get("axis")
    if isinstance(axis, str) and axis.upper() in ("X", "Y", "Z", "T"):
        result = axis.upper()
    else:
        stdname = attrs.get("standard_name")
        if isinstance(stdname, str):
            result = _STDNAME_TO_AXIS.get(stdname.lower())

        if result is None:
            unit_str = units or attrs.get("units")
            if isinstance(unit_str, str):
                unit_lower = unit_str.lower().strip()
                if unit_lower in (
                    "degrees_north", "degree_north", "degree_n", "degrees_n"
                ):
                    result = "Y"
                elif unit_lower in (
                    "degrees_east", "degree_east", "degree_e", "degrees_e"
                ):
                    result = "X"
                elif "since" in unit_lower:
                    result = "T"

        if result is None:
            result = _NAME_PATTERNS.get(name.lower().strip())

    return result
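
The priority order can be seen with a condensed standalone version of the same cascade. The two lookup tables below are hypothetical stand-ins for the module's private `_STDNAME_TO_AXIS` and `_NAME_PATTERNS`, which are not shown on this page and may contain more entries. The function name `detect_axis_sketch` is also illustrative, and the unit lists are shortened.

```python
from __future__ import annotations

from typing import Any

# Hypothetical stand-ins for the private lookup tables.
STDNAME_TO_AXIS = {"latitude": "Y", "longitude": "X", "time": "T"}
NAME_PATTERNS = {"lat": "Y", "lon": "X", "time": "T"}


def detect_axis_sketch(name: str, attrs: dict[str, Any]) -> str | None:
    """Try each heuristic in priority order and return the first hit."""
    axis = attrs.get("axis")
    if isinstance(axis, str) and axis.upper() in ("X", "Y", "Z", "T"):
        return axis.upper()
    stdname = attrs.get("standard_name")
    if isinstance(stdname, str) and stdname.lower() in STDNAME_TO_AXIS:
        return STDNAME_TO_AXIS[stdname.lower()]
    units = attrs.get("units")
    if isinstance(units, str):
        unit_lower = units.lower().strip()
        if unit_lower in ("degrees_north", "degree_north"):
            return "Y"
        if unit_lower in ("degrees_east", "degree_east"):
            return "X"
        if "since" in unit_lower:
            return "T"
    # Last resort: match on the variable name itself.
    return NAME_PATTERNS.get(name.lower().strip())


explicit = detect_axis_sketch("lat", {"axis": "x"})  # axis attribute beats the name
by_units = detect_axis_sketch("t0", {"units": "days since 2000-01-01"})
by_name = detect_axis_sketch("Lon", {})
unknown = detect_axis_sketch("band", {"units": "1"})
```

An explicit axis attribute wins even when the variable name suggests a different axis, which is why `explicit` resolves to "X" for a variable called "lat".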

pyramids.netcdf.cf.parse_conventions(conventions_str)

Parse a Conventions global attribute string.

Logs a warning if the CF version is higher than the highest tested version (1.11).

Parameters:

  • conventions_str (str | None, required): Space-separated conventions string, e.g. "CF-1.8 UGRID-1.0 Deltares-0.10".

Returns:

  • dict[str, str]: Dict of {convention_name: version_string}.

Source code in src/pyramids/netcdf/cf.py
def parse_conventions(conventions_str: str | None) -> dict[str, str]:
    """Parse a Conventions global attribute string.

    Logs a warning if the CF version is higher than the highest
    tested version (``1.11``).

    Args:
        conventions_str: Space-separated conventions string, e.g.
            ``"CF-1.8 UGRID-1.0 Deltares-0.10"``.

    Returns:
        Dict of ``{convention_name: version_string}``.
    """
    result: dict[str, str] = {}
    if conventions_str:
        for token in conventions_str.split():
            if "-" in token:
                name, _, version = token.partition("-")
                result[name] = version
            else:
                result[token] = ""
        cf_version = result.get("CF")
        if cf_version is not None:
            try:
                parts = cf_version.split(".")
                tested_parts = _MAX_TESTED_CF_VERSION.split(".")
                if [int(p) for p in parts] > [int(p) for p in tested_parts]:
                    logger.warning(
                        f"CF version {cf_version} is newer than the "
                        f"highest tested version "
                        f"({_MAX_TESTED_CF_VERSION}). "
                        f"Some features may not be supported."
                    )
            except (ValueError, TypeError):
                pass
    return result
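
The version check compares lists of integers rather than strings. The short sketch below shows why that matters. The helper name `version_key` and the sample Conventions string are illustrative, and "1.11" is the tested version quoted in the docstring above.

```python
def version_key(version: str) -> list[int]:
    """Split "1.11" into [1, 11] so components compare numerically."""
    return [int(part) for part in version.split(".")]


parsed: dict[str, str] = {}
for token in "CF-1.12 UGRID-1.0 ACDD".split():
    # partition() yields an empty version when the token has no "-".
    name, _, version = token.partition("-")
    parsed[name] = version

newer_than_tested = version_key(parsed["CF"]) > version_key("1.11")
# Lexicographic comparison misorders versions: "8" sorts after "1".
string_compare = "1.8" > "1.11"
numeric_compare = version_key("1.8") > version_key("1.11")
```

One consequence of whitespace-only splitting: a comma-separated string such as "CF-1.8, UGRID-1.0" leaves the comma attached to the version ("1.8,"). `int()` then raises ValueError, which the function swallows, so no warning is logged for that input.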

pyramids.netcdf.cf.parse_cell_methods(cell_methods_str)

Parse a CF cell_methods attribute string.

Parameters:

  • cell_methods_str (str, required): CF cell_methods string, e.g. "time: mean area: sum where land".

Returns:

  • list[dict[str, str]]: List of dicts with keys "dimensions", "method", and optionally "where" and "over".

Source code in src/pyramids/netcdf/cf.py
def parse_cell_methods(cell_methods_str: str) -> list[dict[str, str]]:
    """Parse a CF ``cell_methods`` attribute string.

    Args:
        cell_methods_str: CF cell_methods string, e.g.
            ``"time: mean area: sum where land"``.

    Returns:
        List of dicts with keys ``"dimensions"``, ``"method"``,
        and optionally ``"where"`` and ``"over"``.
    """
    results: list[dict[str, str]] = []
    pattern = (
        r'(\w[\w\s]*?):\s+(\w+)'
        r'(?:\s+where\s+(\w+))?'
        r'(?:\s+over\s+(\w+))?'
    )
    for match in re.finditer(pattern, cell_methods_str):
        entry: dict[str, str] = {
            "dimensions": match.group(1).strip(),
            "method": match.group(2),
        }
        if match.group(3):
            entry["where"] = match.group(3)
        if match.group(4):
            entry["over"] = match.group(4)
        results.append(entry)
    return results
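
To see what the regular expression captures, the standalone sketch below reuses the same pattern on two inputs. The function name `parse` and the second input string are illustrative. The first input is the example from the docstring.

```python
import re

# Same pattern as in the listing above.
PATTERN = (
    r"(\w[\w\s]*?):\s+(\w+)"
    r"(?:\s+where\s+(\w+))?"
    r"(?:\s+over\s+(\w+))?"
)


def parse(cell_methods: str) -> list[dict[str, str]]:
    """Return one dict per "dims: method" entry found in the string."""
    entries = []
    for match in re.finditer(PATTERN, cell_methods):
        entry = {"dimensions": match.group(1).strip(), "method": match.group(2)}
        if match.group(3):
            entry["where"] = match.group(3)
        if match.group(4):
            entry["over"] = match.group(4)
        entries.append(entry)
    return entries


simple = parse("time: mean area: sum where land")
# Several dimensions before one colon stay together in a single string.
multi_dim = parse("lat lon: mean")
```

Here `simple` holds two entries, the second carrying a "where" key, and `multi_dim` holds one entry whose "dimensions" value is "lat lon".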

pyramids.netcdf.cf.apply_valid_range_mask(arr, valid_min=None, valid_max=None, valid_range=None, fill_value=float('nan'))

Mask values outside the CF valid range.

Values below valid_min or above valid_max are replaced with fill_value. The input is cast to float first, so integer arrays come back as float arrays.

Parameters:

  • arr (Any, required): Input numpy array.
  • valid_min (float | None, default None): Minimum valid value.
  • valid_max (float | None, default None): Maximum valid value.
  • valid_range (tuple | list | None, default None): [min, max]. Overrides valid_min/max.
  • fill_value (float, default float('nan')): Replacement value. Defaults to NaN.

Returns:

  • Any: A copy of arr with out-of-range values replaced.

Source code in src/pyramids/netcdf/cf.py
def apply_valid_range_mask(
    arr: Any,
    valid_min: float | None = None,
    valid_max: float | None = None,
    valid_range: tuple | list | None = None,
    fill_value: float = float("nan"),
) -> Any:
    """Mask values outside the CF valid range.

    Values below ``valid_min`` or above ``valid_max`` are replaced
    with ``fill_value``.

    Args:
        arr: Input numpy array.
        valid_min: Minimum valid value.
        valid_max: Maximum valid value.
        valid_range: ``[min, max]``. Overrides valid_min/max.
        fill_value: Replacement value. Defaults to NaN.

    Returns:
        A copy of ``arr`` with out-of-range values replaced.
    """
    if valid_range is not None:
        valid_min = valid_range[0]
        valid_max = valid_range[1]
    result = arr.astype(float).copy()
    if valid_min is not None:
        result[result < valid_min] = fill_value
    if valid_max is not None:
        result[result > valid_max] = fill_value
    return result

pyramids.netcdf.cf.decode_flags(value, flag_values=None, flag_meanings=None, flag_masks=None)

Decode a CF flag value to human-readable label(s).

Supports three CF flag modes:

  1. Mutually exclusive (flag_values + flag_meanings): Returns the single meaning matching the value.
  2. Boolean / bit-field (flag_masks + flag_meanings): Returns a list of meanings for active bits.
  3. Combined (flag_masks + flag_values + flag_meanings): Returns meanings where (value & mask) == flag_value.

Parameters:

  • value (int, required): The integer flag value to decode.
  • flag_values (list | None, default None): List of possible flag values (1:1 with meanings).
  • flag_meanings (list[str] | None, default None): List of human-readable meaning strings.
  • flag_masks (list[int] | None, default None): List of bit masks for boolean flags.

Returns:

  • list[str]: List of matching meaning strings. Returns ["unknown"] if no match or no meanings provided.

Source code in src/pyramids/netcdf/cf.py
def decode_flags(
    value: int,
    flag_values: list | None = None,
    flag_meanings: list[str] | None = None,
    flag_masks: list[int] | None = None,
) -> list[str]:
    """Decode a CF flag value to human-readable label(s).

    Supports three CF flag modes:

    1. **Mutually exclusive** (flag_values + flag_meanings):
       Returns the single meaning matching the value.
    2. **Boolean / bit-field** (flag_masks + flag_meanings):
       Returns a list of meanings for active bits.
    3. **Combined** (flag_masks + flag_values + flag_meanings):
       Returns meanings where ``(value & mask) == flag_value``.

    Args:
        value: The integer flag value to decode.
        flag_values: List of possible flag values (1:1 with meanings).
        flag_meanings: List of human-readable meaning strings.
        flag_masks: List of bit masks for boolean flags.

    Returns:
        list[str]: List of matching meaning strings. Returns
        ``["unknown"]`` if no match or no meanings provided.
    """
    result: list[str] = ["unknown"]

    if flag_meanings is None:
        pass
    elif flag_masks is not None and flag_values is not None:
        matched = [
            flag_meanings[i]
            for i in range(len(flag_meanings))
            if i < len(flag_masks) and i < len(flag_values)
            and (value & flag_masks[i]) == flag_values[i]
        ]
        if matched:
            result = matched
    elif flag_masks is not None:
        matched = [
            flag_meanings[i]
            for i in range(len(flag_meanings))
            if i < len(flag_masks) and (value & flag_masks[i]) != 0
        ]
        if matched:
            result = matched
    elif flag_values is not None:
        for i, fv in enumerate(flag_values):
            if fv == value and i < len(flag_meanings):
                result = [flag_meanings[i]]
                break

    return result
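
The three modes are easiest to compare side by side. The sketch below is a compact standalone rewrite of the same branching that uses `zip` in place of index checks. The function name `decode` and all flag tables are made-up examples, not data from pyramids.

```python
from __future__ import annotations


def decode(
    value: int,
    meanings: list[str],
    flag_values: list[int] | None = None,
    flag_masks: list[int] | None = None,
) -> list[str]:
    """Decode one flag value; zip() stops at the shortest input list."""
    if flag_masks is not None and flag_values is not None:
        # Combined: a meaning applies when the masked value equals its flag value.
        hits = [
            m
            for m, mask, fv in zip(meanings, flag_masks, flag_values)
            if (value & mask) == fv
        ]
    elif flag_masks is not None:
        # Bit-field: a meaning applies when any of its mask bits is set.
        hits = [m for m, mask in zip(meanings, flag_masks) if value & mask]
    elif flag_values is not None:
        # Mutually exclusive: keep only the first exact match.
        hits = [m for m, fv in zip(meanings, flag_values) if fv == value][:1]
    else:
        hits = []
    return hits or ["unknown"]


exclusive = decode(2, ["good", "suspect", "bad"], flag_values=[0, 1, 2])
bit_field = decode(
    5, ["low_battery", "sensor_fault", "clock_drift"], flag_masks=[1, 2, 4]
)
combined = decode(
    6,
    ["low_battery", "sensor_fault", "mode_a", "mode_b", "mode_c"],
    flag_values=[1, 2, 4, 8, 12],
    flag_masks=[1, 2, 12, 12, 12],
)
no_match = decode(7, ["good", "suspect", "bad"], flag_values=[0, 1, 2])
```

In the combined case, value 6 has bit 2 set (matching "sensor_fault") and its two high bits equal 4 under mask 12 (matching "mode_a"), so both meanings are returned.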

pyramids.netcdf.cf.validate_cf(global_attrs, variables, dimensions)

Check for common CF compliance issues.

Returns a list of warning/error messages. An empty list means the dataset passes basic CF checks. This is not a full cfchecker replacement; it covers only the most common issues.

Checks:

  1. Conventions attribute present and contains "CF-"
  2. Coordinate variables have units
  3. Time coordinates have calendar

Limitation: Only checks dimension-coordinate variables (those whose name matches a dimension). Auxiliary coordinates referenced by the coordinates attribute on data variables are not validated.

Parameters:

  • global_attrs (dict[str, Any], required): Root-level attributes dict.
  • variables (dict[str, Any], required): Dict of {name: VariableInfo} from metadata.
  • dimensions (dict[str, Any], required): Dict of {name: DimensionInfo} from metadata.

Returns:

  • list[str]: List of warning/error strings. Empty if compliant.

Source code in src/pyramids/netcdf/cf.py
def validate_cf(
    global_attrs: dict[str, Any],
    variables: dict[str, Any],
    dimensions: dict[str, Any],
) -> list[str]:
    """Check for common CF compliance issues.

    Returns a list of warning/error messages. An empty list means
    the dataset passes basic CF checks. This is NOT a full
    cfchecker replacement — it covers the most common issues.

    Checks:
    1. ``Conventions`` attribute present and contains ``"CF-"``
    2. Coordinate variables have ``units``
    3. Time coordinates have ``calendar``

    Limitation: Only checks dimension-coordinate variables (those
    whose name matches a dimension). Auxiliary coordinates referenced
    by the ``coordinates`` attribute on data variables are not
    validated.

    Args:
        global_attrs: Root-level attributes dict.
        variables: Dict of ``{name: VariableInfo}`` from metadata.
        dimensions: Dict of ``{name: DimensionInfo}`` from metadata.

    Returns:
        List of warning/error strings. Empty if compliant.
    """
    issues: list[str] = []

    conv = global_attrs.get("Conventions", "")
    if not isinstance(conv, str) or "CF-" not in conv:
        issues.append(
            "Missing or invalid 'Conventions' attribute. "
            "Should contain 'CF-1.X'."
        )

    dim_names = {d.name for d in dimensions.values()}
    for name, var in variables.items():
        short = name.lstrip("/")
        if short in dim_names:
            if not var.attributes.get("units") and not var.unit:
                issues.append(
                    f"Coordinate variable '{short}' has no 'units' attribute."
                )
            units_val = var.attributes.get("units", "")
            if isinstance(units_val, str) and "since" in units_val:
                if "calendar" not in var.attributes:
                    issues.append(
                        f"Time coordinate '{short}' has no 'calendar' attribute."
                    )

    return issues