Assets: read, metadata, VRT, download, GeoParquet

The asset-level surface of pyramids.stac: open a single asset, read its extension metadata without touching the file, mosaic an asset across items into a lazy VRT, download assets locally, and round-trip Items through GeoParquet.

  • Read one asset: load_asset dispatches by media type (COG/GeoTIFF → Dataset, NetCDF/Zarr → NetCDF, GRIB → open_grib, JPEG2000 → Dataset); which_engine previews the reader without opening; resolved_href returns the (optionally signed) href without opening.
  • Extension metadata: read_extension_metadata turns a STAC Item's proj / raster / eo fields into a grid + band-metadata dict (CRS, geotransform, shape, nodata/scale/offset, band names) without opening the asset, the way stackstac / odc-stac / rio-tiler do.
  • VRT mosaic: build_vrt_from_stac stitches one asset across many items into a lazy GDAL VRT read on demand via /vsicurl/.
  • Download: download_item copies assets to local files (optional stac-asset, shipped in the [stac] extra).
  • GeoParquet: to_geoparquet / from_geoparquet serialize an ItemCollection to a single columnar file and back (optional pyarrow, the [parquet] extra).

Reading assets

pyramids.stac._loader

Open a STAC asset as a pyramids Dataset / NetCDF, dispatched by type.

Takes a STAC Item + asset_key (or an Asset directly), resolves the asset href, and opens it with the right GDAL-backed reader chosen by the asset's media_type (with the href extension as a fallback):

media_type               extension         reader
image/tiff...            .tif .tiff        :meth:Dataset.read_file
image/jp2                .jp2 .jpx         :meth:Dataset.read_file
application/x-netcdf     .nc .nc4 .cdf     :meth:NetCDF.read_file
application/wmo-grib2    .grib2 .grb       :func:pyramids.grib.open_grib
application/vnd+zarr     .zarr             :meth:NetCDF.read_file (GDAL Zarr)

Everything is duck-typed — pyramids does not import or depend on pystac; the Item / Asset contract is read via getattr + dict lookup (pystac.Asset has .href / .media_type; raw STAC JSON uses {"href":..., "type":...}). Assets resolve to pyramids' GDAL-backed wrappers.
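
The dispatch rule in the table above (media type first, href extension as a fallback) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's internal code: `pick_engine` and both lookup tables are hypothetical names, and the real matching rules may differ in detail.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical lookup tables mirroring the dispatch table above.
_MEDIA_TYPES = {
    "image/tiff": "gdal",
    "image/jp2": "gdal",
    "application/x-netcdf": "netcdf",
    "application/wmo-grib2": "grib",
    "application/vnd+zarr": "zarr",
}
_EXTENSIONS = {
    ".tif": "gdal", ".tiff": "gdal", ".jp2": "gdal", ".jpx": "gdal",
    ".nc": "netcdf", ".nc4": "netcdf", ".cdf": "netcdf",
    ".grib2": "grib", ".grb": "grib",
    ".zarr": "zarr",
}


def pick_engine(media_type, href):
    """Pick a reader name from the media type, falling back to the extension."""
    if media_type:
        # Drop parameters such as "; profile=cloud-optimized".
        base = media_type.split(";", 1)[0].strip().lower()
        if base in _MEDIA_TYPES:
            return _MEDIA_TYPES[base]
    # Fall back to the extension of the href's path component.
    path = urlparse(href).path or href
    ext = posixpath.splitext(path.rstrip("/"))[1].lower()
    if ext in _EXTENSIONS:
        return _EXTENSIONS[ext]
    raise ValueError(f"no reader for type={media_type!r}, href={href!r}")
```

Stripping the media-type parameters before the lookup is what lets a COG type string with `application=geotiff; profile=cloud-optimized` match the plain `image/tiff` entry.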

which_engine(item_or_asset, asset_key=None)

Return the reader name :func:load_asset would use, without opening.

Parameters:

Name Type Description Default
item_or_asset Any

A STAC Item or Asset (pystac object or raw dict).

required
asset_key str | None

Asset name when passing an Item; None for an Asset.

None

Returns:

Type Description
str

One of "gdal", "netcdf", "grib", "zarr".

Examples:

  • A COG asset dispatches to the GDAL reader:
    >>> from pyramids.stac import which_engine
    >>> asset = {
    ...     "href": "s3://bucket/scene.tif",
    ...     "type": "image/tiff; application=geotiff; profile=cloud-optimized",
    ... }
    >>> which_engine(asset)
    'gdal'
    
  • A GRIB2 asset (recognised by extension when type is absent):
    >>> which_engine({"href": "https://host/gfs.t00z.pgrb2.f000.grib2"})
    'grib'
    
  • An Item + asset key resolves the named asset:
    >>> item = {"assets": {"data": {"href": "x.nc", "type": "application/x-netcdf"}}}
    >>> which_engine(item, "data")
    'netcdf'
    
Source code in src/pyramids/stac/_loader.py
def which_engine(item_or_asset: Any, asset_key: str | None = None) -> str:
    """Return the reader name :func:`load_asset` would use, without opening.

    Args:
        item_or_asset: A STAC Item or Asset (pystac object or raw dict).
        asset_key: Asset name when passing an Item; `None` for an Asset.

    Returns:
        One of `"gdal"`, `"netcdf"`, `"grib"`, `"zarr"`.

    Examples:
        - A COG asset dispatches to the GDAL reader:
            ```python
            >>> from pyramids.stac import which_engine
            >>> asset = {
            ...     "href": "s3://bucket/scene.tif",
            ...     "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            ... }
            >>> which_engine(asset)
            'gdal'

            ```
        - A GRIB2 asset (recognised by extension when type is absent):
            ```python
            >>> which_engine({"href": "https://host/gfs.t00z.pgrb2.f000.grib2"})
            'grib'

            ```
        - An Item + asset key resolves the named asset:
            ```python
            >>> item = {"assets": {"data": {"href": "x.nc", "type": "application/x-netcdf"}}}
            >>> which_engine(item, "data")
            'netcdf'

            ```
    """
    href, media_type = _resolve_asset(item_or_asset, asset_key)
    return _engine_for(media_type, href)

resolved_href(item_or_asset, asset_key=None, *, signer=None)

Return an asset's resolved (optionally signed) href without opening it.

The read-free companion to :func:load_asset: it resolves the asset href and, when a signer is given, applies signer.sign_href — but never opens the asset. Useful for building a VRT over many assets (:func:pyramids.stac.build_vrt_from_stac), pre-flighting URLs, or debugging what load_asset would open.

Parameters:

Name Type Description Default
item_or_asset Any

A STAC Item (pystac object or raw dict) or an Asset.

required
asset_key str | None

Asset name when passing an Item; None for an Asset.

None
signer Any

Optional signer; when given, its sign_href rewrites the href (e.g. grafting a SAS token). gdal_env() is not applied — no read happens here.

None

Returns:

Type Description
str

The resolved asset href, signed when a signer is supplied.

Raises:

Type Description
StacAssetError

The asset is missing or has no href (subclasses :class:KeyError).

Examples:

  • Resolve a plain asset href:
    >>> from pyramids.stac import resolved_href
    >>> resolved_href({"href": "s3://b/scene.tif", "type": "image/tiff"})
    's3://b/scene.tif'
    
  • Resolve an item's asset and sign it with a simple signer:
    >>> class _S:
    ...     def sign_href(self, href):
    ...         return href + "?sig=tok"
    >>> item = {"assets": {"B04": {"href": "https://h/B04.tif"}}}
    >>> resolved_href(item, "B04", signer=_S())
    'https://h/B04.tif?sig=tok'
    
Source code in src/pyramids/stac/_loader.py
def resolved_href(
    item_or_asset: Any, asset_key: str | None = None, *, signer: Any = None
) -> str:
    """Return an asset's resolved (optionally signed) href without opening it.

    The read-free companion to :func:`load_asset`: it resolves the asset href
    and, when a `signer` is given, applies `signer.sign_href` — but never opens
    the asset. Useful for building a VRT over many assets
    (:func:`pyramids.stac.build_vrt_from_stac`), pre-flighting URLs, or
    debugging what `load_asset` would open.

    Args:
        item_or_asset: A STAC Item (pystac object or raw dict) or an Asset.
        asset_key: Asset name when passing an Item; `None` for an Asset.
        signer: Optional signer; when given, its `sign_href` rewrites the href
            (e.g. grafting a SAS token). `gdal_env()` is **not** applied — no
            read happens here.

    Returns:
        The resolved asset href, signed when a `signer` is supplied.

    Raises:
        StacAssetError: The asset is missing or has no href (subclasses
            :class:`KeyError`).

    Examples:
        - Resolve a plain asset href:
            ```python
            >>> from pyramids.stac import resolved_href
            >>> resolved_href({"href": "s3://b/scene.tif", "type": "image/tiff"})
            's3://b/scene.tif'

            ```
        - Resolve an item's asset and sign it with a simple signer:
            ```python
            >>> class _S:
            ...     def sign_href(self, href):
            ...         return href + "?sig=tok"
            >>> item = {"assets": {"B04": {"href": "https://h/B04.tif"}}}
            >>> resolved_href(item, "B04", signer=_S())
            'https://h/B04.tif?sig=tok'

            ```
    """
    href, _ = _resolve_asset(item_or_asset, asset_key)
    if signer is not None:
        href = signer.sign_href(href)
    return href

load_asset(item_or_asset, asset_key=None, *, signer=None, vsi=None)

Open a STAC asset as a pyramids Dataset / NetCDF.

Resolves the asset href, optionally rewrites it through a signer (a :class:~pyramids.stac.signers.Signer), then opens it with the GDAL-backed reader chosen by media_type / extension. When a signer is given, both of its hooks are applied: signer.sign_href rewrites the href, and signer.gdal_env is installed as GDAL config for the duration of the open (via :class:~pyramids.base.remote.CloudConfig), so the underlying VSI handle is created with the right credentials / requester-pays knobs.
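
The two hooks described above are all that this page shows load_asset calling on a signer, so a duck-typed object with `sign_href` and `gdal_env` should be enough to illustrate the contract. The sketch below is a minimal example under that assumption: `QueryTokenSigner` is a hypothetical class, not one shipped in pyramids.stac.signers, and the config key it returns is only an example of a GDAL option.

```python
class QueryTokenSigner:
    """Hypothetical signer: appends a token and supplies GDAL config."""

    def __init__(self, token):
        self.token = token

    def sign_href(self, href):
        # Append the token, respecting an existing query string.
        sep = "&" if "?" in href else "?"
        return f"{href}{sep}sig={self.token}"

    def gdal_env(self):
        # GDAL config options to apply while the asset is opened.
        return {"GDAL_HTTP_MAX_RETRY": "3"}


signer = QueryTokenSigner("tok")
signed = signer.sign_href("https://h/B04.tif")  # 'https://h/B04.tif?sig=tok'
```

Keeping the href rewrite and the GDAL configuration as separate hooks lets URL-token schemes and environment-based schemes (requester pays, bearer headers) share one interface.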

Parameters:

Name Type Description Default
item_or_asset Any

A STAC Item (pystac.Item or raw dict) or an Asset.

required
asset_key str | None

Asset name when passing an Item; None for an Asset.

None
signer Any

Optional signer. signer.sign_href(href) rewrites the href (e.g. grafting a SAS token) and signer.gdal_env() supplies GDAL config applied while the asset is opened (e.g. AWS_REQUEST_PAYER=requester for an :class:~pyramids.stac.signers.AWSRequesterPaysSigner, or an Authorization header for a :class:~pyramids.stac.signers.BearerTokenSigner). None leaves the href unchanged and applies no extra config.

None
vsi str | None

Optional explicit archive kind forwarded to the reader (e.g. a GeoTIFF/GRIB inside a .zip).

None

Returns:

Type Description
Dataset

A :class:~pyramids.dataset.Dataset for COG/GeoTIFF assets, or a :class:~pyramids.netcdf.NetCDF (a Dataset subclass) for NetCDF / Zarr / GRIB assets.

Raises:

Type Description
KeyError

The asset is missing or has no href.

ValueError

The asset's type/extension matches no supported reader.

Examples:

  • Open a COG asset from a STAC Item (requires network access):
    >>> from pyramids.stac import load_asset  # doctest: +SKIP
    >>> item = {"assets": {"B04": {"href": "s3://.../B04.tif",
    ...                            "type": "image/tiff; application=geotiff"}}}
    >>> ds = load_asset(item, "B04")  # doctest: +SKIP
    >>> ds.band_count  # doctest: +SKIP
    1
    
  • Sign the href with an MPC/CDSE-style bearer signer before opening (the token is installed as a GDAL Authorization header for the open):
    >>> from pyramids.stac import load_asset, BearerTokenSigner  # doctest: +SKIP
    >>> ds = load_asset(item, "B04", signer=BearerTokenSigner("tok"))  # doctest: +SKIP
    
  • Read a Requester-Pays bucket: the signer's gdal_env opts into AWS_REQUEST_PAYER=requester for the duration of the open:
    >>> from pyramids.stac import load_asset, AWSRequesterPaysSigner  # doctest: +SKIP
    >>> asset = {"href": "s3://usgs-landsat/collection02/.../B4.TIF",
    ...          "type": "image/tiff; application=geotiff"}
    >>> ds = load_asset(asset, signer=AWSRequesterPaysSigner(region="us-west-2"))  # doctest: +SKIP
    
Source code in src/pyramids/stac/_loader.py
def load_asset(
    item_or_asset: Any,
    asset_key: str | None = None,
    *,
    signer: Any = None,
    vsi: str | None = None,
) -> Dataset:
    """Open a STAC asset as a pyramids `Dataset` / `NetCDF`.

    Resolves the asset href, optionally rewrites it through a `signer`
    (a :class:`~pyramids.stac.signers.Signer`), then opens it with the
    GDAL-backed reader chosen by `media_type` / extension. When a signer is
    given, **both** of its hooks are applied: `signer.sign_href` rewrites the
    href, and `signer.gdal_env` is installed as GDAL config for the duration
    of the open (via :class:`~pyramids.base.remote.CloudConfig`), so the
    underlying VSI handle is created with the right credentials / requester-pays
    knobs.

    Args:
        item_or_asset: A STAC Item (pystac.Item or raw dict) or an Asset.
        asset_key: Asset name when passing an Item; `None` for an Asset.
        signer: Optional signer. `signer.sign_href(href)` rewrites the href
            (e.g. grafting a SAS token) and `signer.gdal_env()` supplies GDAL
            config applied while the asset is opened (e.g.
            `AWS_REQUEST_PAYER=requester` for an
            :class:`~pyramids.stac.signers.AWSRequesterPaysSigner`, or an
            `Authorization` header for a
            :class:`~pyramids.stac.signers.BearerTokenSigner`). `None` leaves
            the href unchanged and applies no extra config.
        vsi: Optional explicit archive kind forwarded to the reader (e.g. a
            GeoTIFF/GRIB inside a `.zip`).

    Returns:
        A :class:`~pyramids.dataset.Dataset` for COG/GeoTIFF assets, or a
        :class:`~pyramids.netcdf.NetCDF` (a `Dataset` subclass) for
        NetCDF / Zarr / GRIB assets.

    Raises:
        KeyError: The asset is missing or has no href.
        ValueError: The asset's type/extension matches no supported reader.

    Examples:
        - Open a COG asset from a STAC Item (requires network access):
            ```python
            >>> from pyramids.stac import load_asset  # doctest: +SKIP
            >>> item = {"assets": {"B04": {"href": "s3://.../B04.tif",
            ...                            "type": "image/tiff; application=geotiff"}}}
            >>> ds = load_asset(item, "B04")  # doctest: +SKIP
            >>> ds.band_count  # doctest: +SKIP
            1

            ```
        - Sign the href with an MPC/CDSE-style bearer signer before opening
          (the token is installed as a GDAL `Authorization` header for the
          open):
            ```python
            >>> from pyramids.stac import load_asset, BearerTokenSigner  # doctest: +SKIP
            >>> ds = load_asset(item, "B04", signer=BearerTokenSigner("tok"))  # doctest: +SKIP

            ```
        - Read a Requester-Pays bucket: the signer's `gdal_env` opts into
          `AWS_REQUEST_PAYER=requester` for the duration of the open:
            ```python
            >>> from pyramids.stac import load_asset, AWSRequesterPaysSigner  # doctest: +SKIP
            >>> asset = {"href": "s3://usgs-landsat/collection02/.../B4.TIF",
            ...          "type": "image/tiff; application=geotiff"}
            >>> ds = load_asset(asset, signer=AWSRequesterPaysSigner(region="us-west-2"))  # doctest: +SKIP

            ```
    """
    href, media_type = _resolve_asset(item_or_asset, asset_key)
    if signer is not None:
        href = signer.sign_href(href)
    engine = _engine_for(media_type, href)
    with signer_cloud_config(signer):
        if engine == "grib":
            result: Dataset = open_grib(href, vsi=vsi)
        elif engine in ("netcdf", "zarr"):
            result = NetCDF.read_file(href)
        else:
            result = Dataset.read_file(href, vsi=vsi)
    return result

Extension metadata (proj / raster / eo)

pyramids.stac._extensions

Read STAC proj / raster / eo extension metadata (PB-1).

This module builds a cube/grid skeleton — CRS, geotransform, shape, per-band nodata / scale / offset, band names — directly from the STAC Item JSON, without opening any asset: pure dict reads (via :func:pyramids.stac._item.asset_field, no pystac dependency) that yield a grid/band-metadata dict downstream code can use to build a VRT (PB-5), a multi-asset cube (PB-2), or a grid match (PC-2) without a header open.

Scope note: these are readers only. They deliberately do not stamp the metadata onto a :class:~pyramids.dataset.Dataset returned by :func:pyramids.stac.load_asset, because that reader opens assets read-only (remote /vsicurl COGs cannot be opened for write), and mutating a read-only GDAL handle (SetProjection / SetNoDataValue / SetScale) raises under gdal.UseExceptions(). Writable consumers (VRT/stack builders) apply the metadata themselves from the dict this module returns.
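
To illustrate what a consumer might do with the per-band metadata once it has pixel values in hand, here is a small pure-Python sketch that masks nodata and applies scale / offset from one raster:bands entry. `apply_band_metadata` is a hypothetical helper, not part of pyramids; it assumes nodata has already been coerced to a number, and it uses the raster extension's defaults of scale 1 and offset 0.

```python
import math


def _is_nodata(value, nodata):
    """True when `value` equals the nodata marker (NaN-aware)."""
    if nodata is None:
        return False
    if isinstance(nodata, float) and math.isnan(nodata):
        return isinstance(value, float) and math.isnan(value)
    return value == nodata


def apply_band_metadata(values, band):
    """Mask nodata (as None) and apply scale/offset to the remaining values."""
    nodata = band.get("nodata")
    scale = band.get("scale", 1.0)
    offset = band.get("offset", 0.0)
    return [
        None if _is_nodata(v, nodata) else v * scale + offset
        for v in values
    ]


# Stored digital numbers and a raster:bands entry like the one in the examples.
reflectance = apply_band_metadata([0, 1200, 10000], {"nodata": 0, "scale": 0.0001})
```

A real VRT or stack builder would more likely stamp these values onto a writable GDAL band instead of rescaling in Python; the arithmetic is the same either way.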

parse_number(value, default=None)

Coerce a STAC numeric field to a float, honouring nan/inf strings.

The raster extension allows non-finite nodata values to be encoded as the strings "nan" / "inf" / "-inf".

Parameters:

Name Type Description Default
value Any

The raw field value (number, numeric string, nan/inf string, or None).

required
default Any

Returned when value is None or cannot be parsed.

None

Returns:

Type Description
Any

A float for numeric / nan-inf inputs, otherwise default.

Examples:

  • A plain number passes through as a float:
    >>> from pyramids.stac._extensions import parse_number
    >>> parse_number(-9999)
    -9999.0
    
  • The string "-inf" becomes negative infinity:
    >>> parse_number("-inf")
    -inf
    
  • An unparseable value falls back to the default:
    >>> parse_number("n/a", default=0.0)
    0.0
    
Source code in src/pyramids/stac/_extensions.py
def parse_number(value: Any, default: Any = None) -> Any:
    """Coerce a STAC numeric field to a float, honouring nan/inf strings.

    The ``raster`` extension allows non-finite nodata values to be encoded as
    the strings ``"nan"`` / ``"inf"`` / ``"-inf"``.

    Args:
        value: The raw field value (number, numeric string, nan/inf string,
            or ``None``).
        default: Returned when `value` is ``None`` or cannot be parsed.

    Returns:
        A float for numeric / nan-inf inputs, otherwise `default`.

    Examples:
        - A plain number passes through as a float:
            ```python
            >>> from pyramids.stac._extensions import parse_number
            >>> parse_number(-9999)
            -9999.0

            ```
        - The string ``"-inf"`` becomes negative infinity:
            ```python
            >>> parse_number("-inf")
            -inf

            ```
        - An unparseable value falls back to the default:
            ```python
            >>> parse_number("n/a", default=0.0)
            0.0

            ```
    """
    if value is None:
        return default
    if isinstance(value, bool):
        return default
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        token = value.strip().lower()
        if token in _NODATA_STRINGS:
            return _NODATA_STRINGS[token]
        try:
            return float(value)
        except ValueError:
            return default
    return default

affine_to_geotransform(transform)

Convert a STAC proj:transform affine to a GDAL geotransform.

proj:transform is the affine ordering [a, b, c, d, e, f] (mapping (col, row) to (x, y): x = a*col + b*row + c, y = d*col + e*row + f). GDAL's geotransform is the reordering (c, a, b, f, d, e) — i.e. (x_origin, x_res, x_rot, y_origin, y_rot, y_res). A 9-element affine (with a trailing [0, 0, 1] row) is accepted; only the first six coefficients are used.
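
As a worked check of the reordering, the sketch below maps the same pixel to world coordinates with both conventions and confirms they agree. The two helper names are illustrative only and are not part of the pyramids API.

```python
def pixel_to_world_affine(transform, col, row):
    """Apply a proj:transform affine [a, b, c, d, e, f] to (col, row)."""
    a, b, c, d, e, f = transform[:6]
    return (a * col + b * row + c, d * col + e * row + f)


def pixel_to_world_gdal(gt, col, row):
    """Apply a GDAL geotransform (x0, x_res, x_rot, y0, y_rot, y_res)."""
    x0, x_res, x_rot, y0, y_rot, y_res = gt
    return (x0 + col * x_res + row * x_rot, y0 + col * y_rot + row * y_res)


affine = [30.0, 0.0, 224985.0, 0.0, -30.0, 6790215.0]
a, b, c, d, e, f = affine
gt = (c, a, b, f, d, e)  # the reordering described above

# Pixel (col=10, row=20) lands on the same point under both conventions.
assert pixel_to_world_affine(affine, 10, 20) == (225285.0, 6789615.0)
assert pixel_to_world_gdal(gt, 10, 20) == (225285.0, 6789615.0)
```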

Parameters:

Name Type Description Default
transform Any

A 6- or 9-element proj:transform sequence.

required

Returns:

Type Description
tuple[float, ...]

The 6-tuple GDAL geotransform.

Raises:

Type Description
ValueError

When transform has fewer than six coefficients.

Examples:

  • A north-up 30 m grid reorders to the GDAL geotransform:
    >>> from pyramids.stac._extensions import affine_to_geotransform
    >>> affine_to_geotransform([30.0, 0.0, 224985.0, 0.0, -30.0, 6790215.0])
    (224985.0, 30.0, 0.0, 6790215.0, 0.0, -30.0)
    
  • The trailing [0, 0, 1] row of a 9-element affine is ignored:
    >>> affine_to_geotransform([10.0, 0.0, 100.0, 0.0, -10.0, 200.0, 0.0, 0.0, 1.0])
    (100.0, 10.0, 0.0, 200.0, 0.0, -10.0)
    
Source code in src/pyramids/stac/_extensions.py
def affine_to_geotransform(transform: Any) -> tuple[float, ...]:
    """Convert a STAC ``proj:transform`` affine to a GDAL geotransform.

    ``proj:transform`` is the affine ordering ``[a, b, c, d, e, f]``
    (mapping ``(col, row)`` to ``(x, y)``: ``x = a*col + b*row + c``,
    ``y = d*col + e*row + f``). GDAL's geotransform is the reordering
    ``(c, a, b, f, d, e)`` — i.e. ``(x_origin, x_res, x_rot, y_origin, y_rot,
    y_res)``. A 9-element affine (with a trailing ``[0, 0, 1]`` row) is
    accepted; only the first six coefficients are used.

    Args:
        transform: A 6- or 9-element ``proj:transform`` sequence.

    Returns:
        The 6-tuple GDAL geotransform.

    Raises:
        ValueError: When `transform` has fewer than six coefficients.

    Examples:
        - A north-up 30 m grid reorders to the GDAL geotransform:
            ```python
            >>> from pyramids.stac._extensions import affine_to_geotransform
            >>> affine_to_geotransform([30.0, 0.0, 224985.0, 0.0, -30.0, 6790215.0])
            (224985.0, 30.0, 0.0, 6790215.0, 0.0, -30.0)

            ```
        - The trailing ``[0, 0, 1]`` row of a 9-element affine is ignored:
            ```python
            >>> affine_to_geotransform([10.0, 0.0, 100.0, 0.0, -10.0, 200.0, 0.0, 0.0, 1.0])
            (100.0, 10.0, 0.0, 200.0, 0.0, -10.0)

            ```
    """
    coeffs = list(transform)
    if len(coeffs) < 6:
        raise ValueError(
            f"proj:transform must have at least 6 coefficients, got {len(coeffs)}: "
            f"{coeffs!r}"
        )
    a, b, c, d, e, f = (float(x) for x in coeffs[:6])
    return (c, a, b, f, d, e)

geotransform_to_affine(geotransform)

Convert a GDAL geotransform to a STAC proj:transform affine.

The inverse of :func:affine_to_geotransform. GDAL's geotransform is (c, a, b, f, d, e), i.e. (x_origin, x_res, x_rot, y_origin, y_rot, y_res), and proj:transform is the affine ordering [a, b, c, d, e, f].

Parameters:

Name Type Description Default
geotransform Any

A 6-element GDAL geotransform.

required

Returns:

Type Description
list[float]

The 6-element proj:transform affine.

Raises:

Type Description
ValueError

When geotransform has fewer than six coefficients.

Examples:

  • A north-up 30 m grid maps back to the affine order:
    >>> from pyramids.stac._extensions import geotransform_to_affine
    >>> geotransform_to_affine((224985.0, 30.0, 0.0, 6790215.0, 0.0, -30.0))
    [30.0, 0.0, 224985.0, 0.0, -30.0, 6790215.0]
    
Source code in src/pyramids/stac/_extensions.py
def geotransform_to_affine(geotransform: Any) -> list[float]:
    """Convert a GDAL geotransform to a STAC ``proj:transform`` affine.

    The inverse of :func:`affine_to_geotransform`. GDAL's geotransform is
    ``(c, a, b, f, d, e)`` — ``(x_origin, x_res, x_rot, y_origin, y_rot,
    y_res)`` — and ``proj:transform`` is the affine ordering
    ``[a, b, c, d, e, f]``.

    Args:
        geotransform: A 6-element GDAL geotransform.

    Returns:
        The 6-element ``proj:transform`` affine.

    Raises:
        ValueError: When `geotransform` has fewer than six coefficients.

    Examples:
        - A north-up 30 m grid maps back to the affine order:
            ```python
            >>> from pyramids.stac._extensions import geotransform_to_affine
            >>> geotransform_to_affine((224985.0, 30.0, 0.0, 6790215.0, 0.0, -30.0))
            [30.0, 0.0, 224985.0, 0.0, -30.0, 6790215.0]

            ```
    """
    gt = list(geotransform)
    if len(gt) < 6:
        raise ValueError(
            f"geotransform must have at least 6 coefficients, got {len(gt)}: {gt!r}"
        )
    c, a, b, f, d, e = (float(x) for x in gt[:6])
    return [a, b, c, d, e, f]

read_extension_metadata(item, asset_key=None)

Read proj / raster / eo extension fields for a STAC asset.

Item-level fields (under properties) are read first and an asset-level value of the same key overrides them, matching the STAC convention that an asset narrows item-level metadata. No asset file is opened.
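
The override rule can be sketched for raw STAC JSON with two dict lookups. This is a simplified illustration: `pick_field` is a hypothetical name, it handles plain dicts only, and the library's own lookup also supports pystac objects through helpers not reproduced here.

```python
def pick_field(item, asset_key, key, default=None):
    """Return the asset-level value of `key`, else the item-level one."""
    props = item.get("properties", {})
    asset = item["assets"][asset_key]
    if key in asset:
        return asset[key]  # asset-level value wins
    return props.get(key, default)  # fall back to item properties


item = {
    "properties": {"proj:epsg": 4326},
    "assets": {
        "dem": {"href": "x.tif", "proj:epsg": 3857},
        "mask": {"href": "m.tif"},
    },
}

dem_epsg = pick_field(item, "dem", "proj:epsg")    # 3857: asset overrides
mask_epsg = pick_field(item, "mask", "proj:epsg")  # 4326: inherited from item
```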

Parameters:

Name Type Description Default
item Any

A STAC Item (pystac object or raw dict). When asset_key is None the item is treated as the asset itself.

required
asset_key str | None

The asset key whose metadata to read, or None to read a bare asset.

None

Returns:

Type Description
dict[str, Any]

A dict with keys:

  • epsg: proj:epsg (int) or None.
  • crs: proj:code (e.g. "EPSG:32633") when present, else derived from proj:epsg, else None.
  • transform: raw proj:transform list or None.
  • geotransform: the GDAL geotransform derived from transform, or None when no transform is present.
  • shape: proj:shape [rows, cols] or None.
  • raster_bands: raster:bands list or None.
  • eo_bands: eo:bands list or None.
  • band_names: names derived from eo:bands (name or common_name) when every band has one, else None.

Raises:

Type Description
StacAssetError

When asset_key is given but absent from the item.

Examples:

  • Read a Sentinel-2-style asset's projection metadata from raw JSON:
    >>> from pyramids.stac._extensions import read_extension_metadata
    >>> item = {
    ...     "properties": {"proj:epsg": 32633},
    ...     "assets": {"B04": {
    ...         "href": "s3://b/B04.tif",
    ...         "proj:shape": [10980, 10980],
    ...         "proj:transform": [10.0, 0.0, 600000.0, 0.0, -10.0, 5300040.0],
    ...         "raster:bands": [{"nodata": 0, "scale": 0.0001}],
    ...         "eo:bands": [{"name": "B04", "common_name": "red"}],
    ...     }},
    ... }
    >>> meta = read_extension_metadata(item, "B04")
    >>> meta["crs"]
    'EPSG:32633'
    >>> meta["geotransform"]
    (600000.0, 10.0, 0.0, 5300040.0, 0.0, -10.0)
    >>> meta["band_names"]
    ['B04']
    
  • An asset-level proj:epsg overrides the item-level value:
    >>> item = {
    ...     "properties": {"proj:epsg": 4326},
    ...     "assets": {"dem": {"href": "x.tif", "proj:epsg": 3857}},
    ... }
    >>> read_extension_metadata(item, "dem")["epsg"]
    3857
    
  • A bare asset with no extension fields yields all-empty metadata:
    >>> meta = read_extension_metadata({"href": "x.tif"})
    >>> (meta["crs"], meta["geotransform"], meta["raster_bands"])
    (None, None, None)
    
Source code in src/pyramids/stac/_extensions.py
def read_extension_metadata(item: Any, asset_key: str | None = None) -> dict[str, Any]:
    """Read ``proj`` / ``raster`` / ``eo`` extension fields for a STAC asset.

    Item-level fields (under ``properties``) are read first and an asset-level
    value of the same key overrides them, matching the STAC convention that an
    asset narrows item-level metadata. No asset file is opened.

    Args:
        item: A STAC Item (pystac object or raw dict). When `asset_key` is
            ``None`` the `item` is treated as the asset itself.
        asset_key: The asset key whose metadata to read, or ``None`` to read
            a bare asset.

    Returns:
        A dict with keys:

        * ``epsg`` — ``proj:epsg`` (int) or ``None``.
        * ``crs`` — ``proj:code`` (e.g. ``"EPSG:32633"``) when present, else
          derived from ``proj:epsg``, else ``None``.
        * ``transform`` — raw ``proj:transform`` list or ``None``.
        * ``geotransform`` — the GDAL geotransform derived from ``transform``,
          or ``None`` when no transform is present.
        * ``shape`` — ``proj:shape`` ``[rows, cols]`` or ``None``.
        * ``raster_bands`` — ``raster:bands`` list or ``None``.
        * ``eo_bands`` — ``eo:bands`` list or ``None``.
        * ``band_names`` — names derived from ``eo:bands`` (``name`` or
          ``common_name``) when every band has one, else ``None``.

    Raises:
        StacAssetError: When `asset_key` is given but absent from the item.

    Examples:
        - Read a Sentinel-2-style asset's projection metadata from raw JSON:
            ```python
            >>> from pyramids.stac._extensions import read_extension_metadata
            >>> item = {
            ...     "properties": {"proj:epsg": 32633},
            ...     "assets": {"B04": {
            ...         "href": "s3://b/B04.tif",
            ...         "proj:shape": [10980, 10980],
            ...         "proj:transform": [10.0, 0.0, 600000.0, 0.0, -10.0, 5300040.0],
            ...         "raster:bands": [{"nodata": 0, "scale": 0.0001}],
            ...         "eo:bands": [{"name": "B04", "common_name": "red"}],
            ...     }},
            ... }
            >>> meta = read_extension_metadata(item, "B04")
            >>> meta["crs"]
            'EPSG:32633'
            >>> meta["geotransform"]
            (600000.0, 10.0, 0.0, 5300040.0, 0.0, -10.0)
            >>> meta["band_names"]
            ['B04']

            ```
        - An asset-level ``proj:epsg`` overrides the item-level value:
            ```python
            >>> item = {
            ...     "properties": {"proj:epsg": 4326},
            ...     "assets": {"dem": {"href": "x.tif", "proj:epsg": 3857}},
            ... }
            >>> read_extension_metadata(item, "dem")["epsg"]
            3857

            ```
        - A bare asset with no extension fields yields all-empty metadata:
            ```python
            >>> meta = read_extension_metadata({"href": "x.tif"})
            >>> (meta["crs"], meta["geotransform"], meta["raster_bands"])
            (None, None, None)

            ```
    """
    props = item_properties(item)
    asset = get_asset(item, asset_key) if asset_key is not None else item

    def pick(key: str, default: Any = None) -> Any:
        return asset_field(asset, key, props.get(key, default))

    epsg = pick("proj:epsg")
    code = pick("proj:code")
    if code is None and epsg is not None:
        code = f"EPSG:{epsg}"

    transform = pick("proj:transform")
    geotransform = affine_to_geotransform(transform) if transform else None

    eo_bands = asset_field(asset, "eo:bands")
    band_names: list[str] | None = None
    if eo_bands:
        names = [b.get("name") or b.get("common_name") for b in eo_bands]
        if all(names):
            band_names = list(names)

    return {
        "epsg": epsg,
        "crs": code,
        "transform": transform,
        "geotransform": geotransform,
        "shape": pick("proj:shape"),
        "raster_bands": asset_field(asset, "raster:bands"),
        "eo_bands": eo_bands,
        "band_names": band_names,
    }

VRT mosaic

pyramids.stac._vrt.build_vrt_from_stac(items, asset, *, signer=None, separate=False)

Mosaic one STAC asset across items into a lazy VRT-backed Dataset.

Parameters:

Name Type Description Default
items Any

Iterable of STAC Items (pystac objects, raw JSON dicts, or any duck-typed equivalent — same contract as :meth:pyramids.dataset.DatasetCollection.from_stac).

required
asset str

The asset key to mosaic (e.g. "visual", "B04").

required
signer Any

Optional signer (e.g. a :class:pyramids.stac.signers.Signer). Its sign_href rewrites every source href and its gdal_env() is installed while the VRT is built. See the module note on read-time credentials for env-based signers.

None
separate bool

When False (default) the assets are mosaicked spatially (overlapping/tiling sources compose into one image — the stac-vrt model). When True, each source becomes a separate band (a band-stack VRT), which requires the sources to share a grid.

False

Returns:

Type Description
Dataset

A lazy Dataset over an in-memory .vrt; GDAL reads the underlying sources on demand (/vsicurl/ range requests for remote hrefs).

Raises:

Type Description
ValueError

When items yields no items.

RuntimeError

When gdal.BuildVRT fails (e.g. sources with inconsistent band counts, or unreadable paths).

Examples:

  • Mosaic the visual asset of several items into one lazy Dataset (requires network for remote hrefs):
    >>> from pyramids.stac import build_vrt_from_stac  # doctest: +SKIP
    >>> ds = build_vrt_from_stac(items, asset="visual")  # doctest: +SKIP
    >>> arr = ds.read_array()  # GDAL pulls source pixels lazily  # doctest: +SKIP
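
For GDAL to read sources lazily, each href has to be expressed as a GDAL virtual file system path. The sketch below shows one plausible mapping using GDAL's documented /vsicurl/, /vsis3/ and /vsigs/ prefixes. `to_vsi_path` is a hypothetical helper written for illustration; the library's own conversion is not shown on this page and may handle more schemes or differ in detail.

```python
from urllib.parse import urlparse


def to_vsi_path(href):
    """Map an href to a GDAL virtual file system path (illustrative only)."""
    if href.startswith("/vsi"):
        return href  # already a VSI path
    parsed = urlparse(href)
    if parsed.scheme in ("http", "https"):
        return f"/vsicurl/{href}"  # HTTP range requests
    if parsed.scheme == "s3":
        return f"/vsis3/{parsed.netloc}{parsed.path}"
    if parsed.scheme == "gs":
        return f"/vsigs/{parsed.netloc}{parsed.path}"
    return href  # local path: leave unchanged


paths = [to_vsi_path(h) for h in ("https://h/B04.tif", "s3://b/scene.tif")]
```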
    
Source code in src/pyramids/stac/_vrt.py
def build_vrt_from_stac(
    items: Any,
    asset: str,
    *,
    signer: Any = None,
    separate: bool = False,
) -> Dataset:
    """Mosaic one STAC asset across items into a lazy VRT-backed `Dataset`.

    Args:
        items: Iterable of STAC Items (pystac objects, raw JSON dicts, or any
            duck-typed equivalent — same contract as
            :meth:`pyramids.dataset.DatasetCollection.from_stac`).
        asset: The asset key to mosaic (e.g. `"visual"`, `"B04"`).
        signer: Optional signer (e.g. a :class:`pyramids.stac.signers.Signer`).
            Its `sign_href` rewrites every source href and its `gdal_env()` is
            installed while the VRT is built. See the module note on read-time
            credentials for env-based signers.
        separate: When `False` (default) the assets are mosaicked spatially
            (overlapping/tiling sources compose into one image — the stac-vrt
            model). When `True`, each source becomes a separate band (a
            band-stack VRT), which requires the sources to share a grid.

    Returns:
        Dataset: A lazy `Dataset` over an in-memory `.vrt`; GDAL reads the
        underlying sources on demand (`/vsicurl/` range requests for remote
        hrefs).

    Raises:
        ValueError: When `items` yields no items.
        RuntimeError: When `gdal.BuildVRT` fails (e.g. sources with
            inconsistent band counts, or unreadable paths).

    Examples:
        - Mosaic the `visual` asset of several items into one lazy Dataset
          (requires network for remote hrefs):
            ```python
            >>> from pyramids.stac import build_vrt_from_stac  # doctest: +SKIP
            >>> ds = build_vrt_from_stac(items, asset="visual")  # doctest: +SKIP
            >>> arr = ds.read_array()  # GDAL pulls source pixels lazily  # doctest: +SKIP

            ```
    """
    item_list = list(items)
    if not item_list:
        raise ValueError("build_vrt_from_stac received no items.")

    gdal_env = signer.gdal_env() if signer is not None else None
    vsi_paths = [
        _to_vsi(resolved_href(item, asset, signer=signer)) for item in item_list
    ]
    vrt_path = f"/vsimem/pyramids_stac_{uuid.uuid4().hex}.vrt"

    with cloud_config_from_env(gdal_env):
        vrt_ds = gdal.BuildVRT(
            vrt_path, vsi_paths, options=gdal.BuildVRTOptions(separate=separate)
        )
        if vrt_ds is None:
            raise RuntimeError(
                f"gdal.BuildVRT returned None for asset {asset!r} over "
                f"{len(vsi_paths)} item(s); check that every source is a "
                "readable raster with a consistent band count and CRS."
            )
        vrt_ds.FlushCache()
        vrt_ds = None
        dataset = Dataset.read_file(vrt_path)
    # Track the in-memory VRT so it is unlinked at process exit (M1).
    register_vsimem(vrt_path)
    return dataset
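
The function above converts each resolved href to a GDAL virtual-filesystem path before calling gdal.BuildVRT. The internal `_to_vsi` helper is not shown on this page, so the following is only a minimal sketch of what such a mapping could look like, assuming the standard GDAL prefixes (`/vsicurl/` for HTTP(S), `/vsis3/` and `/vsigs/` for cloud buckets). The name `to_vsi_sketch` is hypothetical and the real helper may handle more schemes or behave differently.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the internal `_to_vsi` helper (an assumption, not
# the pyramids implementation). Only the GDAL prefixes themselves are standard.
_CLOUD_PREFIXES = {"s3": "/vsis3/", "gs": "/vsigs/"}


def to_vsi_sketch(href: str) -> str:
    """Map an href to a GDAL virtual-filesystem path (illustrative only)."""
    if href.startswith("/vsi"):
        # Already a GDAL VSI path: leave it alone.
        return href
    scheme = urlparse(href).scheme.lower()
    if scheme in ("http", "https"):
        # /vsicurl/ keeps the full URL, scheme included.
        return f"/vsicurl/{href}"
    if scheme in _CLOUD_PREFIXES:
        # Cloud handlers drop the scheme: s3://bucket/key -> /vsis3/bucket/key
        return _CLOUD_PREFIXES[scheme] + href.split("://", 1)[1]
    # Local path or unrecognized scheme: pass through unchanged.
    return href


paths = [
    to_vsi_sketch(h)
    for h in ("https://example.com/a.tif", "s3://bucket/b.tif", "/data/c.tif")
]
```

Paths of this form are what GDAL needs in order to issue range requests against remote sources, which is why the VRT can stay lazy until pixels are read.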

Download to local files#

pyramids.stac.download.download_item(item, directory, *, include=None, exclude=None, s3_requester_pays=False) #

Download a STAC Item's assets to a local directory.

A thin, synchronous wrapper over stac_asset.blocking.download_item (the async download_item cannot run inside a live event loop). The per-protocol client is chosen by stac_asset from each asset href.

Parameters:

Name Type Description Default
item Any

A pystac.Item (stac-asset operates on pystac objects).

required
directory str | Path

Destination directory for the downloaded assets.

required
include list[str] | None

Optional asset keys to include (others skipped).

None
exclude list[str] | None

Optional asset keys to exclude.

None
s3_requester_pays bool

Opt into Requester-Pays for s3:// assets.

False

Returns:

Type Description
Any

The downloaded pystac.Item (with asset hrefs rewritten to the local paths), as returned by stac_asset.

Raises:

Type Description
OptionalPackageDoesNotExist

When stac-asset is not installed.

Examples:

  • Download an item's assets, then list the local hrefs (requires the [stac] extra + network):
    >>> from pyramids.stac import download_item  # doctest: +SKIP
    >>> local = download_item(item, "scenes/")  # doctest: +SKIP
    >>> hrefs = [a.href for a in local.assets.values()]  # doctest: +SKIP
    
Source code in src/pyramids/stac/download.py
def download_item(
    item: Any,
    directory: str | Path,
    *,
    include: list[str] | None = None,
    exclude: list[str] | None = None,
    s3_requester_pays: bool = False,
) -> Any:
    """Download a STAC Item's assets to a local directory.

    A thin, synchronous wrapper over `stac_asset.blocking.download_item` (the
    async `download_item` cannot run inside a live event loop). The per-protocol
    client is chosen by `stac_asset` from each asset href.

    Args:
        item: A `pystac.Item` (stac-asset operates on pystac objects).
        directory: Destination directory for the downloaded assets.
        include: Optional asset keys to include (others skipped).
        exclude: Optional asset keys to exclude.
        s3_requester_pays: Opt into Requester-Pays for `s3://` assets.

    Returns:
        The downloaded `pystac.Item` (with asset hrefs rewritten to the local
        paths), as returned by `stac_asset`.

    Raises:
        OptionalPackageDoesNotExist: When `stac-asset` is not installed.

    Examples:
        - Download an item's assets, then list the local hrefs
          (requires the `[stac]` extra + network):
            ```python
            >>> from pyramids.stac import download_item  # doctest: +SKIP
            >>> local = download_item(item, "scenes/")  # doctest: +SKIP
            >>> hrefs = [a.href for a in local.assets.values()]  # doctest: +SKIP

            ```
    """
    import_stac_asset(_STAC_ASSET_INSTALL_HINT)
    import stac_asset.blocking
    from stac_asset import Config

    config = Config(
        include=list(include) if include else [],
        exclude=list(exclude) if exclude else [],
        s3_requester_pays=s3_requester_pays,
    )
    return stac_asset.blocking.download_item(item, str(directory), config=config)
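
To make the `include` / `exclude` parameters concrete, here is a small pure-Python sketch of the selection they describe: `include` keeps only the listed asset keys, and `exclude` drops the listed ones. The helper `select_asset_keys` is hypothetical and is not part of pyramids or stac-asset. The actual filtering happens inside stac-asset, whose rules (for example, whether both options may be combined) may differ from this illustration.

```python
from __future__ import annotations


def select_asset_keys(
    keys: list[str],
    include: list[str] | None = None,
    exclude: list[str] | None = None,
) -> list[str]:
    """Return the asset keys that would be kept (hypothetical illustration)."""
    include_set = set(include or [])
    exclude_set = set(exclude or [])
    selected = []
    for key in keys:
        # A non-empty include list acts as an allow-list.
        if include_set and key not in include_set:
            continue
        # The exclude list always removes the named keys.
        if key in exclude_set:
            continue
        selected.append(key)
    return selected


asset_keys = ["visual", "B04", "thumbnail"]
only_bands = select_asset_keys(asset_keys, include=["visual", "B04"])
no_thumbnail = select_asset_keys(asset_keys, exclude=["thumbnail"])
```

With no filters the helper returns every key, which mirrors the wrapper passing empty lists to `Config` when `include` and `exclude` are left as `None`.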

GeoParquet round-trip#

pyramids.stac._geoparquet #

Serialize STAC Items to/from GeoParquet (PD-3).

stac-geoparquet stores a STAC ItemCollection as one columnar GeoParquet file (geometry as WKB, WGS84) for bulk transfer + fast spatial filtering, avoiding thousands of per-item JSON requests. pyramids already has geopandas (core) and a FeatureCollection (a GeoDataFrame subclass) with GeoParquet I/O, plus the [parquet] extra (pyarrow) — so the round-trip needs no new dependency.

This is a lossless pyramids variant: each row carries the item geometry (so the file is a valid, spatially-filterable GeoParquet) plus the full STAC Item as a JSON column, so :func:from_geoparquet reconstructs the exact item dicts — ready to feed :meth:pyramids.dataset.DatasetCollection.from_stac.

Requires the [parquet] extra (pyarrow) for the Parquet read/write itself.
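
The lossless property described above comes from storing the whole item as a JSON string alongside the geometry. The sketch below shows that idea with the standard library only, leaving out geometry and Parquet I/O entirely. The helper names `items_to_rows` and `rows_to_items` are hypothetical; the column name `stac_item` is the one named in the from_geoparquet documentation.

```python
import json

ITEM_COLUMN = "stac_item"  # JSON column name, as documented for from_geoparquet


def items_to_rows(items: list[dict]) -> list[dict]:
    """Build one row per item: its id plus the full item serialized as JSON."""
    return [
        {"id": item.get("id"), ITEM_COLUMN: json.dumps(item)} for item in items
    ]


def rows_to_items(rows: list[dict]) -> list[dict]:
    """Recover the original item dicts from the JSON column."""
    return [json.loads(row[ITEM_COLUMN]) for row in rows]


items = [
    {
        "id": "a",
        "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
        "properties": {"datetime": "2023-01-01T00:00:00Z"},
        "assets": {},
    }
]
rows = items_to_rows(items)
restored = rows_to_items(rows)
```

Because the JSON column holds the complete item, the geometry column can be used purely for spatial filtering without affecting what is reconstructed.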

to_geoparquet(items, path) #

Write a sequence of STAC Items to a GeoParquet file.

Each item becomes a row carrying its geometry (a valid, spatially-filterable GeoParquet geometry in EPSG:4326) and the full item as a JSON column.

Parameters:

Name Type Description Default
items Any

Iterable of STAC Items (pystac.Item objects with to_dict(), or raw STAC-JSON dicts — e.g. from :meth:pyramids.dataset.Dataset.to_stac_item).

required
path str | Path

Destination .parquet path.

required

Raises:

Type Description
ValueError

When items is empty.

OptionalPackageDoesNotExist

When pyarrow (the [parquet] extra) is not installed (raised by FeatureCollection.to_parquet).

Examples:

  • Round-trip an item dict through GeoParquet:
    >>> import tempfile, os
    >>> from pyramids.stac import to_geoparquet, from_geoparquet  # doctest: +SKIP
    >>> items = [{"id": "a", "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
    ...           "properties": {"datetime": "2023-01-01T00:00:00Z"}, "assets": {}}]
    >>> path = os.path.join(tempfile.mkdtemp(), "items.parquet")  # doctest: +SKIP
    >>> to_geoparquet(items, path)  # doctest: +SKIP
    >>> from_geoparquet(path)[0]["id"]  # doctest: +SKIP
    'a'
    
Source code in src/pyramids/stac/_geoparquet.py
def to_geoparquet(items: Any, path: str | Path) -> None:
    """Write a sequence of STAC Items to a GeoParquet file.

    Each item becomes a row carrying its geometry (a valid, spatially-filterable
    GeoParquet geometry in EPSG:4326) and the full item as a JSON column.

    Args:
        items: Iterable of STAC Items (`pystac.Item` objects with `to_dict()`,
            or raw STAC-JSON dicts — e.g. from
            :meth:`pyramids.dataset.Dataset.to_stac_item`).
        path: Destination `.parquet` path.

    Raises:
        ValueError: When `items` is empty.
        OptionalPackageDoesNotExist: When pyarrow (the `[parquet]` extra) is not
            installed (raised by `FeatureCollection.to_parquet`).

    Examples:
        - Round-trip an item dict through GeoParquet:
            ```python
            >>> import tempfile, os
            >>> from pyramids.stac import to_geoparquet, from_geoparquet  # doctest: +SKIP
            >>> items = [{"id": "a", "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
            ...           "properties": {"datetime": "2023-01-01T00:00:00Z"}, "assets": {}}]
            >>> path = os.path.join(tempfile.mkdtemp(), "items.parquet")  # doctest: +SKIP
            >>> to_geoparquet(items, path)  # doctest: +SKIP
            >>> from_geoparquet(path)[0]["id"]  # doctest: +SKIP
            'a'

            ```
    """
    from pyramids.feature import FeatureCollection

    rows = []
    geometries = []
    for item in items:
        as_dict = _item_to_dict(item)
        geometries.append(_item_geometry(as_dict))
        rows.append({"id": as_dict.get("id"), _ITEM_COLUMN: json.dumps(as_dict)})

    if not rows:
        raise ValueError("to_geoparquet received no items.")

    fc = FeatureCollection(rows, geometry=geometries, crs="EPSG:4326")
    fc.to_parquet(str(path))

from_geoparquet(path) #

Read STAC Items back from a GeoParquet written by :func:to_geoparquet.

Parameters:

Name Type Description Default
path str | Path

Path to a .parquet file produced by :func:to_geoparquet.

required

Returns:

Type Description
list[dict[str, Any]]

The list of STAC Item dicts (ready for :meth:pyramids.dataset.DatasetCollection.from_stac).

Raises:

Type Description
OptionalPackageDoesNotExist

When pyarrow (the [parquet] extra) is not installed (raised by FeatureCollection.read_parquet).

KeyError

When the file lacks the stac_item JSON column (not written by :func:to_geoparquet).

Source code in src/pyramids/stac/_geoparquet.py
def from_geoparquet(path: str | Path) -> list[dict[str, Any]]:
    """Read STAC Items back from a GeoParquet written by :func:`to_geoparquet`.

    Args:
        path: Path to a `.parquet` file produced by :func:`to_geoparquet`.

    Returns:
        The list of STAC Item dicts (ready for
        :meth:`pyramids.dataset.DatasetCollection.from_stac`).

    Raises:
        OptionalPackageDoesNotExist: When pyarrow (the `[parquet]` extra) is not
            installed (raised by `FeatureCollection.read_parquet`).
        KeyError: When the file lacks the `stac_item` JSON column (not written
            by :func:`to_geoparquet`).
    """
    from pyramids.feature import FeatureCollection

    fc = FeatureCollection.read_parquet(str(path))
    return [json.loads(blob) for blob in fc[_ITEM_COLUMN]]