Feature Subpackage #

The pyramids.feature subpackage is the vector-data counterpart of pyramids.dataset. It ships a single user-facing class, FeatureCollection, plus two helper modules with pure functions for geometry manipulation and CRS handling.

Module Layout #


classDiagram
    class GeoDataFrame {
        <<geopandas>>
    }

    class FeatureCollection {
        +from_features(features, crs, columns)
        +from_bbox(bbox, epsg)
        +from_records(records, geometry, crs, orient)
        +iter_features(path, layer, bbox, where, chunksize, tile_strategy, include_index)
        +read_file(path, layer, bbox, columns, where)
        +read_parquet(path, columns, bbox)
        +to_parquet(path, compression)
        +to_file(path, driver, layer, mode, creation_options)
        +list_layers(path)
        +list_layers_cache_clear()
        +schema
        +epsg
        +top_left_corner
        +column
        +with_coordinates()
        +with_centroid()
        +concat(other)
        +plot(column, basemap, **kwargs)
        +create_polygon(coords)
        +polygon_wkt(coords)
        +create_points(coords)
        +point_collection(coords, crs)
        +get_epsg_from_prj(prj)
        +reproject_coordinates(x, y, from_crs, to_crs, precision)
        +__enter__()
        +__exit__(exc_type, exc, tb)
        +close()
    }

    class geometry {
        <<module>>
        +create_polygon(coords)
        +polygon_wkt(coords)
        +create_points(coords)
        +point_collection(coords, crs)
        +get_coords(row, geom_col, coord_type)
        +get_xy_coords(geometry, coord_type)
        +get_point_coords(geometry, coord_type)
        +get_line_coords(geometry, coord_type)
        +get_poly_coords(geometry, coord_type)
        +explode_gdf(gdf, geometry)
        +multi_geom_handler(multi_geometry, coord_type, geom_type)
        +geometry_collection_coords(geom, coord_type)
    }

    class crs {
        <<module>>
        +create_sr_from_proj(prj, string_type)
        +get_epsg_from_prj(prj)
        +reproject_coordinates(x, y, from_crs, to_crs, precision)
    }

    class _ogr {
        <<private>>
        +gdf_to_datasource(gdf)
        +datasource_to_gdf(ds)
    }

    GeoDataFrame <|-- FeatureCollection
    FeatureCollection ..> geometry : delegates
    FeatureCollection ..> crs : delegates
    FeatureCollection ..> _ogr : "OGR bridge\n(internal)"

FeatureCollection — the public class, a direct subclass of geopandas.GeoDataFrame.
geometry — shape factories and coordinate-extraction helpers.
crs — CRS / EPSG / reprojection helpers.
_ogr — private OGR bridge (OGR DataSource never leaves the subpackage).

When to reach for which #

| Task | Entry point |
| --- | --- |
| Read a vector file (Shapefile / GeoJSON / GPKG / Parquet / zipped / cloud) | `FeatureCollection.read_file` / `read_parquet` |
| Stream a large file in chunks | `FeatureCollection.iter_features` |
| Build from Python data (records or columnar dict) | `FeatureCollection.from_records` |
| Wrap an existing `GeoDataFrame` | `FeatureCollection(gdf)` or `FeatureCollection.from_features(gdf)` |
| Inspect layers / schema without reading | `FeatureCollection.list_layers`, `.schema` |
| Attach per-vertex or centroid columns | `.with_coordinates()`, `.with_centroid()` |
| Concatenate two FCs safely (CRS-checked) | `.concat(other)` |
| Build raw geometries | `pyramids.feature.geometry.create_polygon` / `create_points` |
| Reproject coordinate arrays | `pyramids.base.crs.reproject_coordinates` |
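
For the streaming row in the table, the `iter_features` source notes that it iterates in fixed-size batches using pyogrio's `skip_features` / `max_features`. The arithmetic behind that kind of batching can be sketched as follows. `batch_windows` is a hypothetical helper written for illustration; it is not part of pyramids, and the real method may compute its windows differently.

```python
from collections.abc import Iterator


def batch_windows(total: int, chunksize: int) -> Iterator[tuple[int, int]]:
    """Yield (skip, count) windows that cover `total` rows in order.

    Hypothetical helper for illustration; not part of pyramids.
    """
    if chunksize < 1:
        raise ValueError(f"chunksize must be >= 1 when supplied; got {chunksize}.")
    for skip in range(0, total, chunksize):
        # Clamp the last window so it never runs past the end of the layer.
        yield skip, min(chunksize, total - skip)


windows = list(batch_windows(total=3, chunksize=2))
print(windows)  # [(0, 2), (2, 1)]
```

With 3 rows and `chunksize=2` the windows have sizes 2 and 1, which matches the `[2, 1]` chunk lengths shown in the `iter_features` doctest. Because the helper is a generator, the `ValueError` only surfaces on the first `next()` call, the same behaviour that doctest demonstrates.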

Lazy / Dask reads #

For files too large to load eagerly — multi-GB GeoParquet, cloud-hosted vector tables, planet-scale datasets like Overture Maps — pyramids offers a dask-backed path:

from pyramids.feature import FeatureCollection

lfc = FeatureCollection.read_parquet(
    "s3://overturemaps-us-west-2/release/2024-07-22.0/theme=places/type=place",
    backend="dask",
    columns=["id", "names", "geometry"],
    bbox=(2.0, 48.8, 2.5, 49.0),
)
# `zones` is assumed to be a polygon GeoDataFrame (or FeatureCollection) defined elsewhere.
lfc.spatial_shuffle().sjoin(zones).compute()

The backend="dask" branch returns a LazyFeatureCollection (a subclass of dask_geopandas.GeoDataFrame) whose partition-aware ops (to_crs, clip, sjoin, spatial_shuffle) run lazily.
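
To give a feel for why a `bbox` filter and spatially sorted partitions help, here is a conceptual sketch of bounding-box pruning in plain Python. It is not how pyramids or dask-geopandas implement it; the helper names and the partition extents are made up for illustration.

```python
Bounds = tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def intersects(a: Bounds, b: Bounds) -> bool:
    """Return True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def prune_partitions(partition_bounds: dict[int, Bounds], query: Bounds) -> list[int]:
    """Keep only the partition ids whose extent could hold matching rows."""
    return [pid for pid, bounds in partition_bounds.items() if intersects(bounds, query)]


# Made-up partition extents, purely for illustration.
partition_bounds = {
    0: (0.0, 40.0, 1.9, 48.0),
    1: (1.5, 48.5, 3.0, 49.5),
    2: (10.0, 50.0, 12.0, 52.0),
}
query_bbox = (2.0, 48.8, 2.5, 49.0)  # same bbox as the read_parquet call above
kept = prune_partitions(partition_bounds, query_bbox)
print(kept)  # [1]
```

Only partitions whose extent touches the query box need to be read or joined; the tighter and less overlapping the partition extents are, the more partitions can be skipped.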

See Lazy vector reads for the full guide: spatial_shuffle → sjoin pruning workflow, compute vs persist, to_parquet, compute_total_bounds, and how to wire a distributed scheduler with pyramids.configure_lazy_vector.

Install: pip install 'pyramids-gis[parquet-lazy]'.

Build a one-row FC from a bbox — `from_bbox` #

FeatureCollection.from_bbox((W, S, E, N), epsg=…) is the shared primitive behind Dataset.crop(bbox=…), Dataset.read_array(bbox=…), and DatasetCollection.crop(bbox=…). It returns a single-row FC whose only geometry is the rectangular polygon — convenient when you want to hand the same mask to multiple downstream operations, or when you need the polygon for some other geopandas / shapely call.

from pyramids.feature import FeatureCollection

mask = FeatureCollection.from_bbox((6.8, 50.3, 7.2, 50.6), epsg=4326)
mask.to_file("aoi.geojson")

epsg is required (a bbox without a CRS is ambiguous); the bbox must satisfy west < east and south < north.
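
The ordering rules and the resulting rectangle can be sketched in plain Python. `bbox_ring` is a hypothetical helper, not a pyramids function, and the vertex order chosen here (counter-clockwise from the south-west corner) is an assumption; the polygon pyramids builds may list its vertices in a different order.

```python
def bbox_ring(bbox):
    """Return the closed ring of a (west, south, east, north) rectangle.

    Hypothetical helper for illustration; not part of pyramids.
    """
    w, s, e, n = (float(v) for v in bbox)
    if not w < e:
        raise ValueError(f"bbox must satisfy west < east; got west={w}, east={e}")
    if not s < n:
        raise ValueError(f"bbox must satisfy south < north; got south={s}, north={n}")
    # Five vertices: four corners plus the first corner repeated to close the ring.
    return [(w, s), (e, s), (e, n), (w, n), (w, s)]


ring = bbox_ring((6.8, 50.3, 7.2, 50.6))
print(len(ring))            # 5
print(ring[0] == ring[-1])  # True
```

Swapping west and east, for example `bbox_ring((7.2, 50.3, 6.8, 50.6))`, raises `ValueError` in this sketch, mirroring the rule stated above.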

FeatureCollection Class #

`pyramids.feature.FeatureCollection` #

Bases: GeoDataFrame

A `geopandas.GeoDataFrame` with pyramids-specific GIS methods.

FeatureCollection is a GeoDataFrame (`isinstance(fc, GeoDataFrame)` is `True`), so every geopandas method is available directly. Pyramids adds rasterization, Dataset interop, vertex extraction, and CRS helpers on top.

The OGR/GDAL backend is internal only; see `pyramids.feature._ogr`.
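
The class source overrides `_constructor` to return `FeatureCollection`, which is the hook pandas-style containers use to decide what type an operation's result should have. The toy classes below are stand-ins written for illustration (they are not the pandas or geopandas API) and only show why overriding the hook keeps results in the subclass.

```python
class Frame:
    """Toy stand-in for a pandas-style container (not the real pandas API)."""

    def __init__(self, rows):
        self.rows = list(rows)

    @property
    def _constructor(self):
        # Operations build their results through this hook.
        return Frame

    def head(self, n):
        # Going through the hook lets subclasses keep their own type.
        return self._constructor(self.rows[:n])


class Collection(Frame):
    """Toy subclass that overrides the hook, as FeatureCollection does."""

    @property
    def _constructor(self):
        return Collection


fc = Collection([1, 2, 3])
subset = fc.head(2)
print(type(subset).__name__)      # Collection
print(isinstance(subset, Frame))  # True
```

Without the override, `head` would hand back a plain `Frame` and any subclass-specific methods would be lost after the first operation.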

Source code in src/pyramids/feature/collection.py

class FeatureCollection(GeoDataFrame):
    """A :class:`geopandas.GeoDataFrame` with pyramids-specific GIS methods.

    `FeatureCollection` *is a* `GeoDataFrame`, so
    `isinstance(fc, GeoDataFrame)` is `True` and every geopandas
    method is available directly. Pyramids adds rasterization,
    Dataset interop, vertex extraction, and CRS helpers on top.

    The OGR/GDAL backend is internal only; see
    :mod:`pyramids.feature._ogr`.
    """

    @property
    def _constructor(self):
        """Return the type pandas uses when constructing new frames."""
        return FeatureCollection

    # Merge with GeoDataFrame._metadata instead of replacing it.
    # The parent class lists `_geometry_column_name` (the name of the
    # active geometry column); overriding _metadata with just our own
    # entries drops that attribute on pickle / copy / concat, and the
    # restored object can no longer find its geometry column. Always
    # splat the parent's list first.
    # Dedupe via `dict.fromkeys` so that if a future geopandas
    # release adds one of our own names to its own `_metadata` list,
    # the pyramids subclass does not carry a duplicate entry. Dicts
    # have preserved insertion order since Python 3.7, so the parent's
    # ordering is kept.
    _metadata: list[str] = list(
        dict.fromkeys(
            [
                *GeoDataFrame._metadata,
                "_epsg_cache_crs",
                "_epsg_cache_value",
            ]
        )
    )
    """Instance attributes pandas must preserve across copy/slice/pickle.

    Holds:

    * `GeoDataFrame._metadata` (currently `_geometry_column_name`)
      — required for pickle round-trips to remember which column is
      the active geometry column.
    * `_epsg_cache_crs` / `_epsg_cache_value` — the EPSG
      cache.

    The list is wrapped in `list(dict.fromkeys(...))` so that a
    future geopandas release adding one of our own names to its own
    `_metadata` list does not produce a duplicate entry. `dict`
    preserves insertion order since Python 3.7, so the parent's
    ordering is preserved.
    """

    def __init__(self, data: Any = None, *args: Any, **kwargs: Any) -> None:
        """Construct a FeatureCollection.

        Accepts anything :class:`geopandas.GeoDataFrame` accepts.
        Rejects `ogr.DataSource` / `gdal.Dataset` with a clear error.
        """
        if isinstance(data, (ogr.DataSource, gdal.Dataset)):
            raise TypeError(
                "FeatureCollection no longer accepts ogr.DataSource or "
                "gdal.Dataset objects. OGR is an internal implementation "
                "detail. Use FeatureCollection.read_file(path) to load a "
                "file, or pass a GeoDataFrame."
            )
        super().__init__(data, *args, **kwargs)

    def __enter__(self) -> FeatureCollection:
        """Enter a context-managed block. Returns `self`.

        Returns:
            FeatureCollection: `self`, the exact same instance, so
            `with ... as fc:` binds `fc` to this collection.

        Examples:
            - Use as a context manager and access rows inside the block:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> gdf = gpd.GeoDataFrame(
                ...     {"id": [1, 2]},
                ...     geometry=[Point(0, 0), Point(1, 1)],
                ...     crs="EPSG:4326",
                ... )
                >>> with FeatureCollection(gdf) as fc:
                ...     n = len(fc)
                >>> n
                2

                ```
            - Exceptions raised inside the block still propagate:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> try:
                ...     with fc:
                ...         raise RuntimeError("boom")
                ... except RuntimeError as err:
                ...     print(err)
                boom

                ```
        """
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        """Exit the context-managed block. Calls :meth:`close`.

        Args:
            exc_type: Exception class if the block raised, else `None`.
            exc: Exception instance if the block raised, else `None`.
            tb: Traceback for the raised exception, else `None`.

        Returns:
            bool: Always `False` — exceptions from inside the `with`
            block propagate to the caller rather than being swallowed.

        Examples:
            - The clean-exit path returns `False` so nothing is swallowed:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.__exit__(None, None, None)
                False

                ```
            - A `with` block that finishes normally just releases the FC:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> gdf = gpd.GeoDataFrame(
                ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ... )
                >>> with FeatureCollection(gdf) as fc:
                ...     pass
                >>> len(fc)
                1

                ```
        """
        self.close()
        return False

    def close(self) -> None:
        """Release resources held by this FeatureCollection.

        No-op today (the OGR bridge is self-cleaning). Exists so future
        resource-holding features have an idiomatic release point.

        Returns:
            None: This method does not return a value.

        Examples:
            - `close()` is idempotent — calling it repeatedly is safe:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.close()
                >>> fc.close()
                >>> len(fc)
                1

                ```
            - The collection remains usable after `close` (no-op today):
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"v": [7]}, geometry=[Point(2, 3)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.close()
                >>> fc.epsg
                4326

                ```
        """
        return None

    @classmethod
    def from_features(
        cls,
        features: Iterable[Any],
        *,
        crs: Any = None,
        columns: list[str] | None = None,
    ) -> FeatureCollection:
        """Build a FeatureCollection from feature-shaped inputs.

        Delegates to :meth:`geopandas.GeoDataFrame.from_features` and
        wraps the result. Accepts any of the shapes that method
        accepts:

        * a list (or iterator) of GeoJSON feature dicts of the form
          `{"type": "Feature", "geometry": {...}, "properties": {...}}`,
        * any object exposing `__geo_interface__` (shapely
          geometries, fiona records, custom feature classes), or
        * a bare `FeatureCollection` dict (`{"type":
          "FeatureCollection", "features": [...]}`).

        Args:
            features (Iterable):
                Feature dicts of the form
                `{"type": "Feature", "geometry": {...}, "properties": {...}}`,
                or any `__geo_interface__` provider. Also accepts a
                bare `FeatureCollection` dict.
            crs:
                CRS to attach to the result (EPSG int, `"EPSG:4326"`,
                WKT, Proj, or a :class:`pyproj.CRS`). `None` leaves
                the CRS unset.
            columns (list[str] | None):
                Explicit column order for the output. When `None`,
                geopandas infers columns from the first feature.

        Returns:
            FeatureCollection: A new FC backed by the supplied features.

        Raises:
            ValueError: If `features` is empty or exhausted before any
                feature is consumed. An empty GeoDataFrame from
                `from_features` has no `geometry` column, which
                breaks downstream pyramids methods that assume the
                column exists. Fail fast instead.

        Examples:
            - Build from a list of feature dicts:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> feats = [
                ...     {"type": "Feature",
                ...      "geometry": {"type": "Point", "coordinates": [0, 0]},
                ...      "properties": {"name": "a"}},
                ...     {"type": "Feature",
                ...      "geometry": {"type": "Point", "coordinates": [1, 1]},
                ...      "properties": {"name": "b"}},
                ... ]
                >>> fc = FeatureCollection.from_features(feats, crs=4326)
                >>> len(fc)
                2
                >>> fc.epsg
                4326

                ```
        """
        # Materialise the iterable so we can detect the empty case
        # before handing off to geopandas. `GeoDataFrame.from_features([])`
        # returns a GeoDataFrame with no `geometry` column, which
        # breaks every pyramids op that assumes the column exists.
        features_list = list(features)
        if not features_list:
            raise ValueError(
                "from_features requires at least one feature. An empty "
                "iterable would produce a GeoDataFrame with no geometry "
                "column, which breaks downstream pyramids methods."
            )
        gdf = gpd.GeoDataFrame.from_features(features_list, crs=crs, columns=columns)
        return cls(gdf)

    @classmethod
    def from_bbox(
        cls,
        bbox: tuple[float, float, float, float] | list[float],
        *,
        epsg: Any,
    ) -> FeatureCollection:
        """Build a one-row FeatureCollection from a geographic bounding box.

        The bbox is the canonical ``(west, south, east, north)`` quadruple in
        the CRS named by ``epsg``. The result is a single-row FC whose only
        geometry is a rectangular Polygon — handy for cropping a raster or
        windowed-reading it without writing out the polygon vertices by hand:

        .. code-block:: python

            mask = FeatureCollection.from_bbox((31.0, 30.0, 31.1, 30.1), epsg=4326)
            cropped = dataset.crop(mask)

        Most callers do not need to build this themselves — :meth:`Dataset.crop`
        and :meth:`Dataset.read_array` (via :meth:`pyramids.dataset.engines.io.IO.read_array`)
        accept the bbox/``epsg`` pair directly and call this helper internally.

        Args:
            bbox: A 4-element ``(west, south, east, north)`` tuple / list of
                numbers. Must satisfy ``west < east`` and ``south < north``.
            epsg: CRS for the bbox coordinates — anything ``geopandas`` accepts
                for ``crs=`` (EPSG int such as ``4326``, ``"EPSG:4326"`` string,
                WKT, Proj, or a :class:`pyproj.CRS`). Required (a bbox without
                a CRS is ambiguous).

        Returns:
            FeatureCollection: A one-row FC carrying the rectangular polygon,
            in the supplied CRS.

        Raises:
            ValueError: ``bbox`` is not a 4-element sequence, or violates
                ``west < east`` / ``south < north``, or ``epsg`` is ``None``.
            TypeError: ``bbox`` elements are not numbers.

        Examples:
            - Build a one-row FC from a bbox and inspect it:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection.from_bbox((31.0, 30.0, 31.1, 30.1), epsg=4326)
                >>> len(fc)
                1
                >>> tuple(float(v) for v in fc.total_bounds)
                (31.0, 30.0, 31.1, 30.1)
                >>> fc.crs.to_epsg()
                4326

                ```
            - Use it as a mask to crop a raster:
                ```python
                >>> import numpy as np
                >>> from pyramids.dataset import Dataset
                >>> from pyramids.feature import FeatureCollection
                >>> arr = np.arange(100, dtype="int16").reshape(10, 10)
                >>> ds = Dataset.create_from_array(
                ...     arr, top_left_corner=(0, 0), cell_size=0.05, epsg=4326,
                ... )
                >>> fc = FeatureCollection.from_bbox((0.1, -0.2, 0.2, -0.1), epsg=4326)
                >>> ds.crop(mask=fc).shape
                (1, 2, 2)

                ```
            - ``epsg=None`` is rejected — a bbox without a CRS is ambiguous:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> try:
                ...     FeatureCollection.from_bbox((0, 0, 1, 1), epsg=None)
                ... except ValueError as exc:
                ...     print("epsg" in str(exc))
                True

                ```

        See Also:
            - :meth:`pyramids.dataset.engines.spatial.Spatial.crop`: accepts
              ``bbox=`` / ``epsg=`` directly and routes through this helper.
            - :meth:`pyramids.dataset.engines.io.IO.read_array`: same.
        """
        if epsg is None:
            raise ValueError(
                "from_bbox requires an explicit epsg= for the bbox CRS; "
                "a bbox without a CRS is ambiguous"
            )
        try:
            seq = list(bbox)
        except TypeError as exc:
            raise ValueError(
                f"bbox must be a 4-element (west, south, east, north) sequence; "
                f"got {bbox!r}"
            ) from exc
        if len(seq) != 4:
            raise ValueError(
                f"bbox must have exactly 4 elements (west, south, east, north); "
                f"got {len(seq)}: {seq!r}"
            )
        try:
            w, s, e, n = (float(v) for v in seq)
        except (TypeError, ValueError) as exc:
            raise TypeError(f"bbox elements must be numbers; got {seq!r}") from exc
        if not (w < e):
            raise ValueError(f"bbox must satisfy west < east; got west={w}, east={e}")
        if not (s < n):
            raise ValueError(
                f"bbox must satisfy south < north; got south={s}, north={n}"
            )
        return cls(geometry=[box(w, s, e, n)], crs=epsg)

    @classmethod
    def from_records(
        cls,
        records: Any,
        *,
        geometry: str = "geometry",
        crs: Any = None,
        orient: str = "records",
    ) -> FeatureCollection:
        """Build a FeatureCollection from dict records.

        Two input orientations are accepted (C26 added the second):

        * `orient="records"` (default) — an iterable of per-row dicts,
          each of the form `{column: value, ..., geometry: <shapely>}`.
          The dict's keys become column names; the key named by
          `geometry` must hold a shapely geometry.
        * `orient="list"` — a single columnar dict mapping each
          column name to a list of values of equal length, for
          example `{"id": [1, 2], "geometry": [pt_a, pt_b]}`.

        Useful for ingesting rows from an API response that doesn't
        emit GeoJSON but already has shapely geoms.

        Args:
            records:
                Per-row iterable of dicts when `orient="records"`, or a
                single columnar dict when `orient="list"`.
            geometry (str):
                Name of the column / key holding the shapely geometry.
                Default `"geometry"`.
            crs:
                CRS to attach (same forms as :meth:`from_features`).
            orient (str):
                `"records"` (default) or `"list"` — matches the
                pandas `from_dict`/`from_records` conventions.

        Returns:
            FeatureCollection: A new FC with one row per record.

        Raises:
            FeatureError: If a record is missing the `geometry`
                column.
            ValueError: If `orient` is not one of the supported
                values.

        Examples:
            - Per-row records with the default geometry key:
                ```python
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> recs = [
                ...     {"id": 1, "geometry": Point(0, 0)},
                ...     {"id": 2, "geometry": Point(1, 1)},
                ... ]
                >>> fc = FeatureCollection.from_records(recs, crs=4326)
                >>> len(fc)
                2
                >>> fc.epsg
                4326

                ```
            - Custom geometry key via the `geometry=` kwarg:
                ```python
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> recs = [
                ...     {"id": 1, "geom": Point(0, 0)},
                ...     {"id": 2, "geom": Point(1, 1)},
                ... ]
                >>> fc = FeatureCollection.from_records(
                ...     recs, geometry="geom", crs=4326,
                ... )
                >>> fc.geometry.name
                'geom'

                ```
            - Columnar dict via `orient="list"`:
                ```python
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> cols = {"id": [1, 2], "geometry": [Point(0, 0), Point(1, 1)]}
                >>> fc = FeatureCollection.from_records(
                ...     cols, orient="list", crs=4326,
                ... )
                >>> list(fc["id"])
                [1, 2]

                ```
        """

        # empty-input branches both build a single-column frame
        # whose column name matches the `geometry=` kwarg, so
        # `GeoDataFrame(..., geometry=…)` sets it as the active
        # geometry column and the returned FC has
        # `geometry.name == geometry`.
        def _empty_fc() -> FeatureCollection:
            return cls(gpd.GeoDataFrame({geometry: []}, geometry=geometry, crs=crs))

        if orient == "records":
            records_list = list(records)
            if not records_list:
                return _empty_fc()
            df = pd.DataFrame.from_records(records_list)
        elif orient == "list":
            # columnar dict of equal-length lists. Straight into
            # `pd.DataFrame` which accepts this shape natively and
            # raises `ValueError` on mismatched lengths (propagated
            # to the caller as-is — the pandas message is already clear).
            if not isinstance(records, dict):
                raise ValueError(
                    f"orient='list' expects a dict of column → list; "
                    f"got {type(records).__name__}."
                )
            df = pd.DataFrame(records)
            if len(df) == 0:
                return _empty_fc()
        else:
            raise ValueError(f"orient must be 'records' or 'list'; got {orient!r}.")
        if geometry not in df.columns:
            raise FeatureError(
                f"records missing required geometry column {geometry!r}; "
                f"columns present: {list(df.columns)}"
            )
        return cls(gpd.GeoDataFrame(df, geometry=geometry, crs=crs))

    _VALID_TILE_STRATEGIES: tuple[str, ...] = (
        "auto",
        "rtree",
        "row_group",
        "none",
    )

    @classmethod
    def iter_features(
        cls,
        path: str | Path,
        *,
        layer: str | int | None = None,
        bbox: tuple[float, float, float, float] | None = None,
        where: str | None = None,
        chunksize: int | None = None,
        tile_strategy: str = "auto",
        include_index: bool = False,
    ) -> Any:
        """Stream features from `path` without materializing the full file.

        Two orthogonal knobs:

        * **Chunk shape**. `chunksize=None` yields one GeoJSON-style
          dict per row (fiona idiom). `chunksize=N` yields
          :class:`FeatureCollection` batches of up to N rows each so
          batched pipelines get a DataFrame-shaped payload.
        * **Tile strategy**. Controls whether the `bbox`
          filter is pushed into the format's spatial index (rtree on
          GPKG, row-group statistics on Parquet, …) or applied after
          a full scan. Pass one of:

          - `"auto"` (default) — let pyogrio pick. For a GPKG,
            pyogrio queries the `rtree_<layer>_geom` companion
            table automatically. For a Parquet file, pyogrio /
            pyarrow push the bbox down to the row-group statistics
            and skip non-matching groups. For formats without a
            spatial index (GeoJSON, Shapefile without a `.qix`)
            this falls back to a full scan in the driver.
          - `"rtree"` — same as `"auto"`; kept as an explicit
            name so pipeline code can document intent.
          - `"row_group"` — same as `"auto"`; explicit name for
            the Parquet case.
          - `"none"` — disable index pushdown; read whole chunks
            from the driver and apply the bbox filter in Python.
            Useful when the on-disk spatial index is stale or
            suspected wrong; also exercises the "slow path" in
            tests.

        `bbox` / `where` compose with any tile_strategy. Paths run
        through :func:`pyramids._io._parse_path` so cloud URLs and
        archive paths work the same way as in :meth:`read_file`.

        Args:
            path (str | Path): File path, URL, archive path.
            layer (str | int | None): Layer selector for multi-layer
                formats.
            bbox: `(minx, miny, maxx, maxy)` filter.
            where (str | None): OGR SQL predicate.
            chunksize (int | None): `None` yields dicts, an `int`
                yields `FeatureCollection` chunks.
            tile_strategy (str): One of `"auto"`, `"rtree"`,
                `"row_group"`, `"none"`. Default `"auto"`.
            include_index (bool): When `True`, each yielded dict gets
                an additional `"id"` key whose value is the
                0-based file-row index of that feature. The chunked
                form (`chunksize=N`) attaches the same index as a
                `"_row_index"` column on the yielded FC. The indices
                stay aligned with the on-disk rows even when a
                Python-side bbox filter (`tile_strategy="none"`)
                drops some rows — only the surviving features are
                yielded, and their ids match the positions they had
                in the source file. Defaults to `False` for
                back-compat with the fiona idiom.

        Yields:
            dict | FeatureCollection: Per-feature dicts when
            `chunksize` is `None`; FeatureCollection chunks
            otherwise.

        Raises:
            ValueError: If `chunksize` is given but `< 1`, or if
                `tile_strategy` is not one of the accepted values.

        Examples:
            - Stream features one at a time as GeoJSON-style dicts:
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> path = d / "pts.geojson"
                >>> gdf = gpd.GeoDataFrame(
                ...     {"id": [1, 2, 3]},
                ...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
                ...     crs="EPSG:4326",
                ... )
                >>> gdf.to_file(path, driver="GeoJSON")
                >>> feats = list(FeatureCollection.iter_features(path))
                >>> len(feats)
                3
                >>> feats[0]["properties"]["id"]
                1

                ```
            - Stream in `chunksize=2` batches as FeatureCollection chunks:
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> path = d / "pts.geojson"
                >>> gdf = gpd.GeoDataFrame(
                ...     {"id": [1, 2, 3]},
                ...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
                ...     crs="EPSG:4326",
                ... )
                >>> gdf.to_file(path, driver="GeoJSON")
                >>> chunks = list(
                ...     FeatureCollection.iter_features(path, chunksize=2)
                ... )
                >>> [len(c) for c in chunks]
                [2, 1]

                ```
            - Invalid `chunksize` raises `ValueError`:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> gen = FeatureCollection.iter_features("anywhere", chunksize=0)
                >>> next(gen)
                Traceback (most recent call last):
                    ...
                ValueError: chunksize must be >= 1 when supplied; got 0.

                ```
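            - A minimal, standalone sketch of the fixed-size batching
              used to page through a layer. `batch_windows` is a
              hypothetical helper written for illustration, not part of
              pyramids; it assumes a reader that accepts a skip offset
              and a per-batch row cap:
                ```python
                def batch_windows(total, batch_size):
                    # Yield (skip, expected_rows) for each fixed-size batch.
                    if batch_size < 1:
                        raise ValueError(f"batch_size must be >= 1; got {batch_size}.")
                    for start in range(0, total, batch_size):
                        # The last window may hold fewer rows than batch_size.
                        yield start, min(batch_size, total - start)


                windows = list(batch_windows(total=3, batch_size=2))
                print(windows)  # [(0, 2), (2, 1)]
                ```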
        """
        if chunksize is not None and chunksize < 1:
            raise ValueError(f"chunksize must be >= 1 when supplied; got {chunksize}.")
        if tile_strategy not in cls._VALID_TILE_STRATEGIES:
            raise ValueError(
                f"tile_strategy must be one of "
                f"{cls._VALID_TILE_STRATEGIES}; got {tile_strategy!r}."
            )

        import pyogrio

        resolved = str(_pyramids_io._parse_path(path))

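        # Determine how many features are in the layer so we can
        # iterate in fixed-size batches via skip_features / max_features.
        # pyogrio's read_info reads layer metadata, which is cheap for
        # most drivers; it can report -1 when a driver cannot count
        # features without a scan (see its force_feature_count option).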
        info_kwargs: dict[str, Any] = {}
        if layer is not None:
            info_kwargs["layer"] = layer
        info = pyogrio.read_info(resolved, **info_kwargs)
        total = int(info["features"])

        if chunksize is None:
            batch_size = _DEFAULT_ITER_BATCH_SIZE
        else:
            batch_size = int(chunksize)

        # D-M3: pin the engine to pyogrio. `skip_features` /
        # `max_features` are pyogrio-specific (geopandas' fiona
        # engine silently ignores them, which would turn every chunk
        # into a full scan). Pinning the engine makes the contract
        # explicit and fails fast if pyogrio is absent.
        read_kwargs: dict[str, Any] = {"engine": "pyogrio"}
        if layer is not None:
            read_kwargs["layer"] = layer
        if where is not None:
            read_kwargs["where"] = where

        # When tile_strategy is "auto"/"rtree"/"row_group", forward
        # the bbox to pyogrio, which transparently uses the format's
        # spatial index. When it is "none", hold the bbox back and
        # apply it in Python after each chunk loads.
        pushdown_bbox = bbox if tile_strategy != "none" else None
        python_bbox = bbox if tile_strategy == "none" else None
        if pushdown_bbox is not None:
            read_kwargs["bbox"] = pushdown_bbox

        for start in range(0, total, batch_size):
            gdf_chunk = gpd.read_file(
                resolved,
                skip_features=start,
                max_features=batch_size,
                **read_kwargs,
            )
            # Remember the absolute row indices before any
            # bbox-based masking so callers can map yielded features
            # back to their source rows even after a Python-side filter
            # has dropped some of them.
            if include_index:
                row_indices = list(range(start, start + len(gdf_chunk)))
            if python_bbox is not None and len(gdf_chunk) > 0:
                xmin, ymin, xmax, ymax = python_bbox
                mask = gdf_chunk.intersects(box(xmin, ymin, xmax, ymax))
                if include_index:
                    row_indices = [ri for ri, keep in zip(row_indices, mask) if keep]
                gdf_chunk = gdf_chunk[mask]
            if chunksize is None:
                iterator = gdf_chunk.iterfeatures(na="null")
                if include_index:
                    for ri, feat in zip(row_indices, iterator):
                        feat["id"] = ri
                        yield feat
                else:
                    for feat in iterator:
                        yield feat
            else:
                chunk_fc = cls(gdf_chunk)
                if include_index:
                    chunk_fc["_row_index"] = row_indices
                yield chunk_fc

    @classmethod
    def read_file(
        cls,
        path: str | Path,
        *,
        layer: str | int | None = None,
        bbox: tuple[float, float, float, float] | Any = None,
        mask: Any = None,
        rows: slice | int | None = None,
        columns: list[str] | None = None,
        where: str | None = None,
        backend: str = "pandas",
        npartitions: int | None = None,
        chunksize: int | None = None,
        **kwargs: Any,
    ) -> FeatureCollection | LazyFeatureCollection:
        """Read a vector file into a FeatureCollection.

        `path` is first routed through
        :func:`pyramids._io._parse_path`, which handles:

        * Cloud-URL rewriting (`s3://`, `gs://`, `az://`,
          `abfs://`, `http(s)://`, `file://` → GDAL `/vsi*/`
          form). This is verified end-to-end through an HTTP test.
          For AWS / GCS / Azure credentials either set the standard
          environment variables (`AWS_ACCESS_KEY_ID`,
          `AWS_SECRET_ACCESS_KEY`, `GOOGLE_APPLICATION_CREDENTIALS`,
          `AZURE_STORAGE_CONNECTION_STRING`, …) or scope them via
          :class:`pyramids.base.remote.CloudConfig` as a context
          manager around the `read_file` call.
        * Compressed-archive dispatch for `.zip`, `.tar`, `.tar.gz`,
          `.gz` on **local** paths — the returned path is a
          `/vsizip/`, `/vsitar/` or `/vsigzip/` string that
          :func:`geopandas.read_file` (via GDAL's virtual filesystem)
          can open directly. You can either pass just the archive
          path (first contained file wins) or
          `archive.zip/inner.geojson` to target a specific member.
          Cloud + archive chaining (`http://host/x.zip`) is not
          automatic today — if you need it, stage the archive
          locally first or use `CloudConfig` with an explicit
          `/vsizip//vsicurl/...` path.

        Filter kwargs are pushed down to the geopandas I/O engine
        (fiona/pyogrio) so the dataset never fully materializes when
        only a subset is needed.

        Args:
            path (str | Path):
                File path, URL, archive path, or
                `archive.ext/inner-file` form.
            layer (str | int | None):
                Layer name or index for multi-layer formats
                (GeoPackage, GDB, KML, …). `None` reads the first /
                default layer.
            bbox:
                `(minx, miny, maxx, maxy)` tuple, or a
                `GeoDataFrame` / `GeoSeries` / shapely geometry
                whose total bounds are used. Only features
                intersecting the bbox are loaded.
            mask:
                A shapely geometry (or mapping / GeoSeries /
                GeoDataFrame) whose geometries are used as a mask —
                only features intersecting the mask are loaded. Finer
                than `bbox` (actual geometry intersection, not just
                envelope). Mutually exclusive with `bbox`.
            rows (slice | int | None):
                `int` — read at most N rows. `slice` — read the
                given range of rows. Useful for sampling.
            columns (list[str] | None):
                Restrict loaded attribute columns. Geometry is
                always loaded. `None` loads every column.
            where (str | None):
                OGR SQL `WHERE`-clause predicate pushed down to the
                driver (e.g. `"population > 10000"`). Avoids loading
                non-matching features.
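            backend (str):
                `"pandas"` (default) reads eagerly and returns a
                FeatureCollection. `"dask"` reads lazily through
                `dask_geopandas` and returns a LazyFeatureCollection;
                it rejects `layer`, `bbox`, `mask`, `rows`,
                `columns`, and `where`.
            npartitions (int | None):
                Number of partitions for `backend="dask"`. When both
                this and `chunksize` are `None`, a default is derived
                from the file size.
            chunksize (int | None):
                Rows per partition for `backend="dask"`.
            **kwargs:
                Forwarded to :func:`geopandas.read_file` verbatim for
                engine-specific options (`engine="pyogrio"`,
                `use_arrow=True`, driver-specific open options).

        Returns:
            FeatureCollection | LazyFeatureCollection: The (possibly
            filtered) features wrapped as a FeatureCollection, or a
            LazyFeatureCollection when `backend="dask"`.

        Raises:
            ValueError: If `backend` is not `"pandas"` or `"dask"`,
                or if `backend="dask"` is combined with `layer`,
                `bbox`, `mask`, `rows`, `columns`, or `where`.
            ImportError: If `backend="dask"` is requested and
                `dask_geopandas` is not installed.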

        Examples:
            - Load a GeoJSON file:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection.read_file("tests/data/coello-gauges.geojson")
                >>> len(fc) > 0
                True

                ```
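            - A standalone sketch of how the filter arguments are
              handled, mirroring what this method does: only supplied
              values are forwarded, and `backend="dask"` rejects
              them. `build_read_kwargs` is a hypothetical helper for
              illustration, not a pyramids function:
                ```python
                def build_read_kwargs(backend="pandas", **filters):
                    # Validate the backend name first.
                    if backend not in ("pandas", "dask"):
                        raise ValueError(
                            f"backend must be 'pandas' or 'dask', got {backend!r}"
                        )
                    # Keep only the filters the caller actually supplied.
                    supplied = {k: v for k, v in filters.items() if v is not None}
                    if backend == "dask" and supplied:
                        names = sorted(supplied)
                        raise ValueError(
                            f"backend='dask' does not support filter kwargs {names}."
                        )
                    return supplied


                kwargs = build_read_kwargs(
                    layer=None, bbox=(0, 0, 1, 1), where="pop > 10"
                )
                print(kwargs)  # {'bbox': (0, 0, 1, 1), 'where': 'pop > 10'}
                ```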
        """
        resolved = _pyramids_io._parse_path(path)
        if backend == "dask":
            # dask_geopandas.read_file does NOT forward pyogrio
            # filter kwargs (bbox / mask / rows / columns / where) —
            # silently dropping them was the bug. Raise a clear
            # ValueError instead so users know to either pre-filter
            # or call .compute() and filter eagerly.
            unsupported = {
                "bbox": bbox,
                "mask": mask,
                "rows": rows,
                "columns": columns,
                "where": where,
                "layer": layer,
            }
            supplied = [k for k, v in unsupported.items() if v is not None]
            if supplied:
                raise ValueError(
                    f"backend='dask' does not support filter kwargs "
                    f"{supplied}. dask_geopandas.read_file has no "
                    "pushdown story for these. Either omit them and "
                    "filter post-load via .clip / .loc / .compute, or "
                    "switch to read_parquet(backend='dask', filters=...)"
                )
            try:
                import dask_geopandas
            except ImportError as exc:
                raise ImportError(
                    "backend='dask' requires the optional "
                    "'dask-geopandas' dependency. Install with one of:\n"
                    "  - PyPI:        pip install 'pyramids-gis[parquet-lazy]'\n"
                    "  - conda-forge: conda install -c conda-forge pyramids-parquet-lazy"
                ) from exc
            # default npartitions from file size when neither
            # kwarg was supplied; one-partition fallback defeats the
            # point of going lazy.
            partition_kwargs = _resolve_lazy_partitioning(
                resolved,
                npartitions,
                chunksize,
            )
            # wrap the lazy return as a LazyFeatureCollection so the
            # dask branch stays inside the pyramids type system.
            from pyramids.feature._lazy_collection import LazyFeatureCollection

            dask_gdf = dask_geopandas.read_file(resolved, **partition_kwargs)
            return LazyFeatureCollection.from_dask_gdf(dask_gdf)
        if backend != "pandas":
            raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
        # Only pass kwargs that were actually supplied — passing the
        # defaults (None) is fine for some geopandas engines but
        # confuses others. Build a clean kwargs dict.
        passthrough: dict[str, Any] = {}
        if layer is not None:
            passthrough["layer"] = layer
        if bbox is not None:
            passthrough["bbox"] = bbox
        if mask is not None:
            passthrough["mask"] = mask
        if rows is not None:
            passthrough["rows"] = rows
        if columns is not None:
            passthrough["columns"] = columns
        if where is not None:
            passthrough["where"] = where
        passthrough.update(kwargs)
        gdf = gpd.read_file(resolved, **passthrough)
        return cls(gdf)

    @property
    def epsg(self) -> int | None:
        """EPSG code of this FeatureCollection's CRS (cached).

        The value is cached per CRS-object identity so repeated access
        on hot paths skips the `pyproj.CRS.to_epsg` call. The cache
        auto-invalidates whenever `self.crs` is replaced.

        An identity miss falls back to equality. If `self.crs` has
        been reassigned to a different CRS object that nevertheless
        compares equal to the cached one (e.g. `fc.crs = pyproj.CRS(
        "EPSG:4326")` on a frame already in EPSG:4326), we adopt the
        new object as the cache key and skip the `.to_epsg()` call.
        Only when the value really differs do we recompute.

        The equality fallback is cheaper than a fresh
        `.to_epsg()` call, but it is not free:
        `pyproj.CRS.__eq__` runs a full equivalence check on the two
        CRS definitions. If a future pandas/geopandas release stops
        returning the same `self.crs` object identity across
        accesses, the fallback runs on every `fc.epsg` and adds up
        in hot loops. Switch the cache key to `self.crs.to_wkt()`
        if a profile ever shows this dominating.

        Returns:
            int | None: The integer EPSG code if the CRS is registered
            in the EPSG authority; `None` when the FC has no CRS set
            or when its CRS cannot be mapped to a single EPSG code.

        Examples:
            - Frame built with WGS84 reports EPSG 4326:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.epsg
                4326

                ```
            - A frame without a CRS returns `None`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame({"id": [1]}, geometry=[Point(0, 0)])
                ... )
                >>> fc.epsg is None
                True

                ```
            - Reprojecting to Web Mercator updates the cached code:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc = fc.to_crs(3857)
                >>> fc.epsg
                3857

                ```
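            - A standalone sketch of an identity-then-equality cache
              like the one this property uses. `CachedCode` and
              `lookup` are hypothetical names for illustration, and
              plain tuples stand in for `pyproj.CRS` objects:
                ```python
                class CachedCode:
                    # Cache one derived value per key, checking identity first.

                    def __init__(self, lookup):
                        self._lookup = lookup  # the expensive function
                        self._key = None
                        self._value = None
                        self.calls = 0  # how many times the lookup ran

                    def get(self, key):
                        if key is self._key:
                            return self._value  # fast path: same object
                        if self._key is not None and key is not None:
                            if key == self._key:
                                # Adopt the equal object as the new cache key.
                                self._key = key
                                return self._value
                        self.calls += 1
                        self._value = None if key is None else self._lookup(key)
                        self._key = key
                        return self._value


                cache = CachedCode(lookup=lambda key: key[1])
                a = ("EPSG", 4326)
                b = tuple(["EPSG", 4326])  # equal to a, but a distinct object
                first = cache.get(a)
                second = cache.get(b)
                calls_after_equal_key = cache.calls
                third = cache.get(("EPSG", 3857))
                print(first, second, calls_after_equal_key, third, cache.calls)
                ```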
        """
        crs = self.crs
        cached_crs = getattr(self, "_epsg_cache_crs", None)
        if cached_crs is crs:
            return getattr(self, "_epsg_cache_value", None)
        # try equality before falling back to a fresh to_epsg() call.
        # pyproj.CRS comparison is cheaper than a full re-parse, and the
        # common "reassign an equivalent CRS" case (e.g. set_crs chain)
        # should stay in the fast path.
        if cached_crs is not None and crs is not None:
            try:
                equivalent = cached_crs == crs
            except (TypeError, ValueError):
                equivalent = False
            if equivalent:
                object.__setattr__(self, "_epsg_cache_crs", crs)
                return getattr(self, "_epsg_cache_value", None)
        if crs is None:
            value: int | None = None
        else:
            code = crs.to_epsg()
            value = int(code) if code is not None else None
        object.__setattr__(self, "_epsg_cache_crs", crs)
        object.__setattr__(self, "_epsg_cache_value", value)
        return value

    @property
    def top_left_corner(self) -> list[Number]:
        """Top-left corner `[xmin, ymax]` of the total bounds.

        Returns:
            list[Number]: Two-element list `[xmin, ymax]` — the
            minimum x-coordinate paired with the maximum y-coordinate
            of the union of all geometry bounds.

        Examples:
            - Two points span a unit square — the top-left is `[0, 1]`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(0, 0), Point(1, 1)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.top_left_corner
                [0.0, 1.0]

                ```
            - Offset points yield the offset top-left corner:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(10, 20), Point(15, 30)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.top_left_corner
                [10.0, 30.0]

                ```
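            - A standalone sketch of the same computation on plain
              `(xmin, ymin, xmax, ymax)` tuples, with no geopandas
              dependency. `top_left` is a hypothetical helper for
              illustration:
                ```python
                def top_left(bounds_list):
                    # Union the boxes: smallest x paired with largest y.
                    xmin = min(b[0] for b in bounds_list)
                    ymax = max(b[3] for b in bounds_list)
                    return [xmin, ymax]


                corner = top_left(
                    [(10.0, 20.0, 10.0, 20.0), (15.0, 30.0, 15.0, 30.0)]
                )
                print(corner)  # [10.0, 30.0]
                ```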
        """
        bounds = self.total_bounds.tolist()
        return [bounds[0], bounds[3]]

    @property
    def column(self) -> list[str]:
        """Deprecated alias for :attr:`columns` returning a `list[str]`.

        Returns:
            list[str]: Column names in their current order, including
            the active geometry column.

        Examples:
            - A frame with an `id` field reports both columns:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.column
                ['id', 'geometry']

                ```
            - Multiple attribute columns appear in insertion order:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"name": ["a"], "pop": [100]},
                ...         geometry=[Point(0, 0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.column
                ['name', 'pop', 'geometry']

                ```
        """
        return self.columns.tolist()

    def __str__(self) -> str:
        """Return a short, pyramids-branded summary of the collection."""
        n = len(self)
        cols = self.columns.tolist()
        epsg = self.epsg
        return f"FeatureCollection({n} features, columns={cols}, epsg={epsg})"

    def __repr__(self) -> str:
        """Return a pyramids-branded repr."""
        return (
            f"FeatureCollection(n_features={len(self)}, "
            f"columns={self.columns.tolist()}, epsg={self.epsg})"
        )

    @property
    def schema(self) -> dict:
        """Fiona-style schema: geometry type + field-type dict.

        Returns a dict modelled on fiona's `schema` attribute so
        callers migrating from `fiona.open(path).schema` can consume
        this with minimal rewriting. The dict has three keys:

        * `"geometry"`: single string (`"Point"`, `"Polygon"`,
          …) when every row has the same geom type, otherwise
          `"Unknown"`.
        * `"properties"`: `{column_name: dtype_string}` for every
          non-geometry column. The values are pandas dtype strings
          such as `"int64"`, not fiona field types such as `"int"`.
        * `"crs"`: the :attr:`crs` as a :class:`pyproj.CRS` object,
          or `None` when the FC has no CRS set. This key is an
          addition: fiona exposes the CRS on the collection
          (`fiona.open(path).crs`) rather than inside `schema`.

        Empty FeatureCollections (`len(self) == 0`) report
        `"Unknown"` for the geometry type.

        Returns:
            dict: Three-key dict with `"geometry"`, `"properties"`,
            and `"crs"`.

        Examples:
            - Homogeneous point collection reports `"Point"`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(0, 0), Point(1, 1)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> schema = fc.schema
                >>> schema["geometry"]
                'Point'
                >>> schema["properties"]
                {'id': 'int64'}
                >>> schema["crs"].to_epsg()
                4326

                ```
            - Mixed geometry types collapse to `"Unknown"`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point, LineString
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(0, 0), LineString([(0, 0), (1, 1)])],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.schema["geometry"]
                'Unknown'

                ```
            - Frames without a CRS return `crs=None`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame({"id": [1]}, geometry=[Point(0, 0)])
                ... )
                >>> fc.schema["crs"] is None
                True

                ```
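            - A standalone sketch of the geometry-type rule on plain
              strings. `collapse_geom_type` is a hypothetical helper
              for illustration; `None` stands in for a missing geometry:
                ```python
                def collapse_geom_type(type_names):
                    # Ignore missing geometries, then require one distinct type.
                    distinct = {name for name in type_names if name is not None}
                    if len(distinct) == 1:
                        (only,) = distinct
                        return only
                    return "Unknown"


                print(collapse_geom_type(["Point", None, "Point"]))  # Point
                print(collapse_geom_type(["Point", "LineString"]))  # Unknown
                print(collapse_geom_type([]))  # Unknown
                ```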
        """
        geom_types = {g.geom_type for g in self.geometry if g is not None}
        if len(geom_types) == 1:
            (geom_type,) = geom_types
        else:
            geom_type = "Unknown"
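        # Exclude the active geometry column by name rather than
        # assuming it is called "geometry".
        geom_col = self.geometry.name
        properties = {
            col: str(dt) for col, dt in self.dtypes.items() if col != geom_col
        }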
        return {
            "geometry": geom_type,
            "properties": properties,
            "crs": self.crs,
        }

    @classmethod
    def list_layers(cls, path: str | Path) -> list[str]:
        """List every vector-layer name in `path`.

        Routes through :func:`pyramids._io._parse_path` so the same
        cloud-URL / archive rewriting that :meth:`read_file` uses
        applies here too. Uses :func:`pyogrio.list_layers` under the
        hood (geopandas' default engine).

        Results are memoised behind a 128-entry LRU cache keyed on
        the resolved `str` path. Re-calling `list_layers` on the
        same cloud URL or local path in a loop costs one hash
        lookup instead of one datasource open. Call
        :meth:`list_layers_cache_clear` to invalidate after an
        out-of-band write.

        Args:
            path (str | Path):
                File path, URL, or archive path. Single-layer formats
                like GeoJSON return one name; multi-layer formats
                (GPKG, GDB, KML) return every layer.

        Returns:
            list[str]: Layer names in the order the driver reports them.

        Raises:
            FileNotFoundError: If `path` is a local filesystem path
                that does not exist. Cloud URLs and `/vsi*` paths
                skip this check and defer to the underlying driver.
                Previously all failures surfaced as an opaque
                `VectorDriverError("Failed to open datasource")`.

        Examples:
            - A single-layer GeoJSON returns one name derived from the filename:
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> path = d / "pts.geojson"
                >>> gdf = gpd.GeoDataFrame(
                ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ... )
                >>> gdf.to_file(path, driver="GeoJSON")
                >>> FeatureCollection.list_layers(path)
                ['pts']

                ```
            - A missing local path raises `FileNotFoundError`:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> FeatureCollection.list_layers("does/not/exist.geojson")
                Traceback (most recent call last):
                    ...
                FileNotFoundError: list_layers: no file at 'does/not/exist.geojson'.

                ```
        """
        # pre-check local-path existence so the caller sees
        # a `FileNotFoundError` naming the path instead of a generic
        # driver-open failure. Defer to `base.remote.is_remote` as
        # the single source of truth for which schemes are remote —
        # the previous hardcoded prefix tuple would silently treat any
        # future scheme as local and raise a misleading error.
        path_str = str(path)
        if not is_remote(path_str):
            local = Path(path_str)
            if not local.exists():
                raise FileNotFoundError(f"list_layers: no file at {path_str!r}.")

        resolved = str(_pyramids_io._parse_path(path))
        return list(_list_layers_cached(resolved))

    @classmethod
    def list_layers_cache_clear(cls) -> None:
        """Clear the C15 LRU cache backing :meth:`list_layers`.

        Call this after writing a new layer to an existing multi-layer
        file (e.g. a GPKG) if you then want :meth:`list_layers` to see
        the new layer. Otherwise the 128-entry LRU cache is
        self-managing and callers do not need to touch it.

        Returns:
            None: This method does not return a value.

        Examples:
            - Clearing an empty cache is a safe no-op:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> FeatureCollection.list_layers_cache_clear()
                >>> FeatureCollection.list_layers_cache_clear()

                ```
            - After an out-of-band write, clear the cache so the next
              `list_layers` call re-reads the updated file:
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> path = d / "pts.geojson"
                >>> gpd.GeoDataFrame(
                ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ... ).to_file(path, driver="GeoJSON")
                >>> _ = FeatureCollection.list_layers(path)
                >>> FeatureCollection.list_layers_cache_clear()
                >>> FeatureCollection.list_layers(path)
                ['pts']

                ```
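            - A standalone sketch of the memoise-then-clear pattern,
              assuming the cache is a `functools.lru_cache` (which
              provides `cache_clear`). `slow_list_layers` and the
              `FAKE_FILES` dict are hypothetical stand-ins for the
              driver call and the file on disk:
                ```python
                from functools import lru_cache

                FAKE_FILES = {"data.gpkg": ["roads"]}
                opens = []  # records every simulated datasource open


                @lru_cache(maxsize=128)
                def slow_list_layers(resolved):
                    opens.append(resolved)
                    # Return a tuple so the cached value is immutable.
                    return tuple(FAKE_FILES[resolved])


                first = list(slow_list_layers("data.gpkg"))
                FAKE_FILES["data.gpkg"].append("rivers")  # out-of-band write
                stale = list(slow_list_layers("data.gpkg"))  # served from cache
                slow_list_layers.cache_clear()
                fresh = list(slow_list_layers("data.gpkg"))  # re-reads the file
                print(first, stale, fresh, len(opens))
                ```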
        """
        _list_layers_cached.cache_clear()

    @classmethod
    def open_arrow(
        cls,
        path: str | Path,
        *,
        layer: str | int | None = None,
        columns: list[str] | None = None,
        bbox: tuple[float, float, float, float] | None = None,
        where: str | None = None,
        batch_size: int | None = None,
    ) -> Any:
        """Open a vector file as a streaming :class:`pyarrow.RecordBatchReader`.

        Thin wrapper over :func:`pyogrio.raw.open_arrow` that surfaces
        the underlying Arrow RecordBatch iterator. Rows are yielded in
        batches, so callers can iterate through multi-GB datasets
        without materializing the whole table in memory — useful for
        building custom dask partitioners.

        Args:
            path: Vector file path (Shapefile, GPKG, FlatGeobuf,
                GeoJSON, GeoParquet, …). Routed through
                :func:`pyramids._io._parse_path` so cloud URLs work.
            layer: Layer name or index for multi-layer formats.
            columns: Attribute columns to load (`geometry` is
                always included).
            bbox: `(minx, miny, maxx, maxy)` filter.
            where: OGR SQL `WHERE` predicate pushed down to the
                driver.
            batch_size: Requested RecordBatch size in rows. `None`
                uses the driver default.

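        Returns:
            The object returned by :func:`pyogrio.raw.open_arrow`.
            In pyogrio this is a context manager that yields a
            `(meta, reader)` pair, where `reader` is the streaming
            :class:`pyarrow.RecordBatchReader`. Inside the `with`
            block, call `reader.read_all()` to materialise, or
            iterate over `reader` for row-batch consumption.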

        Raises:
            ImportError: If :mod:`pyogrio` is not installed.
        """
        try:
            from pyogrio.raw import open_arrow
        except ImportError as exc:
            raise ImportError(
                "open_arrow requires the optional 'pyogrio' dependency. "
                "Install with one of:\n"
                "  - PyPI:        pip install pyogrio\n"
                "  - conda-forge: conda install -c conda-forge pyogrio"
            ) from exc
        resolved = _pyramids_io._parse_path(path)
        kwargs: dict[str, Any] = {}
        if layer is not None:
            kwargs["layer"] = layer
        if columns is not None:
            kwargs["columns"] = columns
        if bbox is not None:
            kwargs["bbox"] = bbox
        if where is not None:
            kwargs["where"] = where
        if batch_size is not None:
            kwargs["batch_size"] = batch_size
        return open_arrow(resolved, **kwargs)

    @classmethod
    def read_parquet(
        cls,
        path: str | Path,
        *,
        columns: list[str] | None = None,
        bbox: tuple[float, float, float, float] | None = None,
        backend: str = "pandas",
        split_row_groups: bool | None = None,
        filters: list | None = None,
        blocksize: int | str | None = None,
        storage_options: dict | None = None,
        **kwargs: Any,
    ) -> FeatureCollection | LazyFeatureCollection:
        """Read a GeoParquet file into a FeatureCollection.

        GeoParquet is a cloud-native columnar vector format
        (OGC-adopted December 2024): faster to scan than GeoJSON,
        smaller than Shapefile, and partitioned in a way that suits
        distributed compute. This method is a thin wrapper around
        :func:`geopandas.read_parquet`; the path is first routed
        through :func:`pyramids._io._parse_path` so cloud URLs
        (`s3://`, `gs://`, `http(s)://`, …) resolve the same way
        they do in :meth:`read_file`.

        Requires the optional :mod:`pyarrow` dependency. Install with one of:

        - PyPI: ``pip install 'pyramids-gis[parquet]'``
        - conda-forge: ``conda install -c conda-forge pyramids-parquet``

        Args:
            path (str | Path):
                Local path, cloud URL, or any form
                :func:`pyramids._io._parse_path` accepts.
            columns (list[str] | None):
                Project a subset of columns — Parquet's columnar
                layout makes this a true I/O win, unlike row-oriented
                formats. `geometry` is always loaded. `None`
                loads every column.
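            bbox (tuple[float, float, float, float] | None):
                `(minx, miny, maxx, maxy)` spatial filter.
                Forwarded to :func:`geopandas.read_parquet`, which uses
                the file's GeoParquet spatial-index metadata when
                present to skip non-matching row groups. `None`
                (default) loads every feature. Only applied when
                `backend="pandas"`; the dask branch does not forward it.
            backend (str):
                `"pandas"` (default) reads eagerly through
                :func:`geopandas.read_parquet`. `"dask"` reads lazily
                through :func:`dask_geopandas.read_parquet` and
                requires the optional `dask-geopandas` dependency.
            split_row_groups (bool | None):
                Forwarded to :func:`dask_geopandas.read_parquet` when
                `backend="dask"` and the value is not `None`; ignored
                by the pandas backend.
            filters (list | None):
                Forwarded under the same rule as `split_row_groups`.
            blocksize (int | str | None):
                Forwarded under the same rule as `split_row_groups`.
            storage_options (dict | None):
                fsspec storage options. Forwarded to either backend
                when not `None`.
            **kwargs:
                Forwarded to the selected backend's `read_parquet`.

        Returns:
            FeatureCollection | LazyFeatureCollection: The file's
            features wrapped as a FeatureCollection when
            `backend="pandas"`, or as a LazyFeatureCollection when
            `backend="dask"`.

        Raises:
            ImportError: If :mod:`pyarrow` is not installed, with a
                pyramids-branded message pointing at the
                `[parquet]` optional-dependency extra (D-M5). With
                `backend="dask"`, also raised when `dask-geopandas`
                is missing, pointing at the `[parquet-lazy]` extra.
            ValueError: If `backend` is neither `"pandas"` nor
                `"dask"`.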

        Examples:
            - Round-trip a small FC through GeoParquet (requires pyarrow):
                ```python
                >>> import tempfile  # doctest: +SKIP
                >>> from pathlib import Path  # doctest: +SKIP
                >>> import geopandas as gpd  # doctest: +SKIP
                >>> from shapely.geometry import Point  # doctest: +SKIP
                >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
                >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
                >>> path = d / "pts.parquet"  # doctest: +SKIP
                >>> gpd.GeoDataFrame(
                ...     {"id": [1, 2]},
                ...     geometry=[Point(0, 0), Point(1, 1)],
                ...     crs="EPSG:4326",
                ... ).to_parquet(path)  # doctest: +SKIP
                >>> fc = FeatureCollection.read_parquet(path)  # doctest: +SKIP
                >>> len(fc)  # doctest: +SKIP
                2
                >>> fc.epsg  # doctest: +SKIP
                4326

                ```
            - Project a subset of columns to speed up I/O on wide files:
                ```python
                >>> fc = FeatureCollection.read_parquet(  # doctest: +SKIP
                ...     "s3://bucket/big.parquet",
                ...     columns=["id", "geometry"],
                ... )
                >>> fc.column  # doctest: +SKIP
                ['id', 'geometry']

                ```
            - A missing pyarrow dependency raises a branded `ImportError`:
                ```python
                >>> FeatureCollection.read_parquet("x.parquet")  # doctest: +SKIP
                Traceback (most recent call last):
                    ...
                ImportError: GeoParquet support requires the optional 'pyarrow'...

                ```
        """
        resolved = _pyramids_io._parse_path(path)
        if backend == "dask":
            # check deps in order of specificity — the backend
            # request is the more specific signal, so the
            # dask-geopandas hint beats the generic pyarrow one.
            # When both are missing, the dask-geopandas error names
            # the extra that installs both ([parquet-lazy]).
            try:
                import dask_geopandas
            except ImportError as exc:
                raise ImportError(
                    "backend='dask' requires the optional "
                    "'dask-geopandas' dependency. Install with one of:\n"
                    "  - PyPI:        pip install 'pyramids-gis[parquet-lazy]'\n"
                    "  - conda-forge: conda install -c conda-forge pyramids-parquet-lazy"
                ) from exc
            dask_kwargs: dict[str, Any] = {}
            if columns is not None:
                dask_kwargs["columns"] = columns
            if split_row_groups is not None:
                dask_kwargs["split_row_groups"] = split_row_groups
            if filters is not None:
                dask_kwargs["filters"] = filters
            if blocksize is not None:
                dask_kwargs["blocksize"] = blocksize
            if storage_options is not None:
                dask_kwargs["storage_options"] = storage_options
            dask_kwargs.update(kwargs)
            # dask_geopandas is installed → assert pyarrow too, so
            # the user gets the pyramids-branded hint (not the
            # upstream message dask_geopandas would emit when it tries
            # to read). `[parquet-lazy]` pulls both.
            _require_pyarrow()
            # wrap the lazy return as a LazyFeatureCollection so the
            # dask branch stays inside the pyramids type system.
            from pyramids.feature._lazy_collection import LazyFeatureCollection

            dask_gdf = dask_geopandas.read_parquet(resolved, **dask_kwargs)
            return LazyFeatureCollection.from_dask_gdf(dask_gdf)
        if backend != "pandas":
            raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
        _require_pyarrow()
        # geopandas 1.x forwards **kwargs straight into
        # `pyarrow.parquet.read_table`, which has never accepted the
        # pandas-style `engine=` kwarg. `_require_pyarrow()` above
        # already hard-guarantees the pyarrow backend, so no injection
        # is needed here. If geopandas ever reintroduces a fastparquet
        # path it will be opt-in via a new kwarg, not a silent switch.
        passthrough: dict[str, Any] = {}
        passthrough.update(kwargs)
        if columns is not None:
            passthrough["columns"] = columns
        if bbox is not None:
            passthrough["bbox"] = bbox
        if storage_options is not None:
            passthrough["storage_options"] = storage_options
        gdf = gpd.read_parquet(resolved, **passthrough)
        return cls(gdf)

    def to_parquet(
        self,
        path: str | Path,
        *,
        compression: str = "snappy",
        index: bool | None = None,
        **kwargs: Any,
    ) -> None:
        """Write this FeatureCollection to GeoParquet.

        Thin wrapper around :meth:`geopandas.GeoDataFrame.to_parquet`
        that defaults `compression` to `"snappy"`, a codec that
        favours read/write speed over compression ratio.

        Requires the optional :mod:`pyarrow` dependency. Install with one of:

        - PyPI: ``pip install 'pyramids-gis[parquet]'``
        - conda-forge: ``conda install -c conda-forge pyramids-parquet``

        Args:
            path (str | Path):
                Destination file path.
            compression (str):
                Parquet compression codec — `"snappy"` (default),
                `"gzip"`, `"brotli"`, `"lz4"`, `"zstd"`, or
                `"none"`. `"snappy"` is the GeoParquet-spec
                recommended default.
            index (bool | None):
                Whether to include the pandas index as a column.
                `None` (default) uses geopandas' default behavior:
                preserve a non-default index, drop the default
                `RangeIndex`.
            **kwargs:
                Forwarded to :meth:`geopandas.GeoDataFrame.to_parquet`.

        Raises:
            ImportError: If :mod:`pyarrow` is not installed, with a
                pyramids-branded message pointing at the
                `[parquet]` optional-dependency extra (D-M5).

        Examples:
            - Write a FeatureCollection with the default snappy codec:
                ```python
                >>> import tempfile  # doctest: +SKIP
                >>> from pathlib import Path  # doctest: +SKIP
                >>> import geopandas as gpd  # doctest: +SKIP
                >>> from shapely.geometry import Point  # doctest: +SKIP
                >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
                >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(0, 0), Point(1, 1)],
                ...         crs="EPSG:4326",
                ...     )
                ... )  # doctest: +SKIP
                >>> path = d / "out.parquet"  # doctest: +SKIP
                >>> fc.to_parquet(path)  # doctest: +SKIP
                >>> path.exists()  # doctest: +SKIP
                True

                ```
            - Pick a different codec (e.g. zstd for better compression):
                ```python
                >>> import tempfile  # doctest: +SKIP
                >>> from pathlib import Path  # doctest: +SKIP
                >>> import geopandas as gpd  # doctest: +SKIP
                >>> from shapely.geometry import Point  # doctest: +SKIP
                >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
                >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )  # doctest: +SKIP
                >>> fc.to_parquet(d / "out.parquet", compression="zstd")  # doctest: +SKIP

                ```
        """
        _require_pyarrow()
        super().to_parquet(path, compression=compression, index=index, **kwargs)

    def to_file(
        self,
        path: str | Path,
        driver: str = "geojson",
        *,
        layer: str | None = None,
        mode: str = "w",
        **creation_options: Any,
    ) -> None:
        """Write this FeatureCollection to a vector file.

        `layer`, `mode`, and arbitrary driver creation
        options are now first-class kwargs. Previously callers had to
        rely on implicit `**kwargs` forwarding, which hurt
        discoverability.

        Args:
            path (str | Path):
                Destination file path.
            driver (str):
                Driver alias (e.g. `"geojson"`, `"gpkg"`) or
                literal GDAL driver name (`"GeoJSON"`, `"GPKG"`,
                `"ESRI Shapefile"`). Resolved via :class:`Catalog`.
            layer (str | None):
                Layer name for multi-layer drivers (GPKG, GDB, …).
                Writing two layers into the same GPKG is the canonical
                use case. `None` defers to the driver default.
            mode (str):
                `"w"` (default) overwrites; `"a"` appends to an
                existing layer. Append support depends on the driver
                — GPKG and Shapefile accept it, GeoJSON does not.
            **creation_options:
                Driver-specific creation options, forwarded to the
                underlying engine (pyogrio / fiona). Examples:

                * GPKG: `SPATIAL_INDEX="YES"`, `FID="id"`.
                * Shapefile: `ENCODING="UTF-8"`.
                * GeoJSON: `COORDINATE_PRECISION=6`, `RFC7946="YES"`.

                Keys are case-preserving and passed verbatim to the
                driver; consult the GDAL driver docs for the full
                list.

                pyogrio (the default geopandas engine on 1.0+)
                raises :class:`ValueError` with the message
                `"unrecognized option '<name>' for driver '<driver>'"`
                when a supplied option is neither in the driver's
                dataset nor its layer creation-option list. This
                surfaces typos (`SPATIAL_INDX` vs `SPATIAL_INDEX`)
                at write-time rather than silently producing a
                different file. Some drivers may still accept options
                that pyogrio does not list — verify against the
                driver's docs when in doubt.

        Raises:
            ValueError: If `mode` isn't `"w"` or `"a"`, or if a
                supplied creation option is not recognised by the
                driver (raised by pyogrio — see the `**creation_options`
                note above).

        Examples:
            - Round-trip a small FC through GeoJSON (the default driver):
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(0, 0), Point(1, 1)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> path = d / "out.geojson"
                >>> fc.to_file(path)
                >>> path.exists()
                True
                >>> FeatureCollection.read_file(path).column
                ['id', 'geometry']

                ```
            - Write to GeoPackage with a named layer:
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> path = d / "out.gpkg"
                >>> fc.to_file(path, driver="gpkg", layer="rivers")
                >>> FeatureCollection.list_layers(path)
                ['rivers']

                ```
            - Invalid `mode` raises `ValueError` before touching the file:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.to_file("ignored.geojson", mode="x")
                Traceback (most recent call last):
                    ...
                ValueError: mode must be 'w' (write) or 'a' (append); got 'x'.

                ```
        """
        if mode not in ("w", "a"):
            raise ValueError(f"mode must be 'w' (write) or 'a' (append); got {mode!r}.")
        try:
            resolved = CATALOG.get_gdal_name(driver) or driver
        except AttributeError:
            resolved = driver

        # pin the engine to pyogrio to match :meth:`read_file` and
        # :meth:`iter_features`. Callers who want fiona for some reason
        # can override via `engine="fiona"` in creation_options, but
        # the default gets the fast path and the pyogrio-specific
        # unknown-option validation.
        passthrough: dict[str, Any] = {
            "driver": resolved,
            "mode": mode,
            "engine": "pyogrio",
        }
        if layer is not None:
            passthrough["layer"] = layer
        passthrough.update(creation_options)
        super().to_file(path, **passthrough)

    # FeatureCollection.to_dataset was moved to
    # Dataset.from_features(features,...) to break the circular import
    # that used to force a CLAUDE.md-violating inline
    # `from pyramids.dataset import Dataset` inside the method body.
    # Callers should migrate:
    # fc.to_dataset(dataset=ds, column_name="pop")
    # → Dataset.from_features(fc, template=ds, column_name="pop")
    # fc.to_dataset(cell_size=10)
    # → Dataset.from_features(fc, cell_size=10)

    def explode(self, geometry: str = "multipolygon") -> FeatureCollection:
        """Explode multi-geometry rows into per-row single geometries.

        Returns a new ``FeatureCollection`` where every row whose geometry
        type matches ``geometry`` is split so each child geometry becomes
        its own row. The current frame is not mutated.

        Args:
            geometry (str): The geometry type to explode (case-insensitive).
                Defaults to ``"multipolygon"``.

        Returns:
            FeatureCollection: A new collection with the same CRS as
            ``self`` and exploded geometries.

        Examples:
            - Explode a frame mixing one MultiPolygon with a Polygon:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Polygon, MultiPolygon
                >>> from pyramids.feature import FeatureCollection
                >>> gdf = gpd.GeoDataFrame(
                ...     {
                ...         "name": ["a", "b"],
                ...         "geometry": [
                ...             MultiPolygon([
                ...                 Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
                ...                 Polygon([(5, 5), (7, 5), (7, 7), (5, 7)]),
                ...             ]),
                ...             Polygon([(10, 10), (11, 10), (11, 11), (10, 11)]),
                ...         ],
                ...     },
                ...     crs="EPSG:4326",
                ... )
                >>> fc = FeatureCollection(gdf)
                >>> result = fc.explode("multipolygon")
                >>> len(result)
                3
                >>> [g.geom_type for g in result.geometry]
                ['Polygon', 'Polygon', 'Polygon']

                ```
        """
        return FeatureCollection(_geom.explode_gdf(self, geometry=geometry))

    def with_coordinates(self) -> FeatureCollection:
        """Return a new FeatureCollection with per-vertex `x` and `y` columns.

        Non-mutating replacement for the old `xy()` method
        (which has been deleted). Matches the pandas / geopandas
        convention that data-transformation methods return a new
        object. The `with_` prefix follows the stdlib/pandas pattern
        for "return a copy with this change applied" (e.g.
        :meth:`pathlib.Path.with_suffix`).

        Explodes MultiPolygon and GeometryCollection geometries into
        their parts first, then attaches `x` and `y` columns
        containing the coordinate sequences of each row.

        Returns:
            FeatureCollection: A new FeatureCollection (`self` is
            not modified) with the original columns plus `x` and
            `y` per-vertex coordinate lists.

        Examples:
            - A Point FC gets scalar `x` / `y` per row:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(1.0, 2.0), Point(3.0, 4.0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> out = fc.with_coordinates()
                >>> list(out["x"])
                [1.0, 3.0]
                >>> list(out["y"])
                [2.0, 4.0]

                ```
            - The input FC is not mutated:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0.0, 0.0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> _ = fc.with_coordinates()
                >>> "x" in fc.columns
                False

                ```
        """
        gdf = _geom.explode_gdf(
            gpd.GeoDataFrame(self, copy=True), geometry="multipolygon"
        )
        gdf = _geom.explode_gdf(gdf, geometry="geometrycollection")

        fc = FeatureCollection(gdf)
        fc["x"] = fc.apply(
            _geom.get_coords, geom_col="geometry", coord_type="x", axis=1
        )
        fc["y"] = fc.apply(
            _geom.get_coords, geom_col="geometry", coord_type="y", axis=1
        )
        fc.reset_index(drop=True, inplace=True)
        return fc

    def plot(
        self,
        column: str | None = None,
        basemap: bool | str | None = None,
        **kwargs: Any,
    ) -> Any:
        """Plot features, optionally on a web-tile basemap.

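        Delegates to :meth:`geopandas.GeoDataFrame.plot` and, when
        `basemap` is truthy, adds an OSM (or named provider) tile
        layer underneath.

        Args:
            column (str | None):
                Column to colour features by; forwarded to
                :meth:`geopandas.GeoDataFrame.plot`.
            basemap (bool | str | None):
                Falsy (default) draws no basemap. `True` adds the
                default tile layer; a string is passed to
                `add_basemap` as the tile `source`.
            **kwargs:
                Forwarded to :meth:`geopandas.GeoDataFrame.plot`.

        Returns:
            Any: The axes object returned by
            :meth:`geopandas.GeoDataFrame.plot`.

        Raises:
            CRSError: If `basemap` is requested but the FC has no CRS.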
        """
        ax = super().plot(column=column, **kwargs)

        if basemap:
            if self.epsg is None:
                raise CRSError(
                    "FeatureCollection must have a CRS (epsg) to use basemap."
                )
            source = basemap if isinstance(basemap, str) else None
            add_basemap(ax, crs=self.epsg, source=source)

        return ax

    def concat(self, other: GeoDataFrame) -> FeatureCollection:
        """Concatenate another GeoDataFrame onto this FeatureCollection.

        Mirrors :func:`pandas.concat`: returns a new
        `FeatureCollection` and never mutates `self`. There is no
        `inplace` kwarg, following `pd.concat`, which has never
        had one.

        Equivalent to `pd.concat([fc, other])` which also works
        directly and returns a `FeatureCollection` via the
        `_constructor` hook.

        A CRS mismatch between `self` and `other` raises
        :class:`pyramids.base._errors.CRSError`. The old behaviour
        silently adopted `self`'s CRS — which corrupted the
        `other` rows' coordinates if the two frames were in
        different CRSes. Callers that want to force-concat across
        CRSes must `other.to_crs(self.crs)` first. An
        unset-on-one-side case (one CRS is `None`) is permitted so
        you can seed a CRS by concatenating a CRS-carrying frame
        onto a freshly-constructed empty FC.

        Args:
            other (GeoDataFrame): The rows to append.

        Returns:
            FeatureCollection: A new FC containing `self`'s rows
            followed by `other`'s rows, with `self`'s CRS and a
            freshly-reset index.

        Raises:
            CRSError: If both frames carry a CRS and the two CRSes
                do not match.

        Examples:
            - Concatenate two single-row FCs on matching CRS:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> a = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> b = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [2]}, geometry=[Point(1, 1)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> out = a.concat(b)
                >>> len(out)
                2
                >>> list(out["id"])
                [1, 2]
                >>> out.crs.to_epsg()
                4326

                ```
            - CRS mismatch raises `CRSError`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> a = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> b = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [2]}, geometry=[Point(1, 1)],
                ...         crs="EPSG:3857",
                ...     )
                ... )
                >>> a.concat(b)
                Traceback (most recent call last):
                    ...
                pyramids.base._errors.CRSError: concat: CRS mismatch...

                ```
        """
        # validate CRS agreement up front.
        if self.crs is not None and other.crs is not None:
            if self.crs != other.crs:
                raise CRSError(
                    f"concat: CRS mismatch — self.crs = {self.crs!r}, "
                    f"other.crs = {other.crs!r}. Reproject one side "
                    f"— `other.to_crs(self.crs)` OR "
                    f"`self.to_crs(other.crs)` — before "
                    f"concatenating, or strip one CRS with "
                    f".set_crs(None, allow_override=True)."
                )
        combined = gpd.GeoDataFrame(pd.concat([self, other]))
        combined.index = list(range(len(combined)))
        combined.crs = self.crs if self.crs is not None else other.crs
        return FeatureCollection(combined)

    def with_centroid(self) -> FeatureCollection:
        """Return a new FC with per-feature center-point columns attached.

        Non-mutating replacement for the old `center_point()`
        method (which has been deleted). The `with_` prefix mirrors
        stdlib / pandas conventions for "return a copy with this
        change applied".

        Computes average x/y per feature (after
        :meth:`with_coordinates`) and attaches three columns:
        `avg_x`, `avg_y` and `center_point` (shapely `Point`).

        Feeding a degenerate or empty geometry (for example an
        empty `Point`, or a `Polygon` whose ring has zero area)
        produces `(NaN, NaN)` averages. The method emits a single
        `UserWarning` listing the row indices whose `avg_x` /
        `avg_y` could not be computed so downstream code can guard
        against the NaN centroids instead of silently consuming them.
        The `center_point` value at those rows is an empty
        `shapely.Point` (`Point.is_empty is True`) rather than a
        `(NaN, NaN)` point.

        Returns:
            FeatureCollection: A new FeatureCollection (`self` is
            not modified) with `x`, `y`, `avg_x`, `avg_y`,
            `center_point` columns added.

        Examples:
            - Compute centroids for a 2-polygon FC:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Polygon
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[
                ...             Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
                ...             Polygon([(4, 4), (6, 4), (6, 6), (4, 6)]),
                ...         ],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> out = fc.with_centroid()
                >>> [(p.x, p.y) for p in out["center_point"]]
                [(0.8, 0.8), (4.8, 4.8)]

                ```
            - A Point FC is a no-op for the coordinate lists (each row
              is already a single vertex); the centroid equals the point:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(3.0, 4.0), Point(7.0, 8.0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> out = fc.with_centroid()
                >>> [(p.x, p.y) for p in out["center_point"]]
                [(3.0, 4.0), (7.0, 8.0)]

                ```
        """
        fc = self.with_coordinates()
        for i, row_i in fc.iterrows():
            fc.loc[i, "avg_x"] = np.mean(row_i["x"])
            fc.loc[i, "avg_y"] = np.mean(row_i["y"])

        # detect rows whose averaged coordinate could not be
        # computed (empty geometry, all-NaN rings, etc.). Emit a single
        # summary warning and substitute an empty Point so the column
        # does not expose a `(NaN, NaN)` Point that would then crash
        # downstream reprojections.
        avg_x = fc["avg_x"].to_numpy()
        avg_y = fc["avg_y"].to_numpy()
        bad_mask = np.isnan(avg_x) | np.isnan(avg_y)
        if bad_mask.any():
            bad_idx = [int(i) for i, is_bad in enumerate(bad_mask) if is_bad]
            warnings.warn(
                f"with_centroid: {len(bad_idx)} row(s) yielded NaN centroids "
                f"(rows {bad_idx}). Their `center_point` is an empty "
                f"shapely.Point. Drop or repair those rows before running "
                f"a method that requires a valid centroid (e.g. reproject, "
                f"distance).",
                GeometryWarning,
                stacklevel=2,
            )

        # single-pass build. The previous implementation built a
        # throwaway `coords_list` (with NaN placeholders for the bad
        # rows), called `create_points` on it, then iterated the
        # result a second time to substitute empty Points for the bad
        # rows. Skip both intermediates — write the final column value
        # directly.
        cleaned: list[Any] = [
            Point() if bad else Point(ax, ay)
            for ax, ay, bad in zip(avg_x.tolist(), avg_y.tolist(), bad_mask.tolist())
        ]
        fc["center_point"] = cleaned
        return fc

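The averaging step that `with_centroid` performs can be sketched without geopandas. This is a minimal illustration, not the library's code: `vertex_average` and `summarize` are hypothetical helpers, and the coordinate lists stand in for the `x` / `y` columns produced by `with_coordinates`. It also shows why the square in the docstring example reports `(0.8, 0.8)` rather than `(1.0, 1.0)`: a closed ring repeats its first vertex, so that vertex is counted twice in the mean.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coords = Tuple[Sequence[float], Sequence[float]]


def vertex_average(
    xs: Sequence[float], ys: Sequence[float]
) -> Optional[Tuple[float, float]]:
    """Mean of the vertex coordinates, or None when it cannot be computed."""
    if not xs or not ys:
        # Empty geometry: no vertices to average.
        return None
    avg_x = sum(xs) / len(xs)
    avg_y = sum(ys) / len(ys)
    if math.isnan(avg_x) or math.isnan(avg_y):
        # NaN coordinates poison the mean; treat the row as bad.
        return None
    return (avg_x, avg_y)


def summarize(rows: Sequence[Coords]) -> Tuple[List[Optional[Tuple[float, float]]], List[int]]:
    """Return per-row averages plus the positions of rows that failed."""
    results = [vertex_average(xs, ys) for xs, ys in rows]
    bad_rows = [i for i, r in enumerate(results) if r is None]
    return results, bad_rows


# Closed exterior ring of the square (0,0)-(2,2): first vertex repeated at the end.
square = ([0, 2, 2, 0, 0], [0, 0, 2, 2, 0])
empty = ([], [])
results, bad_rows = summarize([square, empty])
```

Here `bad_rows` plays the role of the row list reported in the summary warning, and a `None` result corresponds to the empty `Point` placed in `center_point`.
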
`epsg` `property` #

EPSG code of this FeatureCollection's CRS (cached).

The value is cached per CRS-object identity so repeated access on hot paths skips the pyproj.CRS.to_epsg call. The cache auto-invalidates whenever self.crs is replaced.

An identity miss falls back to equality. If self.crs has been reassigned to a different CRS object that nevertheless compares equal to the cached one (e.g. fc.crs = pyproj.CRS("EPSG:4326") on a frame already in EPSG:4326), we adopt the new object as the cache key and skip the .to_epsg() call. Only when the value really differs do we recompute.

The equality fallback is cheaper than a fresh .to_epsg() (which re-parses the CRS), but it is not free: pyproj.CRS.__eq__ does a WKT2 string comparison. If a future pandas/geopandas release stops returning the same self.crs object identity across accesses, the fallback runs on every fc.epsg access and adds up in hot loops. Switch the cache key to self.crs.to_wkt() if a profile ever shows this dominating.
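
The lookup order described above (identity first, then equality, then recompute) can be sketched as follows. This is an illustration under stated assumptions rather than the property's actual implementation: `FakeCRS` is a hypothetical stand-in for `pyproj.CRS` that counts `to_epsg()` calls, and `EpsgCache` is a hypothetical holder for the cached key and value.

```python
from __future__ import annotations


class FakeCRS:
    """Hypothetical stand-in for pyproj.CRS: compares by code, counts to_epsg() calls."""

    def __init__(self, code: int) -> None:
        self.code = code
        self.to_epsg_calls = 0

    def __eq__(self, other: object) -> bool:
        return isinstance(other, FakeCRS) and other.code == self.code

    def to_epsg(self) -> int:
        self.to_epsg_calls += 1
        return self.code


class EpsgCache:
    """Cache an EPSG code keyed on CRS object identity, with an equality fallback."""

    def __init__(self) -> None:
        self._key: FakeCRS | None = None
        self._value: int | None = None

    def get(self, crs: FakeCRS | None) -> int | None:
        if crs is None:
            # No CRS set: clear the cache and report None.
            self._key, self._value = None, None
            return None
        if crs is self._key:
            # Fast path: the very same object as on the previous access.
            return self._value
        if self._key is not None and crs == self._key:
            # Different object, equal CRS: adopt it as the new key, keep the value.
            self._key = crs
            return self._value
        # Genuinely different CRS: recompute and store.
        self._key, self._value = crs, crs.to_epsg()
        return self._value


cache = EpsgCache()
first = FakeCRS(4326)
cache.get(first)
cache.get(first)            # identity hit, no second to_epsg() call
equal_copy = FakeCRS(4326)
cache.get(equal_copy)       # equality hit, to_epsg() never called on the copy
```

Keying on a WKT string instead, as the note suggests, would replace the two comparisons with a single string comparison at the cost of serialising the CRS on each access.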

Returns:

Type	Description
`int | None`	The integer EPSG code if the CRS is registered in the EPSG authority; `None` when the FC has no CRS set or when its CRS cannot be mapped to a single EPSG code.

Examples:

Frame built with WGS84 reports EPSG 4326:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
...     )
... )
>>> fc.epsg
4326

A frame without a CRS returns None:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame({"id": [1]}, geometry=[Point(0, 0)])
... )
>>> fc.epsg is None
True

Reprojecting to Web Mercator updates the cached code:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
...     )
... )
>>> fc = fc.to_crs(3857)
>>> fc.epsg
3857

`top_left_corner` `property` #

Top-left corner [xmin, ymax] of the total bounds.

Returns:

Type	Description
`list[Number]`	Two-element list `[xmin, ymax]`: the minimum x-coordinate paired with the maximum y-coordinate of the union of all geometry bounds.

Examples:

Two points span a unit square — the top-left is [0, 1]:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1, 2]},
...         geometry=[Point(0, 0), Point(1, 1)],
...         crs="EPSG:4326",
...     )
... )
>>> fc.top_left_corner
[0.0, 1.0]

Offset points yield the offset top-left corner:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1, 2]},
...         geometry=[Point(10, 20), Point(15, 30)],
...         crs="EPSG:4326",
...     )
... )
>>> fc.top_left_corner
[10.0, 30.0]

`column` `property` #

Deprecated alias for :attr:columns returning a list[str].

Returns:

Type	Description
`list[str]`	Column names in their current order, including the active geometry column.

Examples:

A frame with an id field reports both columns:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
...     )
... )
>>> fc.column
['id', 'geometry']

Multiple attribute columns appear in insertion order:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"name": ["a"], "pop": [100]},
...         geometry=[Point(0, 0)],
...         crs="EPSG:4326",
...     )
... )
>>> fc.column
['name', 'pop', 'geometry']

`schema` `property` #

Fiona-style schema: geometry type + field-type dict.

Returns a dict shaped like fiona's schema attribute so callers migrating from fiona.open(path).schema can consume this without rewriting. The dict has three keys:

"geometry": single string ("Point", "Polygon", …) when every row has the same geom type, otherwise "Unknown".
"properties": {column_name: dtype_string} for every non-geometry column.
"crs": the :attr:crs as a :class:pyproj.CRS object, or None when the FC has no CRS set. Matches fiona's convention — callers migrating from fiona.open(path).schema['crs'] can consume it directly.

Empty FeatureCollections (len(self) == 0) report "Unknown" for the geometry type.
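The rule for the "geometry" key (one shared type is reported as-is, anything else collapses to "Unknown") can be sketched without any geospatial dependency. The helper names geometry_label and build_schema are hypothetical and only mirror the dict shape described above; they are not part of pyramids.

```python
def geometry_label(geom_types: list[str]) -> str:
    """Collapse per-row geometry type names into one schema label."""
    unique = set(geom_types)
    # Exactly one distinct type: report it.
    # Zero types (empty input) or several types: report "Unknown".
    return unique.pop() if len(unique) == 1 else "Unknown"


def build_schema(geom_types: list[str], dtypes: dict[str, str], crs=None) -> dict:
    """Assemble a three-key dict in the shape described for `schema`."""
    return {
        "geometry": geometry_label(geom_types),
        "properties": dict(dtypes),
        "crs": crs,
    }


homogeneous = build_schema(["Point", "Point"], {"id": "int64"})
mixed = build_schema(["Point", "LineString"], {"id": "int64"})
empty = build_schema([], {})
```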

Returns:

Name	Type	Description
`dict`	`dict`	Three-key dict with `"geometry"`, `"properties"`, and `"crs"`.

Examples:

Homogeneous point collection reports "Point":

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1, 2]},
...         geometry=[Point(0, 0), Point(1, 1)],
...         crs="EPSG:4326",
...     )
... )
>>> schema = fc.schema
>>> schema["geometry"]
'Point'
>>> schema["properties"]
{'id': 'int64'}
>>> schema["crs"].to_epsg()
4326

Mixed geometry types collapse to "Unknown":

>>> import geopandas as gpd
>>> from shapely.geometry import Point, LineString
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1, 2]},
...         geometry=[Point(0, 0), LineString([(0, 0), (1, 1)])],
...         crs="EPSG:4326",
...     )
... )
>>> fc.schema["geometry"]
'Unknown'

Frames without a CRS return crs=None:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame({"id": [1]}, geometry=[Point(0, 0)])
... )
>>> fc.schema["crs"] is None
True

`__init__(data=None, *args, **kwargs)` #

Construct a FeatureCollection.

Accepts anything :class:geopandas.GeoDataFrame accepts. Rejects ogr.DataSource / gdal.Dataset with a clear TypeError.

Source code in src/pyramids/feature/collection.py

def __init__(self, data: Any = None, *args: Any, **kwargs: Any) -> None:
    """Construct a FeatureCollection.

    Accepts anything :class:`geopandas.GeoDataFrame` accepts.
    Rejects `ogr.DataSource` / `gdal.Dataset` with a clear
    `TypeError`.
    """
    if isinstance(data, (ogr.DataSource, gdal.Dataset)):
        raise TypeError(
            "FeatureCollection no longer accepts ogr.DataSource or "
            "gdal.Dataset objects. OGR is an internal implementation "
            "detail. Use FeatureCollection.read_file(path) to load a "
            "file, or pass a GeoDataFrame."
        )
    super().__init__(data, *args, **kwargs)

`__enter__()` #

Enter a context-managed block. Returns self.

Returns:

Name	Type	Description
`FeatureCollection`	`FeatureCollection`	`self`, the exact same instance, so `with ... as fc:` binds `fc` to this collection.

Examples:

Use as a context manager and access rows inside the block:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> gdf = gpd.GeoDataFrame(
...     {"id": [1, 2]},
...     geometry=[Point(0, 0), Point(1, 1)],
...     crs="EPSG:4326",
... )
>>> with FeatureCollection(gdf) as fc:
...     n = len(fc)
>>> n
2

Exceptions raised inside the block still propagate:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
...     )
... )
>>> try:
...     with fc:
...         raise RuntimeError("boom")
... except RuntimeError as err:
...     print(err)
boom

Source code in src/pyramids/feature/collection.py

def __enter__(self) -> FeatureCollection:
    """Enter a context-managed block. Returns `self`.

    Returns:
        FeatureCollection: `self` — the exact same instance, so
        `with ... as fc:` binds `fc` to this collection.

    Examples:
        - Use as a context manager and access rows inside the block:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> gdf = gpd.GeoDataFrame(
            ...     {"id": [1, 2]},
            ...     geometry=[Point(0, 0), Point(1, 1)],
            ...     crs="EPSG:4326",
            ... )
            >>> with FeatureCollection(gdf) as fc:
            ...     n = len(fc)
            >>> n
            2

            ```
        - Exceptions raised inside the block still propagate:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )
            >>> try:
            ...     with fc:
            ...         raise RuntimeError("boom")
            ... except RuntimeError as err:
            ...     print(err)
            boom

            ```
    """
    return self

`__exit__(exc_type, exc, tb)` #

Exit the context-managed block. Calls :meth:close.

Parameters:

Name	Description	Default
`exc_type`	Exception class if the block raised, else `None`.	required
`exc`	Exception instance if the block raised, else `None`.	required
`tb`	Traceback for the raised exception, else `None`.	required

Returns:

Name	Type	Description
`bool`	`bool`	Always `False`: exceptions from inside the `with` block propagate to the caller rather than being swallowed.

Examples:

The clean-exit path returns False so nothing is swallowed:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
...     )
... )
>>> fc.__exit__(None, None, None)
False

A with block that finishes normally just releases the FC:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> gdf = gpd.GeoDataFrame(
...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
... )
>>> with FeatureCollection(gdf) as fc:
...     pass
>>> len(fc)
1

Source code in src/pyramids/feature/collection.py

def __exit__(self, exc_type, exc, tb) -> bool:
    """Exit the context-managed block. Calls :meth:`close`.

    Args:
        exc_type: Exception class if the block raised, else `None`.
        exc: Exception instance if the block raised, else `None`.
        tb: Traceback for the raised exception, else `None`.

    Returns:
        bool: Always `False` — exceptions from inside the `with`
        block propagate to the caller rather than being swallowed.

    Examples:
        - The clean-exit path returns `False` so nothing is swallowed:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )
            >>> fc.__exit__(None, None, None)
            False

            ```
        - A `with` block that finishes normally just releases the FC:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> gdf = gpd.GeoDataFrame(
            ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ... )
            >>> with FeatureCollection(gdf) as fc:
            ...     pass
            >>> len(fc)
            1

            ```
    """
    self.close()
    return False

`close()` #

Release resources held by this FeatureCollection.

No-op today (the OGR bridge is self-cleaning). Exists so future resource-holding features have an idiomatic release point.

Returns:

Name	Type	Description
`None`	`None`	This method does not return a value.

Examples:

close() is idempotent — calling it repeatedly is safe:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
...     )
... )
>>> fc.close()
>>> fc.close()
>>> len(fc)
1

The collection remains usable after close (no-op today):

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"v": [7]}, geometry=[Point(2, 3)], crs="EPSG:4326",
...     )
... )
>>> fc.close()
>>> fc.epsg
4326

Source code in src/pyramids/feature/collection.py

def close(self) -> None:
    """Release resources held by this FeatureCollection.

    No-op today (the OGR bridge is self-cleaning). Exists so future
    resource-holding features have an idiomatic release point.

    Returns:
        None: This method does not return a value.

    Examples:
        - `close()` is idempotent — calling it repeatedly is safe:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )
            >>> fc.close()
            >>> fc.close()
            >>> len(fc)
            1

            ```
        - The collection remains usable after `close` (no-op today):
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"v": [7]}, geometry=[Point(2, 3)], crs="EPSG:4326",
            ...     )
            ... )
            >>> fc.close()
            >>> fc.epsg
            4326

            ```
    """
    return None

`from_features(features, *, crs=None, columns=None)` `classmethod` #

Build a FeatureCollection from feature-shaped inputs.

Delegates to :meth:geopandas.GeoDataFrame.from_features and wraps the result. Accepts any of the shapes that method accepts:

a list (or iterator) of GeoJSON feature dicts of the form {"type": "Feature", "geometry": {...}, "properties": {...}},
any object exposing __geo_interface__ (shapely geometries, fiona records, custom feature classes), or
a bare FeatureCollection dict ({"type": "FeatureCollection", "features": [...]}).

Parameters:

Name	Type	Description	Default
`features`	`Iterable`	Feature dicts of the form `{"type": "Feature", "geometry": {...}, "properties": {...}}`, or any `__geo_interface__` provider. Also accepts a bare `FeatureCollection` dict.	required
`crs`	`Any`	CRS to attach to the result (EPSG int, `"EPSG:4326"`, WKT, Proj, or a :class:`pyproj.CRS`). `None` leaves the CRS unset.	`None`
`columns`	`list[str] \| None`	Explicit column order for the output. When `None`, geopandas infers columns from the first feature.	`None`

Returns:

Name	Type	Description
`FeatureCollection`	`FeatureCollection`	A new FC backed by the supplied features.

Raises:

Type	Description
`ValueError`	If `features` is empty or exhausted before any feature is consumed. An empty GeoDataFrame from `from_features` has no `geometry` column, which breaks downstream pyramids methods that assume the column exists. Fail fast instead.

Examples:

Build from a list of feature dicts:

>>> from pyramids.feature import FeatureCollection
>>> feats = [
...     {"type": "Feature",
...      "geometry": {"type": "Point", "coordinates": [0, 0]},
...      "properties": {"name": "a"}},
...     {"type": "Feature",
...      "geometry": {"type": "Point", "coordinates": [1, 1]},
...      "properties": {"name": "b"}},
... ]
>>> fc = FeatureCollection.from_features(feats, crs=4326)
>>> len(fc)
2
>>> fc.epsg
4326
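The empty-input check matters most for generators, which cannot be inspected without being consumed. The sketch below shows the materialise-then-check pattern on its own, using only the standard library; materialize_nonempty and point_features are hypothetical names used for illustration, and the error message is not the library's.

```python
from collections.abc import Iterable
from typing import Any


def materialize_nonempty(features: Iterable[Any]) -> list[Any]:
    """Turn any iterable into a list and reject the empty case up front."""
    features_list = list(features)  # consumes a generator exactly once
    if not features_list:
        raise ValueError("at least one feature is required")
    return features_list


def point_features(n: int):
    """Generator of n GeoJSON-style point features (illustrative data)."""
    for i in range(n):
        yield {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [i, i]},
            "properties": {"idx": i},
        }


kept = materialize_nonempty(point_features(2))

try:
    materialize_nonempty(point_features(0))
    empty_rejected = False
except ValueError:
    empty_rejected = True
```

Because the generator is turned into a list before the check, the features that were consumed are still available to pass on afterwards.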

Source code in src/pyramids/feature/collection.py

@classmethod
def from_features(
    cls,
    features: Iterable[Any],
    *,
    crs: Any = None,
    columns: list[str] | None = None,
) -> FeatureCollection:
    """Build a FeatureCollection from feature-shaped inputs.

    Delegates to :meth:`geopandas.GeoDataFrame.from_features` and
    wraps the result. Accepts any of the shapes that method
    accepts:

    * a list (or iterator) of GeoJSON feature dicts of the form
      `{"type": "Feature", "geometry": {...}, "properties": {...}}`,
    * any object exposing `__geo_interface__` (shapely
      geometries, fiona records, custom feature classes), or
    * a bare `FeatureCollection` dict (`{"type":
      "FeatureCollection", "features": [...]}`).

    Args:
        features (Iterable):
            Feature dicts of the form
            `{"type": "Feature", "geometry": {...}, "properties": {...}}`,
            or any `__geo_interface__` provider. Also accepts a
            bare `FeatureCollection` dict.
        crs:
            CRS to attach to the result (EPSG int, `"EPSG:4326"`,
            WKT, Proj, or a :class:`pyproj.CRS`). `None` leaves
            the CRS unset.
        columns (list[str] | None):
            Explicit column order for the output. When `None`,
            geopandas infers columns from the first feature.

    Returns:
        FeatureCollection: A new FC backed by the supplied features.

    Raises:
        ValueError: If `features` is empty or exhausted before any
            feature is consumed. An empty GeoDataFrame from
            `from_features` has no `geometry` column, which
            breaks downstream pyramids methods that assume the
            column exists. Fail fast instead.

    Examples:
        - Build from a list of feature dicts:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> feats = [
            ...     {"type": "Feature",
            ...      "geometry": {"type": "Point", "coordinates": [0, 0]},
            ...      "properties": {"name": "a"}},
            ...     {"type": "Feature",
            ...      "geometry": {"type": "Point", "coordinates": [1, 1]},
            ...      "properties": {"name": "b"}},
            ... ]
            >>> fc = FeatureCollection.from_features(feats, crs=4326)
            >>> len(fc)
            2
            >>> fc.epsg
            4326

            ```
    """
    # materialise an iterator so we can detect the empty case
    # before handing off to geopandas. `geopandas.from_features([])`
    # returns a GeoDataFrame with no `geometry` column, which
    # breaks every pyramids op that assumes the column exists.
    features_list = list(features)
    if not features_list:
        raise ValueError(
            "from_features requires at least one feature. An empty "
            "iterable would produce a GeoDataFrame with no geometry "
            "column, which breaks downstream pyramids methods."
        )
    gdf = gpd.GeoDataFrame.from_features(features_list, crs=crs, columns=columns)
    return cls(gdf)

`from_bbox(bbox, *, epsg)` `classmethod` #

Build a one-row FeatureCollection from a geographic bounding box.

The bbox is the canonical (west, south, east, north) quadruple in the CRS named by epsg. The result is a single-row FC whose only geometry is a rectangular Polygon — handy for cropping a raster or windowed-reading it without writing out the polygon vertices by hand:

mask = FeatureCollection.from_bbox((31.0, 30.0, 31.1, 30.1), epsg=4326)
cropped = dataset.crop(mask)

Most callers do not need to build this themselves — :meth:Dataset.crop and :meth:Dataset.read_array (via :meth:pyramids.dataset.engines.io.IO.read_array) accept the bbox/epsg pair directly and call this helper internally.

Parameters:

Name	Type	Description	Default
`bbox`	`tuple[float, float, float, float] \| list[float]`	A 4-element `(west, south, east, north)` tuple / list of numbers. Must satisfy `west < east` and `south < north`.	required
`epsg`	`Any`	CRS for the bbox coordinates — anything `geopandas` accepts for `crs=` (EPSG int such as `4326`, `"EPSG:4326"` string, WKT, Proj, or a :class:`pyproj.CRS`). Required (a bbox without a CRS is ambiguous).	required

Returns:

Name	Type	Description
`FeatureCollection`	`FeatureCollection`	A one-row FC carrying the rectangular polygon, in the supplied CRS.

Raises:

Type	Description
`ValueError`	`bbox` is not a 4-element sequence, or violates `west < east` / `south < north`, or `epsg` is `None`.
`TypeError`	`bbox` elements are not numbers.

Examples:

Build a one-row FC from a bbox and inspect it:

>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection.from_bbox((31.0, 30.0, 31.1, 30.1), epsg=4326)
>>> len(fc)
1
>>> tuple(float(v) for v in fc.total_bounds)
(31.0, 30.0, 31.1, 30.1)
>>> fc.crs.to_epsg()
4326

Use it as a mask to crop a raster:

>>> import numpy as np
>>> from pyramids.dataset import Dataset
>>> from pyramids.feature import FeatureCollection
>>> arr = np.arange(100, dtype="int16").reshape(10, 10)
>>> ds = Dataset.create_from_array(
...     arr, top_left_corner=(0, 0), cell_size=0.05, epsg=4326,
... )
>>> fc = FeatureCollection.from_bbox((0.1, -0.2, 0.2, -0.1), epsg=4326)
>>> ds.crop(mask=fc).shape
(1, 2, 2)

epsg=None is rejected — a bbox without a CRS is ambiguous:

>>> from pyramids.feature import FeatureCollection
>>> try:
...     FeatureCollection.from_bbox((0, 0, 1, 1), epsg=None)
... except ValueError as exc:
...     print("epsg" in str(exc))
True
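The validation rules in the Raises table, and the rectangle the bbox expands to, can be sketched with the standard library alone. This is an assumption-laden illustration rather than the pyramids code: bbox_to_ring is a hypothetical helper, the order of the checks is a guess, and the real method returns a FeatureCollection rather than a coordinate list.

```python
from numbers import Real


def bbox_to_ring(bbox, epsg):
    """Validate a (west, south, east, north) bbox and return a closed ring."""
    if epsg is None:
        raise ValueError("epsg is required: a bbox without a CRS is ambiguous")
    if not isinstance(bbox, (tuple, list)) or len(bbox) != 4:
        raise ValueError("bbox must be a 4-element (west, south, east, north) sequence")
    # bool is a Real subclass, so exclude it explicitly.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in bbox):
        raise TypeError("bbox elements must be numbers")
    west, south, east, north = bbox
    if not (west < east and south < north):
        raise ValueError("bbox must satisfy west < east and south < north")
    # Counter-clockwise ring; the first vertex is repeated to close it.
    return [
        (west, south),
        (east, south),
        (east, north),
        (west, north),
        (west, south),
    ]


ring = bbox_to_ring((31.0, 30.0, 31.1, 30.1), epsg=4326)
```

A ring in this form is what a polygon constructor such as shapely's Polygon expects as its exterior.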

`from_records(records, *, geometry='geometry', crs=None, orient='records')` `classmethod` #

Build a FeatureCollection from dict records.

Two input orientations are accepted (C26 added the second):

orient="records" (default) — an iterable of per-row dicts, each of the form {column: value,..., geometry: <shapely>}. The dict's keys become column names; the key named by geometry must hold a shapely geometry.
orient="list" — a single columnar dict mapping each column name to a list of values of equal length, for example {"id": [1, 2], "geometry": [pt_a, pt_b]}.

Useful for ingesting rows from an API response that doesn't emit GeoJSON but already has shapely geoms.
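The two orientations carry the same information in different layouts. The following standard-library sketch converts between them; records_to_columns and columns_to_records are hypothetical helpers, WKT strings stand in for shapely geometries, and every row is assumed to have the same keys.

```python
def records_to_columns(records):
    """Convert orient="records" input (row dicts) to the orient="list" shape."""
    columns = {}
    for row in records:
        for key, value in row.items():
            # Assumes every row has the same keys, so all lists stay equal-length.
            columns.setdefault(key, []).append(value)
    return columns


def columns_to_records(columns):
    """Convert orient="list" input (column -> list) back to row dicts."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


rows = [
    {"id": 1, "geometry": "POINT (0 0)"},  # WKT placeholder, not a shapely object
    {"id": 2, "geometry": "POINT (1 1)"},
]
cols = records_to_columns(rows)
round_trip = columns_to_records(cols)
```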

Parameters:

Name	Type	Description	Default
`records`	`Any`	Per-row iterable of dicts when `orient="records"`, or a single columnar dict when `orient="list"`.	required
`geometry`	`str`	Name of the column / key holding the shapely geometry. Default `"geometry"`.	`'geometry'`
`crs`	`Any`	CRS to attach (same forms as :meth:`from_features`).	`None`
`orient`	`str`	`"records"` (default) or `"list"` — matches the pandas `from_dict`/`from_records` conventions.	`'records'`

Returns:

Name	Type	Description
`FeatureCollection`	`FeatureCollection`	A new FC with one row per record.

Raises:

Type	Description
`FeatureError`	If a record is missing the `geometry` column.
`ValueError`	If `orient` is not one of the supported values.

Examples:

Per-row records with the default geometry key:

>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> recs = [
...     {"id": 1, "geometry": Point(0, 0)},
...     {"id": 2, "geometry": Point(1, 1)},
... ]
>>> fc = FeatureCollection.from_records(recs, crs=4326)
>>> len(fc)
2
>>> fc.epsg
4326

Custom geometry key via the geometry= kwarg:

>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> recs = [
...     {"id": 1, "geom": Point(0, 0)},
...     {"id": 2, "geom": Point(1, 1)},
... ]
>>> fc = FeatureCollection.from_records(
...     recs, geometry="geom", crs=4326,
... )
>>> fc.geometry.name
'geom'

Columnar dict via orient="list":

>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> cols = {"id": [1, 2], "geometry": [Point(0, 0), Point(1, 1)]}
>>> fc = FeatureCollection.from_records(
...     cols, orient="list", crs=4326,
... )
>>> list(fc["id"])
[1, 2]

Source code in src/pyramids/feature/collection.py

@classmethod
def from_records(
    cls,
    records: Any,
    *,
    geometry: str = "geometry",
    crs: Any = None,
    orient: str = "records",
) -> FeatureCollection:
    """Build a FeatureCollection from dict records.

    Two input orientations are accepted (C26 added the second):

    * `orient="records"` (default) — an iterable of per-row dicts,
      each of the form `{column: value,..., geometry: <shapely>}`.
      The dict's keys become column names; the key named by
      `geometry` must hold a shapely geometry.
    * `orient="list"` — a single columnar dict mapping each
      column name to a list of values of equal length, for
      example `{"id": [1, 2], "geometry": [pt_a, pt_b]}`.

    Useful for ingesting rows from an API response that doesn't
    emit GeoJSON but already has shapely geoms.

    Args:
        records:
            Per-row iterable of dicts when `orient="records"`, or a
            single columnar dict when `orient="list"`.
        geometry (str):
            Name of the column / key holding the shapely geometry.
            Default `"geometry"`.
        crs:
            CRS to attach (same forms as :meth:`from_features`).
        orient (str):
            `"records"` (default) or `"list"` — matches the
            pandas `from_dict`/`from_records` conventions.

    Returns:
        FeatureCollection: A new FC with one row per record.

    Raises:
        FeatureError: If a record is missing the `geometry`
            column.
        ValueError: If `orient` is not one of the supported
            values.

    Examples:
        - Per-row records with the default geometry key:
            ```python
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> recs = [
            ...     {"id": 1, "geometry": Point(0, 0)},
            ...     {"id": 2, "geometry": Point(1, 1)},
            ... ]
            >>> fc = FeatureCollection.from_records(recs, crs=4326)
            >>> len(fc)
            2
            >>> fc.epsg
            4326

            ```
        - Custom geometry key via the `geometry=` kwarg:
            ```python
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> recs = [
            ...     {"id": 1, "geom": Point(0, 0)},
            ...     {"id": 2, "geom": Point(1, 1)},
            ... ]
            >>> fc = FeatureCollection.from_records(
            ...     recs, geometry="geom", crs=4326,
            ... )
            >>> fc.geometry.name
            'geom'

            ```
        - Columnar dict via `orient="list"`:
            ```python
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> cols = {"id": [1, 2], "geometry": [Point(0, 0), Point(1, 1)]}
            >>> fc = FeatureCollection.from_records(
            ...     cols, orient="list", crs=4326,
            ... )
            >>> list(fc["id"])
            [1, 2]

            ```
    """

    # empty-input branches both build a single-column frame
    # whose column name matches the `geometry=` kwarg, so
    # `GeoDataFrame(..., geometry=…)` sets it as the active
    # geometry column and the returned FC has
    # `geometry.name == geometry`.
    def _empty_fc() -> FeatureCollection:
        return cls(gpd.GeoDataFrame({geometry: []}, geometry=geometry, crs=crs))

    if orient == "records":
        records_list = list(records)
        if not records_list:
            return _empty_fc()
        df = pd.DataFrame.from_records(records_list)
    elif orient == "list":
        # columnar dict of equal-length lists. Straight into
        # `pd.DataFrame` which accepts this shape natively and
        # raises `ValueError` on mismatched lengths (propagated
        # to the caller as-is — the pandas message is already clear).
        if not isinstance(records, dict):
            raise ValueError(
                f"orient='list' expects a dict of column → list; "
                f"got {type(records).__name__}."
            )
        df = pd.DataFrame(records)
        if len(df) == 0:
            return _empty_fc()
    else:
        raise ValueError(f"orient must be 'records' or 'list'; got {orient!r}.")
    if geometry not in df.columns:
        raise FeatureError(
            f"records missing required geometry column {geometry!r}; "
            f"columns present: {list(df.columns)}"
        )
    return cls(gpd.GeoDataFrame(df, geometry=geometry, crs=crs))

`iter_features(path, *, layer=None, bbox=None, where=None, chunksize=None, tile_strategy='auto', include_index=False)` `classmethod` #

Stream features from path without materializing the full file.

Two orthogonal knobs:

Chunk shape. chunksize=None yields one GeoJSON-style dict per row (fiona idiom). chunksize=N yields :class:FeatureCollection batches of up to N rows each so batched pipelines get a DataFrame-shaped payload.
Tile strategy. Controls whether the bbox filter is pushed into the format's spatial index (rtree on GPKG, row-group statistics on Parquet, …) or applied after a full scan. Pass one of:
"auto" (default) — let pyogrio pick. For a GPKG, pyogrio queries the rtree_<layer>_geom companion table automatically. For a Parquet file, pyogrio / pyarrow push the bbox down to the row-group statistics and skip non-matching groups. For formats without a spatial index (GeoJSON, Shapefile without a .qix) this falls back to a full scan in the driver.
"rtree" — same as "auto"; kept as an explicit name so pipeline code can document intent.
"row_group" — same as "auto"; explicit name for the Parquet case.
"none" — disable index pushdown; read whole chunks from the driver and apply the bbox filter in Python. Useful when the on-disk spatial index is stale or suspected wrong; also exercises the "slow path" in tests.

bbox / where compose with any tile_strategy. Paths run through :func:pyramids._io._parse_path so cloud URLs and archive paths work the same way as in :meth:read_file.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	File path, URL, archive path.	required
`layer`	`str \| int \| None`	Layer selector for multi-layer formats.	`None`
`bbox`	`tuple[float, float, float, float] \| None`	`(minx, miny, maxx, maxy)` filter.	`None`
`where`	`str \| None`	OGR SQL predicate.	`None`
`chunksize`	`int \| None`	`None` yields dicts, an `int` yields `FeatureCollection` chunks.	`None`
`tile_strategy`	`str`	One of `"auto"`, `"rtree"`, `"row_group"`, `"none"`. Default `"auto"`.	`'auto'`
`include_index`	`bool`	When `True`, each yielded dict gets an additional `"id"` key whose value is the 0-based file-row index of that feature. The chunked form (`chunksize=N`) attaches the same index as a `"_row_index"` column on the yielded FC. The indices stay aligned with the on-disk rows even when a Python-side bbox filter (`tile_strategy="none"`) drops some rows — only the surviving features are yielded, and their ids match the positions they had in the source file. Defaults to `False` for back-compat with the fiona idiom.	`False`

Yields:

Type	Description
`Any`	dict \| FeatureCollection: Per-feature dicts when `chunksize` is `None`; FeatureCollection chunks otherwise.

Raises:

Type	Description
`ValueError`	If `chunksize` is given but `< 1`, or if `tile_strategy` is not one of the accepted values.

Examples:

Stream features one at a time as GeoJSON-style dicts:

>>> import tempfile
>>> from pathlib import Path
>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> d = Path(tempfile.mkdtemp())
>>> path = d / "pts.geojson"
>>> gdf = gpd.GeoDataFrame(
...     {"id": [1, 2, 3]},
...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
...     crs="EPSG:4326",
... )
>>> gdf.to_file(path, driver="GeoJSON")
>>> feats = list(FeatureCollection.iter_features(path))
>>> len(feats)
3
>>> feats[0]["properties"]["id"]
1

Stream in chunksize=2 batches as FeatureCollection chunks:

>>> import tempfile
>>> from pathlib import Path
>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> d = Path(tempfile.mkdtemp())
>>> path = d / "pts.geojson"
>>> gdf = gpd.GeoDataFrame(
...     {"id": [1, 2, 3]},
...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
...     crs="EPSG:4326",
... )
>>> gdf.to_file(path, driver="GeoJSON")
>>> chunks = list(
...     FeatureCollection.iter_features(path, chunksize=2)
... )
>>> [len(c) for c in chunks]
[2, 1]

Invalid chunksize raises ValueError:

>>> from pyramids.feature import FeatureCollection
>>> gen = FeatureCollection.iter_features("anywhere", chunksize=0)
>>> next(gen)
Traceback (most recent call last):
    ...
ValueError: chunksize must be >= 1 when supplied; got 0.
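How a Python-side bbox filter can keep row indices aligned with the source, as described for include_index together with tile_strategy="none", is sketched below in plain Python. The names filter_with_index and chunked are hypothetical, the data is a list of (x, y) tuples rather than a file, and the point-in-box test is an assumption about the predicate, not a statement of what pyramids does.

```python
from itertools import islice


def filter_with_index(points, bbox):
    """Yield (row_index, point) for points inside bbox, keeping source positions."""
    minx, miny, maxx, maxy = bbox
    # The index is assigned before filtering, so dropped rows leave gaps.
    for row_index, (x, y) in enumerate(points):
        if minx <= x <= maxx and miny <= y <= maxy:
            yield row_index, (x, y)


def chunked(iterable, chunksize):
    """Group an iterable into lists of up to `chunksize` items."""
    if chunksize < 1:
        raise ValueError(f"chunksize must be >= 1 when supplied; got {chunksize}.")
    iterator = iter(iterable)
    while chunk := list(islice(iterator, chunksize)):
        yield chunk


points = [(0, 0), (5, 5), (1, 1), (9, 9), (2, 2)]
survivors = list(filter_with_index(points, (0, 0, 2, 2)))
batches = list(chunked(survivors, 2))
```

Because chunked is a generator, the chunksize check only runs when the first item is requested, which matches the doctest above where the error surfaces on next(gen).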

Source code in src/pyramids/feature/collection.py

@classmethod
def iter_features(
    cls,
    path: str | Path,
    *,
    layer: str | int | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    where: str | None = None,
    chunksize: int | None = None,
    tile_strategy: str = "auto",
    include_index: bool = False,
) -> Any:
    """Stream features from `path` without materializing the full file.

    Two orthogonal knobs:

    * **Chunk shape**. `chunksize=None` yields one GeoJSON-style
      dict per row (fiona idiom). `chunksize=N` yields
      :class:`FeatureCollection` batches of up to N rows each so
      batched pipelines get a DataFrame-shaped payload.
    * **Tile strategy**. Controls whether the `bbox`
      filter is pushed into the format's spatial index (rtree on
      GPKG, row-group statistics on Parquet, …) or applied after
      a full scan. Pass one of:

      - `"auto"` (default) — let pyogrio pick. For a GPKG,
        pyogrio queries the `rtree_<layer>_geom` companion
        table automatically. For a Parquet file, pyogrio /
        pyarrow push the bbox down to the row-group statistics
        and skip non-matching groups. For formats without a
        spatial index (GeoJSON, Shapefile without a `.qix`)
        this falls back to a full scan in the driver.
      - `"rtree"` — same as `"auto"`; kept as an explicit
        name so pipeline code can document intent.
      - `"row_group"` — same as `"auto"`; explicit name for
        the Parquet case.
      - `"none"` — disable index pushdown; read whole chunks
        from the driver and apply the bbox filter in Python.
        Useful when the on-disk spatial index is stale or
        suspected wrong; also exercises the "slow path" in
        tests.

    `bbox` / `where` compose with any tile_strategy. Paths run
    through :func:`pyramids._io._parse_path` so cloud URLs and
    archive paths work the same way as in :meth:`read_file`.

    Args:
        path (str | Path): File path, URL, archive path.
        layer (str | int | None): Layer selector for multi-layer
            formats.
        bbox: `(minx, miny, maxx, maxy)` filter.
        where (str | None): OGR SQL predicate.
        chunksize (int | None): `None` yields dicts, an `int`
            yields `FeatureCollection` chunks.
        tile_strategy (str): One of `"auto"`, `"rtree"`,
            `"row_group"`, `"none"`. Default `"auto"`.
        include_index (bool): When `True`, each yielded dict gets
            an additional `"id"` key whose value is the
            0-based file-row index of that feature. The chunked
            form (`chunksize=N`) attaches the same index as a
            `"_row_index"` column on the yielded FC. The indices
            stay aligned with the on-disk rows even when a
            Python-side bbox filter (`tile_strategy="none"`)
            drops some rows — only the surviving features are
            yielded, and their ids match the positions they had
            in the source file. Defaults to `False` for
            back-compat with the fiona idiom.

    Yields:
        dict | FeatureCollection: Per-feature dicts when
        `chunksize` is `None`; FeatureCollection chunks
        otherwise.

    Raises:
        ValueError: If `chunksize` is given but `< 1`, or if
            `tile_strategy` is not one of the accepted values.

    Examples:
        - Stream features one at a time as GeoJSON-style dicts:
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> path = d / "pts.geojson"
            >>> gdf = gpd.GeoDataFrame(
            ...     {"id": [1, 2, 3]},
            ...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
            ...     crs="EPSG:4326",
            ... )
            >>> gdf.to_file(path, driver="GeoJSON")
            >>> feats = list(FeatureCollection.iter_features(path))
            >>> len(feats)
            3
            >>> feats[0]["properties"]["id"]
            1

            ```
        - Stream in `chunksize=2` batches as FeatureCollection chunks:
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> path = d / "pts.geojson"
            >>> gdf = gpd.GeoDataFrame(
            ...     {"id": [1, 2, 3]},
            ...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
            ...     crs="EPSG:4326",
            ... )
            >>> gdf.to_file(path, driver="GeoJSON")
            >>> chunks = list(
            ...     FeatureCollection.iter_features(path, chunksize=2)
            ... )
            >>> [len(c) for c in chunks]
            [2, 1]

            ```
        - Invalid `chunksize` raises `ValueError`:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> gen = FeatureCollection.iter_features("anywhere", chunksize=0)
            >>> next(gen)
            Traceback (most recent call last):
                ...
            ValueError: chunksize must be >= 1 when supplied; got 0.

            ```
    """
    if chunksize is not None and chunksize < 1:
        raise ValueError(f"chunksize must be >= 1 when supplied; got {chunksize}.")
    if tile_strategy not in cls._VALID_TILE_STRATEGIES:
        raise ValueError(
            f"tile_strategy must be one of "
            f"{cls._VALID_TILE_STRATEGIES}; got {tile_strategy!r}."
        )

    import pyogrio

    resolved = str(_pyramids_io._parse_path(path))

    # Determine how many features are in the layer so we can
    # iterate in fixed-size batches via skip_features / max_features.
    # pyogrio's read_info is O(1) per call.
    info_kwargs: dict[str, Any] = {}
    if layer is not None:
        info_kwargs["layer"] = layer
    info = pyogrio.read_info(resolved, **info_kwargs)
    total = int(info["features"])

    if chunksize is None:
        batch_size = _DEFAULT_ITER_BATCH_SIZE
    else:
        batch_size = int(chunksize)

    # D-M3: pin the engine to pyogrio. `skip_features` /
    # `max_features` are pyogrio-specific (geopandas' fiona
    # engine silently ignores them, which would turn every chunk
    # into a full scan). Pinning the engine makes the contract
    # explicit and fails fast if pyogrio is absent.
    read_kwargs: dict[str, Any] = {"engine": "pyogrio"}
    if layer is not None:
        read_kwargs["layer"] = layer
    if where is not None:
        read_kwargs["where"] = where

    # when tile_strategy is "auto"/"rtree"/"row_group",
    # forward the bbox to pyogrio which transparently uses the
    # format's spatial index. When "none", hold the bbox back
    # and apply it in Python after each chunk loads.
    pushdown_bbox = bbox if tile_strategy != "none" else None
    python_bbox = bbox if tile_strategy == "none" else None
    if pushdown_bbox is not None:
        read_kwargs["bbox"] = pushdown_bbox

    for start in range(0, total, batch_size):
        gdf_chunk = gpd.read_file(
            resolved,
            skip_features=start,
            max_features=batch_size,
            **read_kwargs,
        )
        # remember the absolute row indices before any
        # bbox-based masking so callers can map yielded features
        # back to their source rows even after a Python-side filter
        # has dropped some of them.
        if include_index:
            row_indices = list(range(start, start + len(gdf_chunk)))
        if python_bbox is not None and len(gdf_chunk) > 0:
            xmin, ymin, xmax, ymax = python_bbox
            mask = gdf_chunk.intersects(box(xmin, ymin, xmax, ymax))
            if include_index:
                row_indices = [ri for ri, keep in zip(row_indices, mask) if keep]
            gdf_chunk = gdf_chunk[mask]
        if chunksize is None:
            iterator = gdf_chunk.iterfeatures(na="null")
            if include_index:
                for ri, feat in zip(row_indices, iterator):
                    feat["id"] = ri
                    yield feat
            else:
                for feat in iterator:
                    yield feat
        else:
            chunk_fc = cls(gdf_chunk)
            if include_index:
                chunk_fc["_row_index"] = row_indices
            yield chunk_fc

`read_file(path, *, layer=None, bbox=None, mask=None, rows=None, columns=None, where=None, backend='pandas', npartitions=None, chunksize=None, **kwargs)` `classmethod` #

Read a vector file into a FeatureCollection.

path is first routed through :func:pyramids._io._parse_path, which handles:

Cloud-URL rewriting (s3://, gs://, az://, abfs://, http(s)://, file:// → GDAL /vsi*/ form). This rewriting is verified end-to-end through an HTTP test. For AWS / GCS / Azure credentials either set the standard environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, GOOGLE_APPLICATION_CREDENTIALS, AZURE_STORAGE_CONNECTION_STRING, …) or scope them via :class:pyramids.base.remote.CloudConfig as a context manager around the read_file call.
Compressed-archive dispatch for .zip, .tar, .tar.gz, .gz on local paths — the returned path is a /vsizip/, /vsitar/ or /vsigzip/ string that :func:geopandas.read_file (via GDAL's virtual filesystem) can open directly. You can either pass just the archive path (first contained file wins) or archive.zip/inner.geojson to target a specific member. Cloud + archive chaining (http://host/x.zip) is not automatic today — if you need it, stage the archive locally first or use CloudConfig with an explicit /vsizip//vsicurl/... path.

Filter kwargs (bbox, mask, rows, columns, where) are pushed down to fiona/pyogrio, so the dataset never fully materializes when only a subset is needed.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	File path, URL, archive path, or `archive.ext/inner-file` form.	required
`layer`	`str \| int \| None`	Layer name or index for multi-layer formats (GeoPackage, GDB, KML, …). `None` reads the first / default layer.	`None`
`bbox`	`tuple[float, float, float, float] \| Any`	`(minx, miny, maxx, maxy)` tuple, or a `GeoDataFrame` / `GeoSeries` / shapely geometry whose total bounds are used. Only features intersecting the bbox are loaded.	`None`
`mask`	`Any`	A shapely geometry (or mapping / GeoSeries / GeoDataFrame) whose geometries are used as a mask — only features intersecting the mask are loaded. Finer than `bbox` (actual geometry intersection, not just envelope). Mutually exclusive with `bbox`.	`None`
`rows`	`slice \| int \| None`	`int` — read at most N rows. `slice` — read the given range of rows. Useful for sampling.	`None`
`columns`	`list[str] \| None`	Restrict loaded attribute columns. Geometry is always loaded. `None` loads every column.	`None`
`where`	`str \| None`	OGR SQL `WHERE`-clause predicate pushed down to the driver (e.g. `"population > 10000"`). Avoids loading non-matching features.	`None`
`backend`	`str`	`'pandas'` reads eagerly and returns a FeatureCollection. `'dask'` reads lazily through the optional `dask-geopandas` dependency and returns a LazyFeatureCollection; it raises `ValueError` if any of `layer`, `bbox`, `mask`, `rows`, `columns` or `where` is supplied. Any other value raises `ValueError`.	`'pandas'`
`npartitions`	`int \| None`	Partition count for `backend='dask'`. When neither `npartitions` nor `chunksize` is supplied, a default is derived from the file size.	`None`
`chunksize`	`int \| None`	Alternative to `npartitions` for `backend='dask'`; resolved together with it before being passed to `dask_geopandas.read_file`.	`None`
`**kwargs`	`Any`	Forwarded to :func:`geopandas.read_file` verbatim for engine-specific options (`engine="pyogrio"`, `use_arrow=True`, driver-specific creation options).	`{}`

Returns:

Name	Type	Description
`FeatureCollection`	`FeatureCollection \| LazyFeatureCollection`	The (possibly filtered) features wrapped as a FeatureCollection, or as a LazyFeatureCollection when `backend='dask'`.

Examples:

Load a GeoJSON file:

>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection.read_file("tests/data/coello-gauges.geojson")
>>> len(fc) > 0
True
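
How the filter arguments are routed depends on the backend. The sketch below is a simplified, standalone illustration of the two rules described above: the pandas path forwards only the arguments that were actually supplied, and the dask path rejects any supplied filter instead of silently dropping it. The helper name `route_read_kwargs` is hypothetical and no file is read.

```python
from typing import Any


def route_read_kwargs(backend: str = "pandas", **filters: Any) -> dict[str, Any]:
    """Return the kwargs that would be forwarded to the underlying reader."""
    # Keep only the filters the caller actually set.
    supplied = {name: value for name, value in filters.items() if value is not None}
    if backend == "dask":
        # The lazy reader has no pushdown for these, so fail loudly.
        if supplied:
            raise ValueError(
                f"backend='dask' does not support filter kwargs {sorted(supplied)}."
            )
        return {}
    if backend != "pandas":
        raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
    return supplied


forwarded = route_read_kwargs(bbox=(0, 0, 1, 1), where=None, columns=["id"])
print(forwarded)  # {'bbox': (0, 0, 1, 1), 'columns': ['id']}
```

Dropping the `None` defaults before forwarding keeps the call independent of how a given geopandas engine treats explicit `None` values.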

Source code in src/pyramids/feature/collection.py

@classmethod
def read_file(
    cls,
    path: str | Path,
    *,
    layer: str | int | None = None,
    bbox: tuple[float, float, float, float] | Any = None,
    mask: Any = None,
    rows: slice | int | None = None,
    columns: list[str] | None = None,
    where: str | None = None,
    backend: str = "pandas",
    npartitions: int | None = None,
    chunksize: int | None = None,
    **kwargs: Any,
) -> FeatureCollection | LazyFeatureCollection:
    """Read a vector file into a FeatureCollection.

    path is first routed through
    :func:`pyramids._io._parse_path`, which handles:

    * Cloud-URL rewriting (`s3://`, `gs://`, `az://`,
      `abfs://`, `http(s)://`, `file://` → GDAL `/vsi*/`
      form). This rewriting is verified end-to-end through an HTTP test.
      For AWS / GCS / Azure credentials either set the standard
      environment variables (`AWS_ACCESS_KEY_ID`,
      `AWS_SECRET_ACCESS_KEY`, `GOOGLE_APPLICATION_CREDENTIALS`,
      `AZURE_STORAGE_CONNECTION_STRING`, …) or scope them via
      :class:`pyramids.base.remote.CloudConfig` as a context
      manager around the `read_file` call.
    * Compressed-archive dispatch for `.zip`, `.tar`, `.tar.gz`,
      `.gz` on **local** paths — the returned path is a
      `/vsizip/`, `/vsitar/` or `/vsigzip/` string that
      :func:`geopandas.read_file` (via GDAL's virtual filesystem)
      can open directly. You can either pass just the archive
      path (first contained file wins) or
      `archive.zip/inner.geojson` to target a specific member.
      Cloud + archive chaining (`http://host/x.zip`) is not
      automatic today — if you need it, stage the archive
      locally first or use `CloudConfig` with an explicit
      `/vsizip//vsicurl/...` path.

    Filter kwargs (`bbox`, `mask`, `rows`, `columns`, `where`) are
    pushed down to fiona/pyogrio, so the dataset never fully
    materializes when only a subset is needed.

    Args:
        path (str | Path):
            File path, URL, archive path, or
            `archive.ext/inner-file` form.
        layer (str | int | None):
            Layer name or index for multi-layer formats
            (GeoPackage, GDB, KML, …). `None` reads the first /
            default layer.
        bbox:
            `(minx, miny, maxx, maxy)` tuple, or a
            `GeoDataFrame` / `GeoSeries` / shapely geometry
            whose total bounds are used. Only features
            intersecting the bbox are loaded.
        mask:
            A shapely geometry (or mapping / GeoSeries /
            GeoDataFrame) whose geometries are used as a mask —
            only features intersecting the mask are loaded. Finer
            than `bbox` (actual geometry intersection, not just
            envelope). Mutually exclusive with `bbox`.
        rows (slice | int | None):
            `int` — read at most N rows. `slice` — read the
            given range of rows. Useful for sampling.
        columns (list[str] | None):
            Restrict loaded attribute columns. Geometry is
            always loaded. `None` loads every column.
        where (str | None):
            OGR SQL `WHERE`-clause predicate pushed down to the
            driver (e.g. `"population > 10000"`). Avoids loading
            non-matching features.
        **kwargs:
            Forwarded to :func:`geopandas.read_file` verbatim for
            engine-specific options (`engine="pyogrio"`,
            `use_arrow=True`, driver-specific creation options).

    Returns:
        FeatureCollection | LazyFeatureCollection: The (possibly
        filtered) features wrapped as a FeatureCollection, or as a
        LazyFeatureCollection when `backend="dask"`.

    Examples:
        - Load a GeoJSON file:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection.read_file("tests/data/coello-gauges.geojson")
            >>> len(fc) > 0
            True

            ```
    """
    resolved = _pyramids_io._parse_path(path)
    if backend == "dask":
        # dask_geopandas.read_file does NOT forward pyogrio
        # filter kwargs (bbox / mask / rows / columns / where) —
        # silently dropping them was the bug. Raise a clear
        # ValueError instead so users know to either pre-filter
        # or call .compute() and filter eagerly.
        unsupported = {
            "bbox": bbox,
            "mask": mask,
            "rows": rows,
            "columns": columns,
            "where": where,
            "layer": layer,
        }
        supplied = [k for k, v in unsupported.items() if v is not None]
        if supplied:
            raise ValueError(
                f"backend='dask' does not support filter kwargs "
                f"{supplied}. dask_geopandas.read_file has no "
                "pushdown story for these. Either omit them and "
                "filter post-load via .clip / .loc / .compute, or "
                "switch to read_parquet(backend='dask', filters=...)"
            )
        try:
            import dask_geopandas
        except ImportError as exc:
            raise ImportError(
                "backend='dask' requires the optional "
                "'dask-geopandas' dependency. Install with one of:\n"
                "  - PyPI:        pip install 'pyramids-gis[parquet-lazy]'\n"
                "  - conda-forge: conda install -c conda-forge pyramids-parquet-lazy"
            ) from exc
        # default npartitions from file size when neither
        # kwarg was supplied; one-partition fallback defeats the
        # point of going lazy.
        partition_kwargs = _resolve_lazy_partitioning(
            resolved,
            npartitions,
            chunksize,
        )
        # wrap the lazy return as a LazyFeatureCollection so the
        # dask branch stays inside the pyramids type system.
        from pyramids.feature._lazy_collection import LazyFeatureCollection

        dask_gdf = dask_geopandas.read_file(resolved, **partition_kwargs)
        return LazyFeatureCollection.from_dask_gdf(dask_gdf)
    if backend != "pandas":
        raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
    # Only pass kwargs that were actually supplied — passing the
    # defaults (None) is fine for some geopandas engines but
    # confuses others. Build a clean kwargs dict.
    passthrough: dict[str, Any] = {}
    if layer is not None:
        passthrough["layer"] = layer
    if bbox is not None:
        passthrough["bbox"] = bbox
    if mask is not None:
        passthrough["mask"] = mask
    if rows is not None:
        passthrough["rows"] = rows
    if columns is not None:
        passthrough["columns"] = columns
    if where is not None:
        passthrough["where"] = where
    passthrough.update(kwargs)
    gdf = gpd.read_file(resolved, **passthrough)
    return cls(gdf)

`__str__()` #

Return a short, pyramids-branded summary of the collection.

Source code in src/pyramids/feature/collection.py

def __str__(self) -> str:
    """Return a short, pyramids-branded summary of the collection."""
    n = len(self)
    cols = self.columns.tolist()
    epsg = self.epsg
    return f"FeatureCollection({n} features, columns={cols}, epsg={epsg})"

`__repr__()` #

Return a pyramids-branded repr.

Source code in src/pyramids/feature/collection.py

def __repr__(self) -> str:
    """Return a pyramids-branded repr."""
    return (
        f"FeatureCollection(n_features={len(self)}, "
        f"columns={self.columns.tolist()}, epsg={self.epsg})"
    )

`list_layers(path)` `classmethod` #

List every vector-layer name in path.

Routes through :func:pyramids._io._parse_path so the same cloud-URL / archive rewriting that :meth:read_file uses applies here too. Uses :func:pyogrio.list_layers under the hood (geopandas' default engine).

Results are memoised behind a 128-entry LRU cache keyed on the resolved str path. Re-calling list_layers on the same cloud URL or local path in a loop costs one hash lookup instead of one datasource open. Call :meth:list_layers_cache_clear to invalidate after an out-of-band write.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	File path, URL, or archive path. Single-layer formats like GeoJSON return one name; multi-layer formats (GPKG, GDB, KML) return every layer.	required

Returns:

Type	Description
`list[str]`	Layer names in the order the driver reports them.

Raises:

Type	Description
`FileNotFoundError`	If `path` is a local filesystem path that does not exist. Cloud URLs and `/vsi*` paths skip this check and defer to the underlying driver. Previously all failures surfaced as an opaque `VectorDriverError("Failed to open datasource")`.

Examples:

A single-layer GeoJSON returns one name derived from the filename:

>>> import tempfile
>>> from pathlib import Path
>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> d = Path(tempfile.mkdtemp())
>>> path = d / "pts.geojson"
>>> gdf = gpd.GeoDataFrame(
...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
... )
>>> gdf.to_file(path, driver="GeoJSON")
>>> FeatureCollection.list_layers(path)
['pts']

A missing local path raises FileNotFoundError:

>>> from pyramids.feature import FeatureCollection
>>> FeatureCollection.list_layers("does/not/exist.geojson")
Traceback (most recent call last):
    ...
FileNotFoundError: list_layers: no file at 'does/not/exist.geojson'.
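
The existence pre-check can be illustrated without opening any datasource. This is a hedged sketch only: `looks_remote` is a hypothetical stand-in for the library's `is_remote` helper, and its hard-coded prefix list is an assumption for the demo (the library defers to `is_remote` precisely so that it does not have to maintain such a list).

```python
from pathlib import Path

# Assumption: an illustrative prefix list; the real `is_remote` helper
# may recognise a different set of schemes.
REMOTE_PREFIXES = ("s3://", "gs://", "az://", "abfs://", "http://", "https://", "/vsi")


def looks_remote(path: str) -> bool:
    """Hypothetical stand-in for pyramids' `is_remote`."""
    return path.startswith(REMOTE_PREFIXES)


def check_local_path(path) -> str:
    """Raise early for missing local files; let remote paths through."""
    path_str = str(path)
    if not looks_remote(path_str) and not Path(path_str).exists():
        raise FileNotFoundError(f"list_layers: no file at {path_str!r}.")
    return path_str


# Remote paths skip the check and are left for the driver to open.
remote_ok = check_local_path("s3://bucket/layers.gpkg")

try:
    check_local_path("does/not/exist.geojson")
    message = None
except FileNotFoundError as exc:
    message = str(exc)
print(message)
```

Checking before the driver is invoked means the error names the offending path rather than reporting a generic open failure.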

Source code in src/pyramids/feature/collection.py

@classmethod
def list_layers(cls, path: str | Path) -> list[str]:
    """List every vector-layer name in `path`.

    Routes through :func:`pyramids._io._parse_path` so the same
    cloud-URL / archive rewriting that :meth:`read_file` uses
    applies here too. Uses :func:`pyogrio.list_layers` under the
    hood (geopandas' default engine).

    Results are memoised behind a 128-entry LRU cache keyed on
    the resolved `str` path. Re-calling `list_layers` on the
    same cloud URL or local path in a loop costs one hash
    lookup instead of one datasource open. Call
    :meth:`list_layers_cache_clear` to invalidate after an
    out-of-band write.

    Args:
        path (str | Path):
            File path, URL, or archive path. Single-layer formats
            like GeoJSON return one name; multi-layer formats
            (GPKG, GDB, KML) return every layer.

    Returns:
        list[str]: Layer names in the order the driver reports them.

    Raises:
        FileNotFoundError: If `path` is a local filesystem path
            that does not exist. Cloud URLs and `/vsi*` paths
            skip this check and defer to the underlying driver.
            Previously all failures surfaced as an opaque
            `VectorDriverError("Failed to open datasource")`.

    Examples:
        - A single-layer GeoJSON returns one name derived from the filename:
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> path = d / "pts.geojson"
            >>> gdf = gpd.GeoDataFrame(
            ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ... )
            >>> gdf.to_file(path, driver="GeoJSON")
            >>> FeatureCollection.list_layers(path)
            ['pts']

            ```
        - A missing local path raises `FileNotFoundError`:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> FeatureCollection.list_layers("does/not/exist.geojson")
            Traceback (most recent call last):
                ...
            FileNotFoundError: list_layers: no file at 'does/not/exist.geojson'.

            ```
    """
    # pre-check local-path existence so the caller sees
    # a `FileNotFoundError` naming the path instead of a generic
    # driver-open failure. Defer to `base.remote.is_remote` as
    # the single source of truth for which schemes are remote —
    # the previous hardcoded prefix tuple would silently treat any
    # future scheme as local and raise a misleading error.
    path_str = str(path)
    if not is_remote(path_str):
        local = Path(path_str)
        if not local.exists():
            raise FileNotFoundError(f"list_layers: no file at {path_str!r}.")

    resolved = str(_pyramids_io._parse_path(path))
    return list(_list_layers_cached(resolved))

`list_layers_cache_clear()` `classmethod` #

Clear the C15 LRU cache backing :meth:list_layers.

Call this after writing a new layer to an existing multi-layer file (e.g. a GPKG) if you then want :meth:list_layers to see the new layer. Otherwise the 128-entry LRU cache is self-managing and callers do not need to touch it.

Returns:

Name	Type	Description
`None`	`None`	This method does not return a value.

Examples:

Clearing an empty cache is a safe no-op:

>>> from pyramids.feature import FeatureCollection
>>> FeatureCollection.list_layers_cache_clear()
>>> FeatureCollection.list_layers_cache_clear()

After an out-of-band write, clear the cache so the next list_layers call re-reads the updated file:

>>> import tempfile
>>> from pathlib import Path
>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> d = Path(tempfile.mkdtemp())
>>> path = d / "pts.geojson"
>>> gpd.GeoDataFrame(
...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
... ).to_file(path, driver="GeoJSON")
>>> _ = FeatureCollection.list_layers(path)
>>> FeatureCollection.list_layers_cache_clear()
>>> FeatureCollection.list_layers(path)
['pts']
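
Why the cache needs clearing after an out-of-band write is easier to see with a toy version. The sketch below uses `functools.lru_cache` directly and a dictionary standing in for files on disk; `FAKE_DATASOURCES` and `list_layers_cached` are hypothetical names, and the real cached helper may differ in detail.

```python
from functools import lru_cache

# Assumption: a dict that plays the role of datasources on disk.
FAKE_DATASOURCES = {"/data/basin.gpkg": ["rivers"]}


@lru_cache(maxsize=128)
def list_layers_cached(resolved: str) -> tuple[str, ...]:
    """Pretend to open the datasource and return its layer names."""
    # A tuple is returned so the cached value cannot be mutated by callers.
    return tuple(FAKE_DATASOURCES[resolved])


first = list(list_layers_cached("/data/basin.gpkg"))   # cache miss
FAKE_DATASOURCES["/data/basin.gpkg"].append("gauges")  # out-of-band write
stale = list(list_layers_cached("/data/basin.gpkg"))   # cache hit, old answer
info_before_clear = list_layers_cached.cache_info()

list_layers_cached.cache_clear()
fresh = list(list_layers_cached("/data/basin.gpkg"))   # re-reads the source

print(first, stale, fresh)
```

Until `cache_clear()` is called, the second lookup keeps returning the layer list captured at the first call, which is the behaviour the method above exists to reset.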

Source code in src/pyramids/feature/collection.py

@classmethod
def list_layers_cache_clear(cls) -> None:
    """Clear the C15 LRU cache backing :meth:`list_layers`.

    Call this after writing a new layer to an existing multi-layer
    file (e.g. a GPKG) if you then want :meth:`list_layers` to see
    the new layer. Otherwise the 128-entry LRU cache is
    self-managing and callers do not need to touch it.

    Returns:
        None: This method does not return a value.

    Examples:
        - Clearing an empty cache is a safe no-op:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> FeatureCollection.list_layers_cache_clear()
            >>> FeatureCollection.list_layers_cache_clear()

            ```
        - After an out-of-band write, clear the cache so the next
          `list_layers` call re-reads the updated file:
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> path = d / "pts.geojson"
            >>> gpd.GeoDataFrame(
            ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ... ).to_file(path, driver="GeoJSON")
            >>> _ = FeatureCollection.list_layers(path)
            >>> FeatureCollection.list_layers_cache_clear()
            >>> FeatureCollection.list_layers(path)
            ['pts']

            ```
    """
    _list_layers_cached.cache_clear()

`open_arrow(path, *, layer=None, columns=None, bbox=None, where=None, batch_size=None)` `classmethod` #

Open a vector file as a streaming :class:pyarrow.RecordBatchReader.

Thin wrapper over :func:pyogrio.raw.open_arrow that surfaces the underlying Arrow RecordBatch iterator. Rows are yielded in batches, so callers can iterate through multi-GB datasets without materializing the whole table in memory — useful for building custom dask partitioners.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Vector file path (Shapefile, GPKG, FlatGeobuf, GeoJSON, GeoParquet,...). Routed through :func:`pyramids._io._parse_path` so cloud URLs work.	required
`layer`	`str \| int \| None`	Layer name or index for multi-layer formats.	`None`
`columns`	`list[str] \| None`	Attribute columns to load (`geometry` is always included).	`None`
`bbox`	`tuple[float, float, float, float] \| None`	`(minx, miny, maxx, maxy)` filter.	`None`
`where`	`str \| None`	OGR SQL `WHERE` predicate pushed down to the driver.	`None`
`batch_size`	`int \| None`	Requested RecordBatch size in rows. `None` uses the driver default.	`None`

Returns:

Type	Description
`Any`	pyarrow.RecordBatchReader: A streaming reader. Call `.read_all()` to materialise, or iterate for row-batch consumption.

Raises:

Type	Description
`ImportError`	If :mod:`pyogrio` is not installed.

Source code in src/pyramids/feature/collection.py

@classmethod
def open_arrow(
    cls,
    path: str | Path,
    *,
    layer: str | int | None = None,
    columns: list[str] | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    where: str | None = None,
    batch_size: int | None = None,
) -> Any:
    """Open a vector file as a streaming :class:`pyarrow.RecordBatchReader`.

    Thin wrapper over :func:`pyogrio.raw.open_arrow` that surfaces
    the underlying Arrow RecordBatch iterator. Rows are yielded in
    batches, so callers can iterate through multi-GB datasets
    without materializing the whole table in memory — useful for
    building custom dask partitioners.

    Args:
        path: Vector file path (Shapefile, GPKG, FlatGeobuf,
            GeoJSON, GeoParquet,...). Routed through
            :func:`pyramids._io._parse_path` so cloud URLs work.
        layer: Layer name or index for multi-layer formats.
        columns: Attribute columns to load (`geometry` is
            always included).
        bbox: `(minx, miny, maxx, maxy)` filter.
        where: OGR SQL `WHERE` predicate pushed down to the
            driver.
        batch_size: Requested RecordBatch size in rows. `None`
            uses the driver default.

    Returns:
        pyarrow.RecordBatchReader: A streaming reader. Call
        `.read_all()` to materialise, or iterate for row-batch
        consumption.

    Raises:
        ImportError: If :mod:`pyogrio` is not installed.
    """
    try:
        from pyogrio.raw import open_arrow
    except ImportError as exc:
        raise ImportError(
            "open_arrow requires the optional 'pyogrio' dependency. "
            "Install with one of:\n"
            "  - PyPI:        pip install pyogrio\n"
            "  - conda-forge: conda install -c conda-forge pyogrio"
        ) from exc
    resolved = _pyramids_io._parse_path(path)
    kwargs: dict[str, Any] = {}
    if layer is not None:
        kwargs["layer"] = layer
    if columns is not None:
        kwargs["columns"] = columns
    if bbox is not None:
        kwargs["bbox"] = bbox
    if where is not None:
        kwargs["where"] = where
    if batch_size is not None:
        kwargs["batch_size"] = batch_size
    return open_arrow(resolved, **kwargs)

`read_parquet(path, *, columns=None, bbox=None, backend='pandas', split_row_groups=None, filters=None, blocksize=None, storage_options=None, **kwargs)` `classmethod` #

Read a GeoParquet file into a FeatureCollection.

GeoParquet is a cloud-native columnar vector format (OGC-adopted December 2024) that is faster to scan than GeoJSON, smaller than Shapefile, and partitioned in a way that suits distributed compute. This method is a thin wrapper around :func:geopandas.read_parquet; the path is first routed through :func:pyramids._io._parse_path so cloud URLs (s3://, gs://, http(s)://, …) resolve the same way they do in :meth:read_file.

Requires the optional :mod:pyarrow dependency. Install with one of:

PyPI: pip install 'pyramids-gis[parquet]'
conda-forge: conda install -c conda-forge pyramids-parquet

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Local path, cloud URL, or any form :func:`pyramids._io._parse_path` accepts.	required
`columns`	`list[str] \| None`	Project a subset of columns — Parquet's columnar layout makes this a true I/O win, unlike row-oriented formats. `geometry` is always loaded. `None` loads every column.	`None`
`bbox`	`tuple[float, float, float, float] \| None`	`(minx, miny, maxx, maxy)` spatial filter. Forwarded to :func:`geopandas.read_parquet` which uses the file's GeoParquet spatial-index metadata when present to skip non-matching row groups — a true I/O win on large files. `None` (default) loads every feature.	`None`
`**kwargs`	`Any`	Forwarded to :func:`geopandas.read_parquet` (`storage_options=` for fsspec, etc.).	`{}`

Returns:

Name	Type	Description
`FeatureCollection`	`FeatureCollection \| LazyFeatureCollection`	The file's features wrapped as a FeatureCollection.

Raises:

Type	Description
`ImportError`	If :mod:`pyarrow` is not installed, with a pyramids-branded message pointing at the `[parquet]` optional-dependency extra (D-M5).

Examples:

Round-trip a small FC through GeoParquet (requires pyarrow):

>>> import tempfile  # doctest: +SKIP
>>> from pathlib import Path  # doctest: +SKIP
>>> import geopandas as gpd  # doctest: +SKIP
>>> from shapely.geometry import Point  # doctest: +SKIP
>>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
>>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
>>> path = d / "pts.parquet"  # doctest: +SKIP
>>> gpd.GeoDataFrame(
...     {"id": [1, 2]},
...     geometry=[Point(0, 0), Point(1, 1)],
...     crs="EPSG:4326",
... ).to_parquet(path)  # doctest: +SKIP
>>> fc = FeatureCollection.read_parquet(path)  # doctest: +SKIP
>>> len(fc)  # doctest: +SKIP
2
>>> fc.epsg  # doctest: +SKIP
4326

Project a subset of columns to speed up I/O on wide files:

>>> fc = FeatureCollection.read_parquet(  # doctest: +SKIP
...     "s3://bucket/big.parquet",
...     columns=["id", "geometry"],
... )
>>> fc.column  # doctest: +SKIP
['id', 'geometry']

A missing pyarrow dependency raises a branded ImportError:

>>> FeatureCollection.read_parquet("x.parquet")  # doctest: +SKIP
Traceback (most recent call last):
    ...
ImportError: GeoParquet support requires the optional 'pyarrow'...
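
The two backends do not receive the same keyword arguments. The following is a minimal stand-alone sketch of that routing, written against the standard library only. `route_parquet_kwargs` is a hypothetical helper introduced for illustration and is not part of pyramids; it only mirrors which arguments each branch of the source passes on.

```python
from __future__ import annotations

from typing import Any


def route_parquet_kwargs(
    backend: str = "pandas",
    *,
    columns: list[str] | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    split_row_groups: bool | None = None,
    filters: list | None = None,
    blocksize: int | str | None = None,
    storage_options: dict | None = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """Hypothetical helper: return the kwargs each backend's reader would get."""
    if backend == "dask":
        candidates = {
            "columns": columns,
            "split_row_groups": split_row_groups,
            "filters": filters,
            "blocksize": blocksize,
            "storage_options": storage_options,
        }
        # Only explicitly set values are forwarded; bbox is not in this branch.
        routed = {k: v for k, v in candidates.items() if v is not None}
        routed.update(kwargs)  # extra kwargs are applied last here
        return routed
    if backend != "pandas":
        raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
    routed = dict(kwargs)  # extra kwargs go in first ...
    candidates = {"columns": columns, "bbox": bbox, "storage_options": storage_options}
    # ... so the explicitly named parameters win on a key clash.
    routed.update({k: v for k, v in candidates.items() if v is not None})
    return routed


pandas_kwargs = route_parquet_kwargs(
    columns=["id"], bbox=(0, 0, 1, 1), filters=[("id", ">", 0)]
)
dask_kwargs = route_parquet_kwargs(
    "dask", columns=["id"], bbox=(0, 0, 1, 1), filters=[("id", ">", 0)]
)
print(pandas_kwargs)
print(dask_kwargs)
```

In this sketch the pandas call keeps `bbox` and drops `filters`, while the dask call does the opposite, which is the behaviour to keep in mind when switching backends.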

Source code in src/pyramids/feature/collection.py

@classmethod
def read_parquet(
    cls,
    path: str | Path,
    *,
    columns: list[str] | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    backend: str = "pandas",
    split_row_groups: bool | None = None,
    filters: list | None = None,
    blocksize: int | str | None = None,
    storage_options: dict | None = None,
    **kwargs: Any,
) -> FeatureCollection | LazyFeatureCollection:
    """Read a GeoParquet file into a FeatureCollection.

    GeoParquet is a cloud-native columnar vector format
    (OGC-adopted December 2024): faster to scan than GeoJSON, smaller
    than Shapefile, and partitioned in a way that suits distributed
    compute. This method is a thin wrapper around
    :func:`geopandas.read_parquet`; the path is first routed
    through :func:`pyramids._io._parse_path` so cloud URLs
    (`s3://`, `gs://`, `http(s)://`, …) resolve the same way
    they do in :meth:`read_file`.

    Requires the optional :mod:`pyarrow` dependency. Install with one of:

    - PyPI: ``pip install 'pyramids-gis[parquet]'``
    - conda-forge: ``conda install -c conda-forge pyramids-parquet``

    Args:
        path (str | Path):
            Local path, cloud URL, or any form
            :func:`pyramids._io._parse_path` accepts.
        columns (list[str] | None):
            Project a subset of columns — Parquet's columnar
            layout makes this a true I/O win, unlike row-oriented
            formats. `geometry` is always loaded. `None`
            loads every column.
        bbox (tuple[float, float, float, float] | None):
            `(minx, miny, maxx, maxy)` spatial filter.
            Forwarded to :func:`geopandas.read_parquet` which uses
            the file's GeoParquet spatial-index metadata when
            present to skip non-matching row groups — a true I/O
            win on large files. `None` (default) loads every
            feature.
        **kwargs:
            Forwarded to :func:`geopandas.read_parquet`
            (`storage_options=` for fsspec, etc.).

    Returns:
        FeatureCollection: The file's features wrapped as a
        FeatureCollection.

    Raises:
        ImportError: If :mod:`pyarrow` is not installed, with a
            pyramids-branded message pointing at the
            `[parquet]` optional-dependency extra (D-M5).

    Examples:
        - Round-trip a small FC through GeoParquet (requires pyarrow):
            ```python
            >>> import tempfile  # doctest: +SKIP
            >>> from pathlib import Path  # doctest: +SKIP
            >>> import geopandas as gpd  # doctest: +SKIP
            >>> from shapely.geometry import Point  # doctest: +SKIP
            >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
            >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
            >>> path = d / "pts.parquet"  # doctest: +SKIP
            >>> gpd.GeoDataFrame(
            ...     {"id": [1, 2]},
            ...     geometry=[Point(0, 0), Point(1, 1)],
            ...     crs="EPSG:4326",
            ... ).to_parquet(path)  # doctest: +SKIP
            >>> fc = FeatureCollection.read_parquet(path)  # doctest: +SKIP
            >>> len(fc)  # doctest: +SKIP
            2
            >>> fc.epsg  # doctest: +SKIP
            4326

            ```
        - Project a subset of columns to speed up I/O on wide files:
            ```python
            >>> fc = FeatureCollection.read_parquet(  # doctest: +SKIP
            ...     "s3://bucket/big.parquet",
            ...     columns=["id", "geometry"],
            ... )
            >>> fc.column  # doctest: +SKIP
            ['id', 'geometry']

            ```
        - A missing pyarrow dependency raises a branded `ImportError`:
            ```python
            >>> FeatureCollection.read_parquet("x.parquet")  # doctest: +SKIP
            Traceback (most recent call last):
                ...
            ImportError: GeoParquet support requires the optional 'pyarrow'...

            ```
    """
    resolved = _pyramids_io._parse_path(path)
    if backend == "dask":
        # check deps in order of specificity — the backend
        # request is the more specific signal, so the
        # dask-geopandas hint beats the generic pyarrow one.
        # When both are missing, the dask-geopandas error names
        # the extra that installs both ([parquet-lazy]).
        try:
            import dask_geopandas
        except ImportError as exc:
            raise ImportError(
                "backend='dask' requires the optional "
                "'dask-geopandas' dependency. Install with one of:\n"
                "  - PyPI:        pip install 'pyramids-gis[parquet-lazy]'\n"
                "  - conda-forge: conda install -c conda-forge pyramids-parquet-lazy"
            ) from exc
        dask_kwargs: dict[str, Any] = {}
        if columns is not None:
            dask_kwargs["columns"] = columns
        if split_row_groups is not None:
            dask_kwargs["split_row_groups"] = split_row_groups
        if filters is not None:
            dask_kwargs["filters"] = filters
        if blocksize is not None:
            dask_kwargs["blocksize"] = blocksize
        if storage_options is not None:
            dask_kwargs["storage_options"] = storage_options
        dask_kwargs.update(kwargs)
        # dask_geopandas is installed → assert pyarrow too, so
        # the user gets the pyramids-branded hint (not the
        # upstream message dask_geopandas would emit when it tries
        # to read). `[parquet-lazy]` pulls both.
        _require_pyarrow()
        # wrap the lazy return as a LazyFeatureCollection so the
        # dask branch stays inside the pyramids type system.
        from pyramids.feature._lazy_collection import LazyFeatureCollection

        dask_gdf = dask_geopandas.read_parquet(resolved, **dask_kwargs)
        return LazyFeatureCollection.from_dask_gdf(dask_gdf)
    if backend != "pandas":
        raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
    _require_pyarrow()
    # geopandas 1.x forwards **kwargs straight into
    # `pyarrow.parquet.read_table`, which has never accepted the
    # pandas-style `engine=` kwarg. `_require_pyarrow()` above
    # already hard-guarantees the pyarrow backend, so no injection
    # is needed here. If geopandas ever reintroduces a fastparquet
    # path it will be opt-in via a new kwarg, not a silent switch.
    passthrough: dict[str, Any] = {}
    passthrough.update(kwargs)
    if columns is not None:
        passthrough["columns"] = columns
    if bbox is not None:
        passthrough["bbox"] = bbox
    if storage_options is not None:
        passthrough["storage_options"] = storage_options
    gdf = gpd.read_parquet(resolved, **passthrough)
    return cls(gdf)

`to_parquet(path, *, compression='snappy', index=None, **kwargs)` #

Write this FeatureCollection to GeoParquet.

Thin wrapper around :meth:`geopandas.GeoDataFrame.to_parquet` that defaults `compression` to `"snappy"`, the format-standard tradeoff between speed and size.

Requires the optional :mod:pyarrow dependency. Install with one of:

PyPI: pip install 'pyramids-gis[parquet]'
conda-forge: conda install -c conda-forge pyramids-parquet

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Destination file path.	required
`compression`	`str`	Parquet compression codec — `"snappy"` (default), `"gzip"`, `"brotli"`, `"lz4"`, `"zstd"`, or `"none"`. `"snappy"` is the GeoParquet-spec recommended default.	`'snappy'`
`index`	`bool \| None`	Whether to include the pandas index as a column. `None` (default) uses geopandas' default behavior: preserve a non-default index, drop the default `RangeIndex`.	`None`
`**kwargs`	`Any`	Forwarded to :meth:`geopandas.GeoDataFrame.to_parquet`.	`{}`

Raises:

Type	Description
`ImportError`	If :mod:`pyarrow` is not installed, with a pyramids-branded message pointing at the `[parquet]` optional-dependency extra (D-M5).
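
The branded `ImportError` comes from a dependency guard that runs before any I/O. The source calls `_require_pyarrow()` but does not show its body, so the following is only a sketch of how such a guard could look. `require_optional`, its parameters, and the exact message wording are assumptions made for illustration.

```python
import importlib.util


def require_optional(module: str, feature: str, extra: str) -> None:
    """Hypothetical guard: raise a branded ImportError when `module` is absent."""
    # find_spec returns None for a top-level module that is not installed.
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"{feature} support requires the optional '{module}' dependency. "
            f"Install it with: pip install 'pyramids-gis[{extra}]'"
        )


# A module name that certainly does not exist stands in for a missing pyarrow.
try:
    require_optional("surely_missing_module_xyz", "GeoParquet", "parquet")
except ImportError as exc:
    message = str(exc)

print(message)
```

Checking with `find_spec` rather than importing keeps the guard cheap and lets the error name the extra to install instead of surfacing an upstream message.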

Examples:

Write a FeatureCollection with the default snappy codec:

>>> import tempfile  # doctest: +SKIP
>>> from pathlib import Path  # doctest: +SKIP
>>> import geopandas as gpd  # doctest: +SKIP
>>> from shapely.geometry import Point  # doctest: +SKIP
>>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
>>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1, 2]},
...         geometry=[Point(0, 0), Point(1, 1)],
...         crs="EPSG:4326",
...     )
... )  # doctest: +SKIP
>>> path = d / "out.parquet"  # doctest: +SKIP
>>> fc.to_parquet(path)  # doctest: +SKIP
>>> path.exists()  # doctest: +SKIP
True

Pick a different codec (e.g. zstd for better compression):

>>> import tempfile  # doctest: +SKIP
>>> from pathlib import Path  # doctest: +SKIP
>>> import geopandas as gpd  # doctest: +SKIP
>>> from shapely.geometry import Point  # doctest: +SKIP
>>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
>>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
...     )
... )  # doctest: +SKIP
>>> fc.to_parquet(d / "out.parquet", compression="zstd")  # doctest: +SKIP

Source code in src/pyramids/feature/collection.py

def to_parquet(
    self,
    path: str | Path,
    *,
    compression: str = "snappy",
    index: bool | None = None,
    **kwargs: Any,
) -> None:
    """Write this FeatureCollection to GeoParquet.

    Thin wrapper around :meth:`geopandas.GeoDataFrame.to_parquet`
    that defaults :param:`compression` to `"snappy"` — the
    format-standard tradeoff between speed and size.

    Requires the optional :mod:`pyarrow` dependency. Install with one of:

    - PyPI: ``pip install 'pyramids-gis[parquet]'``
    - conda-forge: ``conda install -c conda-forge pyramids-parquet``

    Args:
        path (str | Path):
            Destination file path.
        compression (str):
            Parquet compression codec — `"snappy"` (default),
            `"gzip"`, `"brotli"`, `"lz4"`, `"zstd"`, or
            `"none"`. `"snappy"` is the GeoParquet-spec
            recommended default.
        index (bool | None):
            Whether to include the pandas index as a column.
            `None` (default) uses geopandas' default behavior:
            preserve a non-default index, drop the default
            `RangeIndex`.
        **kwargs:
            Forwarded to :meth:`geopandas.GeoDataFrame.to_parquet`.

    Raises:
        ImportError: If :mod:`pyarrow` is not installed, with a
            pyramids-branded message pointing at the
            `[parquet]` optional-dependency extra (D-M5).

    Examples:
        - Write a FeatureCollection with the default snappy codec:
            ```python
            >>> import tempfile  # doctest: +SKIP
            >>> from pathlib import Path  # doctest: +SKIP
            >>> import geopandas as gpd  # doctest: +SKIP
            >>> from shapely.geometry import Point  # doctest: +SKIP
            >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
            >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1, 2]},
            ...         geometry=[Point(0, 0), Point(1, 1)],
            ...         crs="EPSG:4326",
            ...     )
            ... )  # doctest: +SKIP
            >>> path = d / "out.parquet"  # doctest: +SKIP
            >>> fc.to_parquet(path)  # doctest: +SKIP
            >>> path.exists()  # doctest: +SKIP
            True

            ```
        - Pick a different codec (e.g. zstd for better compression):
            ```python
            >>> import tempfile  # doctest: +SKIP
            >>> from pathlib import Path  # doctest: +SKIP
            >>> import geopandas as gpd  # doctest: +SKIP
            >>> from shapely.geometry import Point  # doctest: +SKIP
            >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
            >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )  # doctest: +SKIP
            >>> fc.to_parquet(d / "out.parquet", compression="zstd")  # doctest: +SKIP

            ```
    """
    _require_pyarrow()
    super().to_parquet(path, compression=compression, index=index, **kwargs)

`to_file(path, driver='geojson', *, layer=None, mode='w', **creation_options)` #

Write this FeatureCollection to a vector file.

`layer`, `mode`, and arbitrary driver creation options are first-class keyword arguments. Previously callers had to rely on implicit `**kwargs` forwarding, which hurt discoverability.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Destination file path.	required
`driver`	`str`	Driver alias (e.g. `"geojson"`, `"gpkg"`) or literal GDAL driver name (`"GeoJSON"`, `"GPKG"`, `"ESRI Shapefile"`). Resolved via :class:`Catalog`.	`'geojson'`
`layer`	`str \| None`	Layer name for multi-layer drivers (GPKG, GDB, …). Writing two layers into the same GPKG is the canonical use case. `None` defers to the driver default.	`None`
`mode`	`str`	`"w"` (default) overwrites; `"a"` appends to an existing layer. Append support depends on the driver — GPKG and Shapefile accept it, GeoJSON does not.	`'w'`
`**creation_options`	`Any`	Driver-specific creation options, forwarded to the underlying engine (pyogrio / fiona). Examples: for GPKG, `SPATIAL_INDEX="YES"`, `FID="id"`; for Shapefile, `ENCODING="UTF-8"`; for GeoJSON, `COORDINATE_PRECISION=6`, `RFC7946="YES"`. Keys are case-preserving and passed verbatim to the driver; consult the GDAL driver docs for the full list. pyogrio (the default geopandas engine on 1.0+) raises :class:`ValueError` with the message `"unrecognized option '<name>' for driver '<driver>'"` when a supplied option is neither in the driver's dataset nor its layer creation-option list. This surfaces typos (`SPATIAL_INDX` vs `SPATIAL_INDEX`) at write time rather than silently producing a different file. Some drivers may still accept options that pyogrio does not list; verify against the driver's docs when in doubt.	`{}`

Raises:

Type	Description
`ValueError`	If `mode` isn't `"w"` or `"a"`, or if a supplied creation option is not recognised by the driver (raised by pyogrio — see the `**creation_options` note above).

Examples:

Round-trip a small FC through GeoJSON (the default driver):

>>> import tempfile
>>> from pathlib import Path
>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> d = Path(tempfile.mkdtemp())
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1, 2]},
...         geometry=[Point(0, 0), Point(1, 1)],
...         crs="EPSG:4326",
...     )
... )
>>> path = d / "out.geojson"
>>> fc.to_file(path)
>>> path.exists()
True
>>> FeatureCollection.read_file(path).column
['id', 'geometry']

Write to GeoPackage with a named layer:

>>> import tempfile
>>> from pathlib import Path
>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> d = Path(tempfile.mkdtemp())
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
...     )
... )
>>> path = d / "out.gpkg"
>>> fc.to_file(path, driver="gpkg", layer="rivers")
>>> FeatureCollection.list_layers(path)
['rivers']

Invalid mode raises ValueError before touching the file:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
...     )
... )
>>> fc.to_file("ignored.geojson", mode="x")
Traceback (most recent call last):
    ...
ValueError: mode must be 'w' (write) or 'a' (append); got 'x'.
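
The order in which the writer's keyword arguments are assembled matters: `mode` is validated first, the engine is pinned to pyogrio, and creation options are merged last. A minimal sketch of that assembly follows, using only the standard library. `build_write_kwargs` is a hypothetical helper for illustration, and it skips the driver-alias lookup that the real method performs through the catalog.

```python
from __future__ import annotations

from typing import Any


def build_write_kwargs(
    driver: str,
    *,
    layer: str | None = None,
    mode: str = "w",
    **creation_options: Any,
) -> dict[str, Any]:
    """Hypothetical helper mirroring how the writer's kwargs are put together."""
    # Validation happens before anything is written.
    if mode not in ("w", "a"):
        raise ValueError(f"mode must be 'w' (write) or 'a' (append); got {mode!r}.")
    passthrough: dict[str, Any] = {
        "driver": driver,
        "mode": mode,
        "engine": "pyogrio",
    }
    if layer is not None:
        passthrough["layer"] = layer
    # Creation options are merged last, so a caller-supplied `engine` wins.
    passthrough.update(creation_options)
    return passthrough


default_kwargs = build_write_kwargs("GPKG", layer="rivers", SPATIAL_INDEX="YES")
fiona_kwargs = build_write_kwargs("GPKG", engine="fiona")
print(default_kwargs)
print(fiona_kwargs["engine"])
```

Because the merge happens last, any creation option whose name collides with `driver`, `mode`, or `engine` replaces the value set by the method, which is how the `engine="fiona"` override works.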

Source code in src/pyramids/feature/collection.py

def to_file(
    self,
    path: str | Path,
    driver: str = "geojson",
    *,
    layer: str | None = None,
    mode: str = "w",
    **creation_options: Any,
) -> None:
    """Write this FeatureCollection to a vector file.

    `layer`, `mode`, and arbitrary driver creation
    options are now first-class kwargs. Previously callers had to
    rely on implicit `**kwargs` forwarding, which hurt
    discoverability.

    Args:
        path (str | Path):
            Destination file path.
        driver (str):
            Driver alias (e.g. `"geojson"`, `"gpkg"`) or
            literal GDAL driver name (`"GeoJSON"`, `"GPKG"`,
            `"ESRI Shapefile"`). Resolved via :class:`Catalog`.
        layer (str | None):
            Layer name for multi-layer drivers (GPKG, GDB, …).
            Writing two layers into the same GPKG is the canonical
            use case. `None` defers to the driver default.
        mode (str):
            `"w"` (default) overwrites; `"a"` appends to an
            existing layer. Append support depends on the driver
            — GPKG and Shapefile accept it, GeoJSON does not.
        **creation_options:
            Driver-specific creation options, forwarded to the
            underlying engine (pyogrio / fiona). Examples:

            * GPKG: `SPATIAL_INDEX="YES"`, `FID="id"`.
            * Shapefile: `ENCODING="UTF-8"`.
            * GeoJSON: `COORDINATE_PRECISION=6`, `RFC7946="YES"`.

            Keys are case-preserving and passed verbatim to the
            driver; consult the GDAL driver docs for the full
            list.

            pyogrio (the default geopandas engine on 1.0+)
            raises :class:`ValueError` with the message
            `"unrecognized option '<name>' for driver '<driver>'"`
            when a supplied option is neither in the driver's
            dataset nor its layer creation-option list. This
            surfaces typos (`SPATIAL_INDX` vs `SPATIAL_INDEX`)
            at write-time rather than silently producing a
            different file. Some drivers may still accept options
            that pyogrio does not list — verify against the
            driver's docs when in doubt.

    Raises:
        ValueError: If `mode` isn't `"w"` or `"a"`, or if a
            supplied creation option is not recognised by the
            driver (raised by pyogrio — see the `**creation_options`
            note above).

    Examples:
        - Round-trip a small FC through GeoJSON (the default driver):
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1, 2]},
            ...         geometry=[Point(0, 0), Point(1, 1)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> path = d / "out.geojson"
            >>> fc.to_file(path)
            >>> path.exists()
            True
            >>> FeatureCollection.read_file(path).column
            ['id', 'geometry']

            ```
        - Write to GeoPackage with a named layer:
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )
            >>> path = d / "out.gpkg"
            >>> fc.to_file(path, driver="gpkg", layer="rivers")
            >>> FeatureCollection.list_layers(path)
            ['rivers']

            ```
        - Invalid `mode` raises `ValueError` before touching the file:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )
            >>> fc.to_file("ignored.geojson", mode="x")
            Traceback (most recent call last):
                ...
            ValueError: mode must be 'w' (write) or 'a' (append); got 'x'.

            ```
    """
    if mode not in ("w", "a"):
        raise ValueError(f"mode must be 'w' (write) or 'a' (append); got {mode!r}.")
    try:
        resolved = CATALOG.get_gdal_name(driver) or driver
    except AttributeError:
        resolved = driver

    # pin the engine to pyogrio to match :meth:`read_file` and
    # :meth:`iter_features`. Callers who want fiona for some reason
    # can override via `engine="fiona"` in creation_options, but
    # the default gets the fast path and the pyogrio-specific
    # unknown-option validation.
    passthrough: dict[str, Any] = {
        "driver": resolved,
        "mode": mode,
        "engine": "pyogrio",
    }
    if layer is not None:
        passthrough["layer"] = layer
    passthrough.update(creation_options)
    super().to_file(path, **passthrough)

`explode(geometry='multipolygon')` #

Explode multi-geometry rows into per-row single geometries.

Returns a new FeatureCollection where every row whose geometry type matches geometry is split so each child geometry becomes its own row. The current frame is not mutated.

Parameters:

Name	Type	Description	Default
`geometry`	`str`	The geometry type to explode (case-insensitive). Defaults to `"multipolygon"`.	`'multipolygon'`

Returns:

Name	Type	Description
`FeatureCollection`	`FeatureCollection`	A new collection with the same CRS as `self` and exploded geometries.

Examples:

Explode a frame mixing one MultiPolygon with a Polygon:

>>> import geopandas as gpd
>>> from shapely.geometry import Polygon, MultiPolygon
>>> from pyramids.feature import FeatureCollection
>>> gdf = gpd.GeoDataFrame(
...     {
...         "name": ["a", "b"],
...         "geometry": [
...             MultiPolygon([
...                 Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...                 Polygon([(5, 5), (7, 5), (7, 7), (5, 7)]),
...             ]),
...             Polygon([(10, 10), (11, 10), (11, 11), (10, 11)]),
...         ],
...     },
...     crs="EPSG:4326",
... )
>>> fc = FeatureCollection(gdf)
>>> result = fc.explode("multipolygon")
>>> len(result)
3
>>> [g.geom_type for g in result.geometry]
['Polygon', 'Polygon', 'Polygon']
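
To make the row-splitting rule concrete without geopandas, here is a small sketch that applies the same idea to GeoJSON-style dictionaries. It is not the library's implementation (`explode_gdf` is not shown on this page). `explode_rows` and `PART_TYPE` are hypothetical names, and the data reuses the shapes from the example above, written as closed rings.

```python
from __future__ import annotations

from typing import Any

# Maps a multi-part GeoJSON type to the type of its parts.
PART_TYPE = {
    "MultiPolygon": "Polygon",
    "MultiLineString": "LineString",
    "MultiPoint": "Point",
}


def explode_rows(
    rows: list[dict[str, Any]], geometry: str = "multipolygon"
) -> list[dict[str, Any]]:
    """Hypothetical helper: split rows whose geometry type matches `geometry`."""
    out: list[dict[str, Any]] = []
    for row in rows:
        geom = row["geometry"]
        kind = geom["type"]
        # The match is case-insensitive, like the documented parameter.
        if kind.lower() == geometry.lower() and kind in PART_TYPE:
            for part in geom["coordinates"]:
                # Each part becomes its own row; attributes are copied as-is.
                new_geom = {"type": PART_TYPE[kind], "coordinates": part}
                out.append({**row, "geometry": new_geom})
        else:
            out.append(row)
    return out


square_a = [[(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]]
square_b = [[(5, 5), (7, 5), (7, 7), (5, 7), (5, 5)]]
square_c = [[(10, 10), (11, 10), (11, 11), (10, 11), (10, 10)]]
rows = [
    {"name": "a", "geometry": {"type": "MultiPolygon", "coordinates": [square_a, square_b]}},
    {"name": "b", "geometry": {"type": "Polygon", "coordinates": square_c}},
]
exploded = explode_rows(rows, "MultiPolygon")
print(len(exploded))
print([row["geometry"]["type"] for row in exploded])
```

The input list is left untouched, matching the non-mutating contract described for `explode`.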

Source code in src/pyramids/feature/collection.py

def explode(self, geometry: str = "multipolygon") -> FeatureCollection:
    """Explode multi-geometry rows into per-row single geometries.

    Returns a new ``FeatureCollection`` where every row whose geometry
    type matches ``geometry`` is split so each child geometry becomes
    its own row. The current frame is not mutated.

    Args:
        geometry (str): The geometry type to explode (case-insensitive).
            Defaults to ``"multipolygon"``.

    Returns:
        FeatureCollection: A new collection with the same CRS as
        ``self`` and exploded geometries.

    Examples:
        - Explode a frame mixing one MultiPolygon with a Polygon:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Polygon, MultiPolygon
            >>> from pyramids.feature import FeatureCollection
            >>> gdf = gpd.GeoDataFrame(
            ...     {
            ...         "name": ["a", "b"],
            ...         "geometry": [
            ...             MultiPolygon([
            ...                 Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
            ...                 Polygon([(5, 5), (7, 5), (7, 7), (5, 7)]),
            ...             ]),
            ...             Polygon([(10, 10), (11, 10), (11, 11), (10, 11)]),
            ...         ],
            ...     },
            ...     crs="EPSG:4326",
            ... )
            >>> fc = FeatureCollection(gdf)
            >>> result = fc.explode("multipolygon")
            >>> len(result)
            3
            >>> [g.geom_type for g in result.geometry]
            ['Polygon', 'Polygon', 'Polygon']

            ```
    """
    return FeatureCollection(_geom.explode_gdf(self, geometry=geometry))

`with_coordinates()` #

Return a new FeatureCollection with per-vertex x and y columns.

Non-mutating replacement for the old `xy()` method (which has been deleted). It matches the pandas / geopandas convention that data-transformation methods return a new object. The `with_` prefix follows the stdlib/pandas pattern for "return a copy with this change applied" (e.g. :meth:`pathlib.Path.with_suffix`).

Explodes MultiPolygon and GeometryCollection geometries into their parts first, then attaches x and y columns containing the coordinate sequences of each row.

Returns:

Name	Type	Description
`FeatureCollection`	`FeatureCollection`	A new FeatureCollection (`self` is not modified) with the original columns plus `x` and `y` per-vertex coordinate lists.

Examples:

A Point FC gets scalar x / y per row:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1, 2]},
...         geometry=[Point(1.0, 2.0), Point(3.0, 4.0)],
...         crs="EPSG:4326",
...     )
... )
>>> out = fc.with_coordinates()
>>> list(out["x"])
[1.0, 3.0]
>>> list(out["y"])
[2.0, 4.0]

The input FC is not mutated:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0.0, 0.0)],
...         crs="EPSG:4326",
...     )
... )
>>> _ = fc.with_coordinates()
>>> "x" in fc.columns
False
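
The difference between scalar values for points and per-vertex lists for other geometries can be sketched on GeoJSON-style dictionaries. This is an illustration only: `xy_columns` is a hypothetical helper, and reading a polygon's exterior ring alone is an assumption, since the page does not show how the library's coordinate helper treats interior rings.

```python
from __future__ import annotations

from typing import Any


def xy_columns(geom: dict[str, Any]) -> tuple[Any, Any]:
    """Hypothetical helper: Point -> scalars, LineString / Polygon -> lists."""
    kind, coords = geom["type"], geom["coordinates"]
    if kind == "Point":
        # A point has a single vertex, so x and y stay scalar.
        return coords[0], coords[1]
    if kind == "LineString":
        vertices = coords
    elif kind == "Polygon":
        vertices = coords[0]  # assumption: exterior ring only
    else:
        raise ValueError(f"explode multi-part geometries first, got {kind!r}")
    xs = [vertex[0] for vertex in vertices]
    ys = [vertex[1] for vertex in vertices]
    return xs, ys


point_xy = xy_columns({"type": "Point", "coordinates": (1.0, 2.0)})
polygon_xy = xy_columns(
    {"type": "Polygon", "coordinates": [[(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]]}
)
print(point_xy)
print(polygon_xy)
```

Multi-part geometries are rejected in the sketch because the method explodes them into single parts before any coordinates are read.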

Source code in src/pyramids/feature/collection.py

def with_coordinates(self) -> FeatureCollection:
    """Return a new FeatureCollection with per-vertex `x` and `y` columns.

    non-mutating replacement for the old `xy()` method
    (which has been deleted). Matches pandas / geopandas
    convention — data-transformation methods return a new object.
    The `with_` prefix follows the stdlib/pandas pattern for
    "return a copy with this change applied" (e.g.
    :meth:`pathlib.Path.with_suffix`).

    Explodes MultiPolygon and GeometryCollection geometries into
    their parts first, then attaches `x` and `y` columns
    containing the coordinate sequences of each row.

    Returns:
        FeatureCollection: A new FeatureCollection (`self` is
        not modified) with the original columns plus `x` and
        `y` per-vertex coordinate lists.

    Examples:
        - A Point FC gets scalar `x` / `y` per row:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1, 2]},
            ...         geometry=[Point(1.0, 2.0), Point(3.0, 4.0)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> out = fc.with_coordinates()
            >>> list(out["x"])
            [1.0, 3.0]
            >>> list(out["y"])
            [2.0, 4.0]

            ```
        - The input FC is not mutated:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0.0, 0.0)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> _ = fc.with_coordinates()
            >>> "x" in fc.columns
            False

            ```
    """
    gdf = _geom.explode_gdf(
        gpd.GeoDataFrame(self, copy=True), geometry="multipolygon"
    )
    gdf = _geom.explode_gdf(gdf, geometry="geometrycollection")

    fc = FeatureCollection(gdf)
    fc["x"] = fc.apply(
        _geom.get_coords, geom_col="geometry", coord_type="x", axis=1
    )
    fc["y"] = fc.apply(
        _geom.get_coords, geom_col="geometry", coord_type="y", axis=1
    )
    fc.reset_index(drop=True, inplace=True)
    return fc

`plot(column=None, basemap=None, **kwargs)` #

Plot features, optionally on a web-tile basemap.

Delegates to :meth:`geopandas.GeoDataFrame.plot` and, when `basemap` is truthy, adds an OSM (or named provider) tile layer underneath.

Raises:

Type	Description
`CRSError`	If `basemap` is requested but the FC has no CRS.
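
How the `basemap` argument is interpreted can be sketched in isolation. `resolve_basemap_source` is a hypothetical helper, the `CRSError` class below is a local stand-in so the snippet runs without pyramids, and `"some-provider"` is a placeholder rather than a known provider name.

```python
from __future__ import annotations


class CRSError(Exception):
    """Local stand-in for pyramids' CRSError, defined so the sketch runs alone."""


def resolve_basemap_source(
    basemap: bool | str | None, epsg: int | None
) -> tuple[bool, str | None]:
    """Hypothetical helper: return (draw_tiles, source) for a `basemap` value."""
    if not basemap:
        # None, False, and "" all mean: plot the features only.
        return False, None
    if epsg is None:
        raise CRSError("FeatureCollection must have a CRS (epsg) to use basemap.")
    # A string names a tile provider; True falls back to the default source.
    return True, basemap if isinstance(basemap, str) else None


print(resolve_basemap_source(None, 4326))
print(resolve_basemap_source(True, 4326))
print(resolve_basemap_source("some-provider", 4326))
```

The CRS check only runs when tiles are requested, so a collection without a CRS can still be plotted on its own.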

Source code in src/pyramids/feature/collection.py

def plot(
    self,
    column: str | None = None,
    basemap: bool | str | None = None,
    **kwargs: Any,
) -> Any:
    """Plot features, optionally on a web-tile basemap.

    Delegates to :meth:`geopandas.GeoDataFrame.plot` and, when
    `basemap` is truthy, adds an OSM (or named provider) tile
    layer underneath.

    Raises:
        CRSError: If `basemap` is requested but the FC has no CRS.
    """
    ax = super().plot(column=column, **kwargs)

    if basemap:
        if self.epsg is None:
            raise CRSError(
                "FeatureCollection must have a CRS (epsg) to use basemap."
            )
        source = basemap if isinstance(basemap, str) else None
        add_basemap(ax, crs=self.epsg, source=source)

    return ax

`concat(other)` #

Concatenate another GeoDataFrame onto this FeatureCollection.

Mirrors :func:`pandas.concat`: returns a new `FeatureCollection` and never mutates `self`. There is no `inplace` kwarg (`pd.concat` has never had one, and this method follows that convention).

Equivalent to `pd.concat([fc, other])`, which also works directly and returns a `FeatureCollection` via the `_constructor` hook.

A CRS mismatch between `self` and `other` raises :class:`pyramids.base._errors.CRSError`. The old behaviour silently adopted `self`'s CRS, which corrupted the other rows' coordinates if the two frames were in different CRSes. Callers that want to concatenate across CRSes must call `other.to_crs(self.crs)` first. The case where only one side carries a CRS (the other is `None`) is permitted, so you can seed a CRS by concatenating a CRS-carrying frame onto a freshly constructed empty FC.

Parameters:

Name	Type	Description	Default
`other`	`GeoDataFrame`	The rows to append.	required

Returns:

Name	Type	Description
`FeatureCollection`	`FeatureCollection`	A new FC containing `self`'s rows followed by `other`'s rows, with `self`'s CRS (or `other`'s when `self` has none) and a freshly reset index.

Raises:

Type	Description
`CRSError`	If both frames carry a CRS and the two CRSes do not match.

Examples:

Concatenate two single-row FCs on matching CRS:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> a = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0, 0)],
...         crs="EPSG:4326",
...     )
... )
>>> b = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [2]}, geometry=[Point(1, 1)],
...         crs="EPSG:4326",
...     )
... )
>>> out = a.concat(b)
>>> len(out)
2
>>> list(out["id"])
[1, 2]
>>> out.crs.to_epsg()
4326

CRS mismatch raises CRSError:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> a = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1]}, geometry=[Point(0, 0)],
...         crs="EPSG:4326",
...     )
... )
>>> b = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [2]}, geometry=[Point(1, 1)],
...         crs="EPSG:3857",
...     )
... )
>>> a.concat(b)
Traceback (most recent call last):
    ...
pyramids.base._errors.CRSError: concat: CRS mismatch...
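
The examples above cover matching and mismatching CRSes. The remaining cases, where one or both sides have no CRS, follow from the source listing; the sketch below tabulates them in plain Python. `concat_crs` is a hypothetical helper written for this illustration, CRSes are represented as plain strings instead of pyproj objects, and a `ValueError` stands in for `CRSError`.

```python
from __future__ import annotations


def concat_crs(self_crs: str | None, other_crs: str | None) -> str | None:
    """Return the CRS the concatenated frame would carry.

    Illustrative only: raises ValueError where the real method raises CRSError.
    """
    if self_crs is not None and other_crs is not None and self_crs != other_crs:
        raise ValueError(f"CRS mismatch: {self_crs!r} vs {other_crs!r}")
    # self's CRS wins; other's CRS is used only to seed an unset self.
    return self_crs if self_crs is not None else other_crs


cases = [
    ("EPSG:4326", "EPSG:4326"),  # both set and equal
    (None, "EPSG:4326"),         # other seeds the CRS
    ("EPSG:4326", None),         # self's CRS is kept
    (None, None),                # result has no CRS
]
for pair in cases:
    print(pair, "->", concat_crs(*pair))
```

For the mismatch case, reprojecting first with `other.to_crs(self.crs)`, as the description recommends, brings both sides to the same CRS before the check runs.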

Source code in src/pyramids/feature/collection.py

def concat(self, other: GeoDataFrame) -> FeatureCollection:
    """Concatenate another GeoDataFrame onto this FeatureCollection.

    Mirrors :func:`pandas.concat`: returns a new
    `FeatureCollection` and never mutates `self`. There is no
    `inplace` kwarg, following the pandas convention
    (`pd.concat` has never had one).

    `pd.concat([fc, other])` also works directly and returns a
    `FeatureCollection` via the `_constructor` hook, although it
    keeps the original index labels unless `ignore_index=True`
    is passed.

    A CRS mismatch between `self` and `other` raises
    :class:`pyramids.base._errors.CRSError`. The old behaviour
    silently adopted `self`'s CRS, which corrupted the
    `other` rows' coordinates if the two frames were in
    different CRSes. Callers that want to concatenate across
    CRSes must call `other.to_crs(self.crs)` first. The
    one-sided case (exactly one CRS is `None`) is permitted, so
    you can seed a CRS by concatenating a CRS-carrying frame
    onto a freshly constructed empty FC.

    Args:
        other (GeoDataFrame): The rows to append.

    Returns:
        FeatureCollection: A new FC containing `self`'s rows
        followed by `other`'s rows, with `self`'s CRS and a
        freshly-reset index.

    Raises:
        CRSError: If both frames carry a CRS and the two CRSes
            do not match.

    Examples:
        - Concatenate two single-row FCs on matching CRS:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> a = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> b = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [2]}, geometry=[Point(1, 1)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> out = a.concat(b)
            >>> len(out)
            2
            >>> list(out["id"])
            [1, 2]
            >>> out.crs.to_epsg()
            4326

            ```
        - CRS mismatch raises `CRSError`:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> a = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> b = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [2]}, geometry=[Point(1, 1)],
            ...         crs="EPSG:3857",
            ...     )
            ... )
            >>> a.concat(b)
            Traceback (most recent call last):
                ...
            pyramids.base._errors.CRSError: concat: CRS mismatch...

            ```
    """
    # validate CRS agreement up front.
    if self.crs is not None and other.crs is not None:
        if self.crs != other.crs:
            raise CRSError(
                f"concat: CRS mismatch — self.crs = {self.crs!r}, "
                f"other.crs = {other.crs!r}. Reproject one side "
                f"— `other.to_crs(self.crs)` OR "
                f"`self.to_crs(other.crs)` — before "
                f"concatenating, or strip one CRS with "
                f".set_crs(None, allow_override=True)."
            )
    combined = gpd.GeoDataFrame(pd.concat([self, other]))
    combined.index = list(range(len(combined)))
    combined.crs = self.crs if self.crs is not None else other.crs
    return FeatureCollection(combined)

`with_centroid()` #

Return a new FC with per-feature center-point columns attached.

Non-mutating replacement for the old center_point() method (which has been deleted). The with_ prefix mirrors stdlib / pandas conventions for "return a copy with this change applied".

Computes average x/y per feature (after :meth:with_coordinates) and attaches three columns: avg_x, avg_y and center_point (shapely Point).

Feeding a degenerate or empty geometry (for example an empty Point, or a Polygon whose ring has zero area) produces (NaN, NaN) averages. The method emits a single GeometryWarning listing the row indices whose avg_x / avg_y could not be computed, so downstream code can guard against the NaN centroids instead of silently consuming them. The center_point value at those rows is an empty shapely.Point (Point.is_empty is True) instead of a (NaN, NaN) point.

Returns:

Name	Type	Description
`FeatureCollection`	`FeatureCollection`	A new FeatureCollection (`self` is not modified) with `x`, `y`, `avg_x`, `avg_y`, `center_point` columns added.
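
The NaN handling described above can be sketched without geopandas. In the stand-in below, `centers_from_vertices` is a hypothetical function that takes one list of x values and one list of y values per row, and `None` stands in for the empty shapely Point. It follows the same pattern: one summary warning for all bad rows plus a sentinel value at each of them.

```python
import math
import warnings


def centers_from_vertices(xs_per_row, ys_per_row):
    """Average each row's vertices; bad rows get None and one shared warning."""
    centers, bad_rows = [], []
    for i, (xs, ys) in enumerate(zip(xs_per_row, ys_per_row)):
        avg_x = sum(xs) / len(xs) if xs else math.nan
        avg_y = sum(ys) / len(ys) if ys else math.nan
        if math.isnan(avg_x) or math.isnan(avg_y):
            bad_rows.append(i)
            centers.append(None)  # stand-in for an empty Point
        else:
            centers.append((avg_x, avg_y))
    if bad_rows:
        # One summary warning for all bad rows, not one per row.
        warnings.warn(f"{len(bad_rows)} row(s) yielded NaN centers (rows {bad_rows})")
    return centers


# Caller side: record the warning, then drop the rows it refers to.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    centers = centers_from_vertices(
        [[3.0], [], [1.0, 3.0]],
        [[4.0], [], [2.0, 4.0]],
    )

usable = [c for c in centers if c is not None]
print(len(caught), usable)  # 1 [(3.0, 4.0), (2.0, 3.0)]
```

With the real method, the corresponding guard is to check `is_empty` on the `center_point` values before reprojecting or measuring distances.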

Examples:

Compute centroids for a 2-polygon FC:

>>> import geopandas as gpd
>>> from shapely.geometry import Polygon
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1, 2]},
...         geometry=[
...             Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...             Polygon([(4, 4), (6, 4), (6, 6), (4, 6)]),
...         ],
...         crs="EPSG:4326",
...     )
... )
>>> out = fc.with_centroid()
>>> [(p.x, p.y) for p in out["center_point"]]
[(0.8, 0.8), (4.8, 4.8)]

A Point FC is a no-op for the coordinate lists (each row is already a single vertex); the centroid equals the point:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from pyramids.feature import FeatureCollection
>>> fc = FeatureCollection(
...     gpd.GeoDataFrame(
...         {"id": [1, 2]},
...         geometry=[Point(3.0, 4.0), Point(7.0, 8.0)],
...         crs="EPSG:4326",
...     )
... )
>>> out = fc.with_centroid()
>>> [(p.x, p.y) for p in out["center_point"]]
[(3.0, 4.0), (7.0, 8.0)]
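
The first example returns (0.8, 0.8) for a square whose geometric centre is (1, 1). This is consistent with the averages being taken over the ring's vertex list, including the repeated closing vertex, and not over the polygon's area. The sketch below reproduces both numbers in plain Python. `vertex_mean` and `area_centroid` are hypothetical helpers for illustration, and the assumption that the coordinate lists hold the closed exterior ring is inferred from the example output, not confirmed by the source.

```python
def vertex_mean(ring):
    """Mean of the x and y values of every vertex in a closed ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def area_centroid(ring):
    """Area-weighted centroid of a simple closed ring (shoelace formula)."""
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area) = sum / (3 * twice_area).
    return cx / (3 * twice_area), cy / (3 * twice_area)


# Closed exterior ring of the first polygon in the example:
# the first vertex is repeated at the end.
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]

print(vertex_mean(square))    # (0.8, 0.8)
print(area_centroid(square))  # (1.0, 1.0)
```

When the area-weighted centre is what you need, geopandas exposes it as the `centroid` attribute of the geometry column.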

Source code in src/pyramids/feature/collection.py

def with_centroid(self) -> FeatureCollection:
    """Return a new FC with per-feature center-point columns attached.

    Non-mutating replacement for the old `center_point()`
    method (which has been deleted). The `with_` prefix mirrors
    stdlib / pandas conventions for "return a copy with this
    change applied".

    Computes average x/y per feature (after
    :meth:`with_coordinates`) and attaches three columns:
    `avg_x`, `avg_y` and `center_point` (shapely `Point`).

    Feeding a degenerate or empty geometry (for example an
    empty `Point`, or a `Polygon` whose ring has zero area)
    produces `(NaN, NaN)` averages. The method emits a single
    `GeometryWarning` listing the row indices whose `avg_x` /
    `avg_y` could not be computed, so downstream code can guard
    against the NaN centroids instead of silently consuming them.
    The `center_point` value at those rows is an empty
    `shapely.Point` (`Point.is_empty is True`) rather than a
    `(NaN, NaN)` point.

    Returns:
        FeatureCollection: A new FeatureCollection (`self` is
        not modified) with `x`, `y`, `avg_x`, `avg_y`,
        `center_point` columns added.

    Examples:
        - Compute centroids for a 2-polygon FC:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Polygon
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1, 2]},
            ...         geometry=[
            ...             Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
            ...             Polygon([(4, 4), (6, 4), (6, 6), (4, 6)]),
            ...         ],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> out = fc.with_centroid()
            >>> [(p.x, p.y) for p in out["center_point"]]
            [(0.8, 0.8), (4.8, 4.8)]

            ```
        - A Point FC is a no-op for the coordinate lists (each row
          is already a single vertex); the centroid equals the point:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1, 2]},
            ...         geometry=[Point(3.0, 4.0), Point(7.0, 8.0)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> out = fc.with_centroid()
            >>> [(p.x, p.y) for p in out["center_point"]]
            [(3.0, 4.0), (7.0, 8.0)]

            ```
    """
    fc = self.with_coordinates()
    for i, row_i in fc.iterrows():
        fc.loc[i, "avg_x"] = np.mean(row_i["x"])
        fc.loc[i, "avg_y"] = np.mean(row_i["y"])

    # detect rows whose averaged coordinate could not be
    # computed (empty geometry, all-NaN rings, etc.). Emit a single
    # summary warning and substitute an empty Point so the column
    # does not expose a `(NaN, NaN)` Point that would then crash
    # downstream reprojections.
    avg_x = fc["avg_x"].to_numpy()
    avg_y = fc["avg_y"].to_numpy()
    bad_mask = np.isnan(avg_x) | np.isnan(avg_y)
    if bad_mask.any():
        bad_idx = [int(i) for i, is_bad in enumerate(bad_mask) if is_bad]
        warnings.warn(
            f"with_centroid: {len(bad_idx)} row(s) yielded NaN centroids "
            f"(rows {bad_idx}). Their `center_point` is an empty "
            f"shapely.Point. Drop or repair those rows before running "
            f"a method that requires a valid centroid (e.g. reproject, "
            f"distance).",
            GeometryWarning,
            stacklevel=2,
        )

    # single-pass build. The previous implementation built a
    # throwaway `coords_list` (with NaN placeholders for the bad
    # rows), called `create_points` on it, then iterated the
    # result a second time to substitute empty Points for the bad
    # rows. Skip both intermediates — write the final column value
    # directly.
    cleaned: list[Any] = [
        Point() if bad else Point(ax, ay)
        for ax, ay, bad in zip(avg_x.tolist(), avg_y.tolist(), bad_mask.tolist())
    ]
    fc["center_point"] = cleaned
    return fc