Lazy vector reads with `LazyFeatureCollection`

This notebook walks through the dask-backed vector path in pyramids.feature. It uses a real, public, 100+ GB dataset — Overture Maps — hosted as GeoParquet on AWS S3. You will NOT download the whole thing; the point is to show that lazy reads + Arrow pushdown filters let you work with a dataset that doesn't fit in RAM.

What you'll see

- Reading a subset of Overture Places (every place inside a small bbox over central Paris) with Arrow filters pushed down to Parquet, so only the matching row groups are fetched.
- Using LazyFeatureCollection methods (.spatial_shuffle, .persist, .compute_total_bounds, .compute) on a partitioned frame.
- Writing the result back to GeoParquet, the one vector write path that stays lazy.

Requirements

```shell
pip install 'pyramids-gis[parquet]'
```

This pulls pyarrow, dask, dask-geopandas, fsspec, s3fs. The [parquet] extra is what unlocks everything below.
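Before running anything heavy, a quick stdlib-only preflight can confirm the extra actually landed. This is a sketch: the module names are the import names of the packages listed above (dask-geopandas imports as `dask_geopandas`), and `missing_modules` is a hypothetical helper, not part of pyramids.

```python
import importlib.util

# Import names of the packages the [parquet] extra is described as pulling in.
PARQUET_EXTRA_MODULES = ("pyarrow", "dask", "dask_geopandas", "fsspec", "s3fs")


def missing_modules(names=PARQUET_EXTRA_MODULES):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all present" if not missing else f"missing: {', '.join(missing)}")
```

`find_spec` only locates the modules without importing them, so the check stays fast even when dask is installed.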

⚠ Scheduler: set this BEFORE any compute

Shapely/GEOS geometry operations can hold the Python GIL, so under dask's default threaded scheduler, map_partitions on geometry ops may run effectively single-core, and you can see slower timings than the eager path. Configure a process scheduler (or a LocalCluster) up front:

```python
%matplotlib inline

from pyramids import configure_lazy_vector

# One call; applies to every LazyFeatureCollection.compute() in this session.
applied = configure_lazy_vector(
    scheduler="processes",
    target_bytes_per_partition=128 * 1024 * 1024,  # 128 MiB, the default; shown here for clarity
)
applied
```
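Why processes rather than threads? The stdlib-only sketch below makes the GIL effect visible without any geospatial dependencies. A pure-Python loop stands in for GIL-holding geometry work (an assumption for illustration: real Shapely calls run in C, and how much they hold the GIL depends on the Shapely version). On a standard GIL-enabled CPython build, the threaded run takes about as long as the sequential one; a process pool sidesteps this by giving each worker its own interpreter.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def gil_bound(n: int) -> int:
    """Pure-Python loop that holds the GIL for its whole run (stand-in for geometry work)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_sequential(tasks):
    start = time.perf_counter()
    results = [gil_bound(n) for n in tasks]
    return results, time.perf_counter() - start


def time_threaded(tasks, workers=4):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(gil_bound, tasks))  # map preserves input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    tasks = [1_000_000] * 4
    seq, t_seq = time_sequential(tasks)
    thr, t_thr = time_threaded(tasks)
    assert seq == thr
    print(f"sequential: {t_seq:.2f}s  threaded: {t_thr:.2f}s")
```

No timings are asserted here because they depend on the machine; the interesting part is the ratio between the two printed numbers.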

1. Read a bounded slice of Overture Places

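Overture partitions places data by theme / type under the public bucket s3://overturemaps-us-west-2/. With backend='dask' and the filters= pushdown, the Parquet reader uses each row group's min/max statistics to skip row groups that cannot contain matching features, so only the relevant row groups are fetched. The filters below keep places whose bounding box lies entirely inside our query bbox. We constrain to a tiny slice over central Paris so the notebook runs in a minute or two over a normal home connection.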
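A simplified model of that row-group skipping: Parquet footers store per-row-group min/max statistics for each column, and a row group can be skipped when those statistics prove no row could pass every filter. The sketch evaluates the same list-of-tuples filters used in the next cell against invented statistics for three row groups. It is not pyarrow's implementation, the helper names are hypothetical, and only `>=` and `<=` are handled.

```python
from typing import Dict, List, Sequence, Tuple

Filter = Tuple[str, str, float]
# Per-row-group statistics: column name -> (min, max) over that row group.
RowGroupStats = Dict[str, Tuple[float, float]]


def bbox_filters(bbox: Sequence[float]) -> List[Filter]:
    """Containment filters: keep features whose own bbox lies inside `bbox`."""
    minx, miny, maxx, maxy = bbox
    return [
        ("bbox.xmin", ">=", minx),
        ("bbox.ymin", ">=", miny),
        ("bbox.xmax", "<=", maxx),
        ("bbox.ymax", "<=", maxy),
    ]


def may_match(stats: RowGroupStats, filters: Sequence[Filter]) -> bool:
    """False only if the min/max stats prove no row can satisfy all filters."""
    for column, op, value in filters:
        if op not in (">=", "<="):
            raise ValueError(f"operator {op!r} not handled in this sketch")
        col_min, col_max = stats[column]
        if op == ">=" and col_max < value:
            return False  # even the largest value in the row group is too small
        if op == "<=" and col_min > value:
            return False  # even the smallest value in the row group is too large
    return True


def prune(row_groups: Sequence[RowGroupStats], filters: Sequence[Filter]) -> List[int]:
    """Indices of the row groups that would actually be fetched."""
    return [i for i, stats in enumerate(row_groups) if may_match(stats, filters)]


paris_bbox = (2.29, 48.85, 2.40, 48.90)
row_groups = [  # invented statistics, for illustration only
    {"bbox.xmin": (2.30, 2.35), "bbox.ymin": (48.86, 48.88),
     "bbox.xmax": (2.31, 2.36), "bbox.ymax": (48.86, 48.89)},  # central Paris
    {"bbox.xmin": (139.6, 139.9), "bbox.ymin": (35.6, 35.8),
     "bbox.xmax": (139.61, 139.91), "bbox.ymax": (35.61, 35.81)},  # Tokyo
    {"bbox.xmin": (-74.1, -73.9), "bbox.ymin": (40.6, 40.8),
     "bbox.xmax": (-74.09, -73.89), "bbox.ymax": (40.61, 40.81)},  # New York
]
print(prune(row_groups, bbox_filters(paris_bbox)))  # → [0]
```

The check is conservative: a row group that passes may still contain non-matching rows, which the reader then filters row by row.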

```python
from pyramids.feature import FeatureCollection

OVERTURE_PLACES = (
    "s3://overturemaps-us-west-2/release/2026-03-18.0/theme=places/type=place/"
)

paris_bbox = (2.29, 48.85, 2.40, 48.90)  # lon/lat (minx, miny, maxx, maxy)

lazy_places = FeatureCollection.read_parquet(
    OVERTURE_PLACES,
    backend="dask",
    columns=["names", "categories", "geometry"],
    # Containment filters: keep places whose own bbox lies fully inside paris_bbox.
    filters=[
        ("bbox.xmin", ">=", paris_bbox[0]),
        ("bbox.ymin", ">=", paris_bbox[1]),
        ("bbox.xmax", "<=", paris_bbox[2]),
        ("bbox.ymax", "<=", paris_bbox[3]),
    ],
    storage_options={"anon": True},  # Overture bucket is public; s3fs won't try to sign
)

type(lazy_places).__name__, lazy_places.npartitions
```

The return is a LazyFeatureCollection — a subclass of dask_geopandas.GeoDataFrame. Nothing has been fetched from S3 yet; only the Parquet file metadata has been scanned to build the task graph.
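A toy model of that behaviour, in plain Python with no dask: building the pipeline only records steps, and data is fetched only on `compute()`. `ToyLazyFrame` and its methods are illustrative names; dask's real task graph also handles scheduling, task fusion and parallel execution, so treat this as a mental model rather than the implementation.

```python
from typing import Callable, List, Sequence, Tuple

Loader = Callable[[], list]  # fetches one partition (think: one row group)
Step = Callable[[list], list]  # per-partition transformation


class ToyLazyFrame:
    """Toy partitioned lazy frame: records work, does none until compute()."""

    def __init__(self, loaders: Sequence[Loader], steps: Tuple[Step, ...] = ()):
        self._loaders = list(loaders)
        self._steps = steps

    @property
    def npartitions(self) -> int:
        return len(self._loaders)  # known from metadata alone, no data loaded

    def map_partitions(self, step: Step) -> "ToyLazyFrame":
        # New frame with one more recorded step; nothing is loaded here.
        return ToyLazyFrame(self._loaders, self._steps + (step,))

    def compute(self) -> list:
        rows: list = []
        for load in self._loaders:
            part = load()  # the only place data is actually fetched
            for step in self._steps:
                part = step(part)
            rows.extend(part)
        return rows


fetches: List[int] = []  # records which partitions were "downloaded"


def make_loader(i: int, rows: list) -> Loader:
    def load() -> list:
        fetches.append(i)
        return list(rows)
    return load


lazy = ToyLazyFrame([make_loader(0, [1, 2, 3]), make_loader(1, [4, 5])])
doubled = lazy.map_partitions(lambda part: [x * 2 for x in part])
print(doubled.npartitions, fetches)  # → 2 []
print(doubled.compute(), fetches)  # → [2, 4, 6, 8, 10] [0, 1]
```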

2. Stay lazy while filtering, keep pushdown benefits

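Any dask_geopandas.GeoDataFrame method works. Of the methods pyramids overrides, spatial_shuffle and persist return a LazyFeatureCollection, so you keep the pyramids type contract across lazy ops; compute returns an eager FeatureCollection (section 3), and to_parquet returns None once every partition has been written (section 4).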

```python
# Spatial shuffle: populates spatial_partitions, which enables partition-pruned sjoin.
# Partially eager: it computes a Hilbert curve distance for every row.
shuffled = lazy_places.spatial_shuffle(by="hilbert").persist()

# total_bounds (inherited from dask-geopandas) returns a lazy dask Scalar.
# compute_total_bounds() forces the O(partitions) reduction explicitly, so the
# cost is visible in the method name. It is cheap here because persist() has
# already loaded the partitions into memory.
bounds = shuffled.compute_total_bounds()
bounds
```
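A stdlib sketch of the two ideas in that cell, under simplifying assumptions: (x, y) points stand in for geometries, total bounds are reduced from one bounds tuple per partition, and rows are sorted by Hilbert distance on a `2**level` grid scaled to those bounds, then cut into contiguous chunks. The curve is the classic `xy2d` algorithm; dask-geopandas has its own implementation and uses the distances to set partition divisions, so this shows the idea, not its code. `hilbert_shuffle` and the other helpers are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]


def partition_bounds(part: Sequence[Point]) -> Bounds:
    xs = [p[0] for p in part]
    ys = [p[1] for p in part]
    return (min(xs), min(ys), max(xs), max(ys))


def total_bounds(partitions: Sequence[Sequence[Point]]) -> Bounds:
    """Map: one bounds tuple per partition. Reduce: combine O(partitions) tuples."""
    per_part = [partition_bounds(p) for p in partitions if p]
    return (
        min(b[0] for b in per_part),
        min(b[1] for b in per_part),
        max(b[2] for b in per_part),
        max(b[3] for b in per_part),
    )


def _rot(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n: int, x: int, y: int) -> int:
    """Hilbert index of cell (x, y) on an n x n grid, n a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_shuffle(partitions: Sequence[Sequence[Point]], npartitions: int,
                    level: int = 16) -> List[List[Point]]:
    """Re-split points so each output partition covers a compact region."""
    minx, miny, maxx, maxy = total_bounds(partitions)
    n = 1 << level
    width = (maxx - minx) or 1.0
    height = (maxy - miny) or 1.0

    def key(p: Point) -> int:
        # Scale the point into integer grid cells 0..n-1 along each axis.
        ix = int((p[0] - minx) / width * (n - 1))
        iy = int((p[1] - miny) / height * (n - 1))
        return xy2d(n, ix, iy)

    ordered = sorted((p for part in partitions for p in part), key=key)
    size = max(1, -(-len(ordered) // npartitions))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Points that are close in space get close Hilbert distances, so contiguous chunks of the sorted order become spatially compact partitions whose bounds can prune a later sjoin.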

3. Materialise a slice back to eager for plotting / exporting

LazyFeatureCollection.compute() returns a FeatureCollection — the eager side of the same protocol. All pyramids-specific methods (extract_vertices, with_coordinates, plot, to_file) become available at this point because you're back in eager land.

```python
eager = shuffled.compute()
type(eager).__name__, len(eager), eager.epsg
```

Plot the materialised places

eager is a concrete FeatureCollection, so the pyramids-specific plot() now works. Each point is one Overture place inside the Paris bbox.

```python
eager.plot()
```

4. Write back to GeoParquet: the one lazy-native writer

LazyFeatureCollection.to_parquet is the only write path that stays lazy: it writes partition by partition without first materialising the whole frame. .to_file(...) raises NotImplementedError because dask-geopandas has no lazy OGR write path; call .compute().to_file(path) instead, which loads everything into memory first.

```python
# Returns None after every partition has been written; no Delayed leaked out.
shuffled.to_parquet("paris_places.parquet", compression="snappy")

# Quick sanity check: read it back via the eager path.
reread = FeatureCollection.read_parquet("paris_places.parquet")
len(reread), reread.crs
```
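Why a Parquet dataset can be written lazily: a directory of part files needs no global coordination, so each partition is produced, written to its own file, and released before the next one. The sketch mirrors that shape with JSON lines standing in for Parquet and a generator standing in for lazily computed partitions. The `part.N` naming follows the convention dask uses by default; everything else here (function names, file format) is hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, List


def write_partitions(partitions: Iterable[list], out_dir) -> None:
    """Write each partition to its own file; only one is held in memory at a time."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, part in enumerate(partitions):
        with open(out / f"part.{i}.jsonl", "w", encoding="utf-8") as fh:
            for row in part:
                fh.write(json.dumps(row) + "\n")
    return None  # like the lazy writer: nothing to hand back once files exist


def read_all(out_dir) -> List[dict]:
    """Eager read-back: concatenate the part files in partition order."""
    paths = sorted(Path(out_dir).glob("part.*.jsonl"),
                   key=lambda p: int(p.name.split(".")[1]))
    rows: List[dict] = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            rows.extend(json.loads(line) for line in fh)
    return rows


def lazy_partitions(n: int) -> Iterator[list]:
    """Each partition is built only when the writer asks for it."""
    for i in range(n):
        yield [{"id": i * 10 + j} for j in range(3)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_partitions(lazy_partitions(3), tmp)
        print(len(read_all(tmp)))  # → 9
```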

5. What the API deliberately does NOT do on lazy FCs

The separate-class design (LazyFeatureCollection subclasses dask_geopandas.GeoDataFrame, not pyramids' own FeatureCollection) means inherited methods behave consistently with dask-geopandas. pyramids doesn't try to fake eager semantics:

| Operation | Behaviour on LazyFC |
|---|---|
| `len(lfc)` | raises; dask.dataframe can't size without compute |
| `lfc.iloc[0]` | raises; positional row access needs compute |
| `lfc.total_bounds` | returns a dask `Scalar`; call `.compute()` or `lfc.compute_total_bounds()` |
| `lfc.plot()` | raises `NotImplementedError`; no lazy plot path |
| `lfc.to_file(path)` | raises `NotImplementedError`; no lazy OGR write |
| `lfc.extract_vertices()`, `.with_coordinates()`, ... | `AttributeError`; pyramids-specific eager-only methods |

Use pyramids.feature.is_lazy_fc(x) for dispatch code that must accept both eager and lazy FCs without try/except ImportError ceremony.
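For example, a helper that accepts either kind and always hands back an eager collection. `looks_lazy` and `ensure_eager` are hypothetical names; the duck-typed check exists only so this sketch runs without pyramids installed, and in real code you would pass `pyramids.feature.is_lazy_fc` as the predicate instead.

```python
from typing import Any, Callable


def looks_lazy(obj: Any) -> bool:
    """Duck-typed stand-in for pyramids.feature.is_lazy_fc (sketch only)."""
    return hasattr(obj, "npartitions") and callable(getattr(obj, "compute", None))


def ensure_eager(fc: Any, is_lazy: Callable[[Any], bool] = looks_lazy) -> Any:
    """Compute lazy inputs; pass eager ones through unchanged."""
    return fc.compute() if is_lazy(fc) else fc
```

With pyramids available, the call would be `ensure_eager(fc, is_lazy=is_lazy_fc)` after `from pyramids.feature import is_lazy_fc`. Making the predicate a parameter keeps the helper testable with plain stand-in objects.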
