Lazy `FeatureCollection`: complete cookbook

FeatureCollection.read_file(..., backend='dask') and FeatureCollection.read_parquet(..., backend='dask') return a LazyFeatureCollection — a dask_geopandas.GeoDataFrame subclass that satisfies the LazySpatialObject protocol. Every partition-aware op (to_crs, clip, sjoin, spatial_shuffle) runs lazily; materialise with .compute() when you need eager rows.

This notebook uses local test data — no cloud hits.

What you'll see

Backend detection — has_lazy_backend, is_lazy_fc, the PEP-562 __getattr__ guard.
Building a lazy FC from a GeoJSON + a GeoParquet file.
Partition-aware ops: to_crs, clip, spatial_shuffle.
spatial_shuffle → sjoin partition pruning.
compute() vs persist() vs compute_total_bounds.
Writing back with to_parquet (partitioned directory).
pyramids.configure_lazy_vector — scheduler + target partition size.

Requirements

pip install 'pyramids-gis[parquet]'

Setup

In [1]:

%matplotlib inline

from pathlib import Path

import numpy as np

DATA = (Path('..') / '..' / '..' / 'examples' / 'data').resolve()
DATA.is_dir()

Out[1]:

True

1. Backend detection

On a minimal install without the [parquet] extra, from pyramids.feature import LazyFeatureCollection raises an ImportError with an actionable install hint. has_lazy_backend() is the cheap check; is_lazy_fc(obj) is the dispatch helper.

In [2]:

from pyramids.feature import has_lazy_backend, is_lazy_fc

has_lazy_backend()

2026-07-11 15:56:14 | INFO | pyramids.base.config | Logging is configured.

Out[2]:

True

In [3]:

# On minimal installs, the import raises — guard with
# try/except ImportError, NOT hasattr (Py3 hasattr only
# catches AttributeError).
try:
    from pyramids.feature import LazyFeatureCollection

    backend_ok = True
except ImportError:
    LazyFeatureCollection = None
    backend_ok = False
backend_ok

Out[3]:

True
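
The guard described above can be reproduced without pyramids installed. The following is a minimal sketch of a PEP 562 module-level `__getattr__` that raises `ImportError` for one optional name. The module name `demo_feature` and the hint wording are hypothetical stand-ins, not the actual pyramids implementation.

```python
import sys
import types

# Hypothetical stand-in module; this is NOT the real pyramids.feature.
demo = types.ModuleType("demo_feature")


def _module_getattr(name):
    """PEP 562 hook: called only when normal attribute lookup fails."""
    if name == "LazyFeatureCollection":
        # Assumed wording; the real install hint may differ.
        raise ImportError(
            "LazyFeatureCollection needs the optional dask backend: "
            "pip install 'pyramids-gis[parquet]'"
        )
    # Every other name must stay an AttributeError so the import
    # machinery and hasattr() keep working for ordinary lookups.
    raise AttributeError(f"module 'demo_feature' has no attribute {name!r}")


demo.__getattr__ = _module_getattr
sys.modules["demo_feature"] = demo

# try/except ImportError catches the hint ...
try:
    from demo_feature import LazyFeatureCollection
    backend_ok = True
except ImportError as exc:
    LazyFeatureCollection = None
    backend_ok = False
    hint = str(exc)

# ... while hasattr() lets the ImportError escape instead of returning False.
try:
    hasattr(demo, "LazyFeatureCollection")
    hasattr_escaped = False
except ImportError:
    hasattr_escaped = True

print(backend_ok, hasattr_escaped)
```

This is why the cell above uses `try/except ImportError`: `hasattr` swallows only `AttributeError`, so any other exception raised by the hook propagates to the caller.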

2. Lazy reads: GeoJSON and GeoParquet

read_file(backend='dask') handles the vector formats dask-geopandas can chunk (GeoJSON, Shapefile, GeoPackage), splitting each by row count. read_parquet(backend='dask') splits on pyarrow row groups, so the pushdown arguments (filters=, columns=), together with split_row_groups=, deliver true I/O savings.

In [4]:

from pyramids.feature import FeatureCollection

lfc = FeatureCollection.read_file(
    DATA / 'coello-gauges.geojson',
    backend='dask',
    npartitions=2,
)
type(lfc).__name__, lfc.npartitions

Out[4]:

('LazyFeatureCollection', 2)
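
To make "chunking by row count" concrete, here is a minimal sketch of how a row count could be divided into contiguous partitions. The helper `row_slices` is hypothetical, and the even-split rule is an assumption; the chunking that dask-geopandas actually performs may differ in detail.

```python
def row_slices(n_rows, npartitions):
    """Split n_rows into npartitions contiguous (start, stop) ranges.

    Assumption: rows are spread as evenly as possible, with any
    remainder going to the first partitions.
    """
    if npartitions < 1:
        raise ValueError("npartitions must be >= 1")
    base, extra = divmod(n_rows, npartitions)
    slices, start = [], 0
    for i in range(npartitions):
        stop = start + base + (1 if i < extra else 0)
        slices.append((start, stop))
        start = stop
    return slices


# Six gauges over two partitions, as in the cell above.
print(row_slices(6, 2))  # [(0, 3), (3, 6)]
```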

In [5]:

# is_lazy_fc is the safe dispatch helper.
is_lazy_fc(lfc)

Out[5]:

True

In [6]:

# Round-trip to GeoParquet and read it back lazily.
import tempfile

workdir = Path(tempfile.mkdtemp(prefix='pyramids-lazy-fc-'))
pq = workdir / 'gauges.parquet'
lfc.compute().to_parquet(pq)

# backend='dask' returns LazyFeatureCollection;
# backend='pandas' (default) returns eager FeatureCollection.
lfc_pq = FeatureCollection.read_parquet(pq, backend='dask')
type(lfc_pq).__name__, lfc_pq.npartitions

Out[6]:

('LazyFeatureCollection', 1)

3. Partition-aware ops

to_crs, clip, spatial_shuffle, and every inherited dask_geopandas.GeoDataFrame method run lazily. The eager-only pyramids helpers (extract_vertices, rasterize_with_col, with_coordinates, with_centroid, center_points) require .compute() first.

In [7]:

# Lazy CRS reproject. The __getattribute__ rebrand ensures every
# inherited dask-geopandas op returns LazyFeatureCollection, so
# pyramids-specific helpers (epsg, compute_total_bounds, is_lazy_fc)
# remain available after .to_crs / .clip / .copy / .drop_duplicates.
projected = lfc.to_crs(4326)
type(projected).__name__, projected.epsg

Out[7]:

('LazyFeatureCollection', 4326)

In [8]:

# Lazy clip by a bbox covering the data extent, in the reprojected CRS (EPSG:4326).
import geopandas as gpd
from shapely.geometry import box

xmin, ymin, xmax, ymax = projected.compute_total_bounds()
bbox = gpd.GeoDataFrame(
    geometry=[box(xmin - 0.01, ymin - 0.01, xmax + 0.01, ymax + 0.01)],
    crs='EPSG:4326',
)
clipped = projected.clip(bbox)
type(clipped).__name__, len(clipped.compute())

Out[8]:

('LazyFeatureCollection', 6)

4. `spatial_shuffle` → `sjoin` pruning

The biggest speedup from going lazy comes from partition-pruned sjoin — each partition has a bounding box, and dask drops partition pairs that can't intersect before dispatching work. spatial_shuffle populates the spatial_partitions attribute that makes pruning possible.

In [9]:

# Read both sides lazily and reproject — with the rebrand hook,
# both .to_crs returns stay LazyFeatureCollection.
polys = FeatureCollection.read_file(
    DATA / 'coello_polygons.geojson',
    backend='dask',
    npartitions=2,
).to_crs(4326)
gauges = lfc.to_crs(4326)
(type(gauges).__name__, gauges.epsg), (type(polys).__name__, polys.epsg)

Out[9]:

(('LazyFeatureCollection', 4326), ('LazyFeatureCollection', 4326))

In [10]:

# spatial_shuffle — one-time cost, amortised across subsequent sjoins.
gauges_shuffled = gauges.spatial_shuffle(by='hilbert')
polys_shuffled = polys.spatial_shuffle(by='hilbert')
gauges_shuffled.spatial_partitions is not None, polys_shuffled.spatial_partitions is not None

Out[10]:

(True, True)
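
For intuition about what `by='hilbert'` orders on, the sketch below computes the distance of a grid cell along a Hilbert curve using the classic bit-twiddling algorithm. It is a plain-Python illustration on a small integer grid; `hilbert_distance` is a hypothetical helper, and dask-geopandas uses its own vectorised implementation on discretised coordinates.

```python
def hilbert_distance(n, x, y):
    """Distance of cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Sorting cells by Hilbert distance keeps spatial neighbours close in the order.
cells = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(cells, key=lambda c: hilbert_distance(4, *c))
print(ordered[:4])  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```

Because consecutive cells along the curve are always grid neighbours, cutting the sorted order into equal runs yields spatially compact partitions, which is what makes the later pruning effective.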

In [11]:

# Partition-pruned sjoin — lazy.
joined = gauges_shuffled.sjoin(
    polys_shuffled,
    how='inner',
    predicate='intersects',
)
type(joined).__name__, joined.npartitions

Out[11]:

('LazyFeatureCollection', 2)
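
The pruning step itself can be sketched in a few lines: compare the extent of every left partition with every right partition and keep only the pairs that can intersect. This is a simplified illustration that assumes plain `(xmin, ymin, xmax, ymax)` tuples; the helpers and the sample boxes are hypothetical, and the real `spatial_partitions` attribute holds geometries rather than tuples.

```python
from itertools import product


def boxes_intersect(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(left_boxes, right_boxes):
    """Indices (i, j) of partition pairs whose extents can intersect."""
    return [
        (i, j)
        for (i, a), (j, b) in product(enumerate(left_boxes), enumerate(right_boxes))
        if boxes_intersect(a, b)
    ]


# Illustrative extents only, not taken from the Coello data.
left = [(0, 0, 5, 5), (10, 10, 15, 15)]
right = [(4, 4, 8, 8), (20, 20, 25, 25)]
print(candidate_pairs(left, right))  # [(0, 0)]
```

Here only one of the four possible partition pairs survives, so three per-partition joins are never scheduled.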

5. `compute()` vs `persist()` vs `compute_total_bounds`

.compute() — materialise to an eager FeatureCollection (leaves the lazy domain).
.persist() — materialise the graph into worker memory but keep the lazy wrapper.
compute_total_bounds() — one-line helper for the lazy total_bounds reduction.

In [12]:

# .compute() returns an eager FeatureCollection.
eager = lfc.compute()
type(eager).__name__, len(eager)

Out[12]:

('FeatureCollection', 6)

Visualise the materialised gauges

Once .compute() leaves the lazy domain we have an eager FeatureCollection, which plots directly — the Coello gauge points in their native CRS.

In [13]:

eager.plot()

Out[13]:

<Axes: >

[Figure: the Coello gauge points plotted in their native CRS.]

In [14]:

# .persist() keeps laziness but warms the graph.
persisted = lfc.persist()
type(persisted).__name__, is_lazy_fc(persisted)

Out[14]:

('LazyFeatureCollection', True)

In [15]:

# total_bounds is a dask Scalar on the lazy FC. The explicit
# helper returns the 4-float numpy array directly.
xmin, ymin, xmax, ymax = lfc.compute_total_bounds()
(xmin, ymin, xmax, ymax)

Out[15]:

(np.float64(443847.5736),
 np.float64(478045.572),
 np.float64(487292.5152),
 np.float64(503143.3264))
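
Conceptually, a total-bounds reduction takes the bounds of each partition and combines them with element-wise min and max. The sketch below shows that reduction in plain Python; the function name `reduce_bounds` and the sample values are illustrative assumptions, not the pyramids implementation.

```python
def reduce_bounds(partition_bounds):
    """Combine per-partition (xmin, ymin, xmax, ymax) tuples into one extent."""
    bounds = list(partition_bounds)
    if not bounds:
        raise ValueError("need at least one partition")
    xmins, ymins, xmaxs, ymaxs = zip(*bounds)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


# Illustrative per-partition extents, not the Coello values.
parts = [(0.0, 1.0, 4.0, 5.0), (2.0, -1.0, 6.0, 3.0)]
print(reduce_bounds(parts))  # (0.0, -1.0, 6.0, 5.0)
```

Only four numbers per partition cross the partition boundary, which is why this reduction is cheap compared with a full `.compute()`.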

6. `to_parquet`: partitioned directory write

LazyFeatureCollection.to_parquet(path) is the only lazy-native write. It writes a partitioned directory of part.N.parquet files and always blocks until every partition is materialised — compute=False is rejected to keep the pyramids "to_* always writes" invariant.

In [16]:

out_dir = workdir / 'out.parquet'
lfc.to_parquet(out_dir)
sorted(p.name for p in out_dir.iterdir())[:5]

Out[16]:

['part.0.parquet', 'part.1.parquet']
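
One caveat when listing such a directory: a plain `sorted()` on file names is lexicographic, so with ten or more partitions `part.10.parquet` sorts before `part.2.parquet`. The sketch below orders the parts by their integer index instead. The helper `part_files` is hypothetical, and the empty files it creates are stand-ins for the demo, not valid Parquet.

```python
import re
import tempfile
from pathlib import Path

_PART = re.compile(r"part\.(\d+)\.parquet$")


def part_files(directory):
    """Return the part.N.parquet paths in `directory`, ordered by integer N."""
    found = []
    for path in Path(directory).iterdir():
        match = _PART.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


# Empty placeholder files, only to demonstrate the ordering.
demo_dir = Path(tempfile.mkdtemp(prefix="parts-demo-"))
for n in (0, 1, 2, 10):
    (demo_dir / f"part.{n}.parquet").touch()

print(sorted(p.name for p in demo_dir.iterdir()))  # part.10 lands before part.2
print([p.name for p in part_files(demo_dir)])      # numeric order
```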

In [17]:

# Reopen the directory as a new lazy FC.
reopened = FeatureCollection.read_parquet(out_dir, backend='dask')
reopened.npartitions, len(reopened.compute())

Out[17]:

(2, 6)

7. `pyramids.configure_lazy_vector`: scheduler + partition size

Shapely holds the GIL, so the default threads scheduler serialises vector ops to one core. Flip it globally with configure_lazy_vector(scheduler='processes'). Raise target_bytes_per_partition if you have more worker RAM.

In [18]:

from pyramids import configure_lazy_vector

applied = configure_lazy_vector(
    scheduler='synchronous',
    target_bytes_per_partition=64 * 1024 * 1024,
)
applied

Out[18]:

{'scheduler': 'synchronous', 'target_bytes_per_partition': 67108864}
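
To get a feel for what a byte target implies, the sketch below converts an estimated dataset size into a partition count by ceiling division. This is only a back-of-the-envelope illustration: the helper `partitions_for` is hypothetical, and how pyramids actually applies `target_bytes_per_partition` is not specified here.

```python
def partitions_for(total_bytes, target_bytes_per_partition):
    """Partition count so that no partition exceeds the target (at least 1)."""
    if target_bytes_per_partition <= 0:
        raise ValueError("target_bytes_per_partition must be positive")
    # Integer ceiling division avoids float rounding on large sizes.
    return max(1, -(-total_bytes // target_bytes_per_partition))


target = 64 * 1024 * 1024              # 64 MiB, as configured above
print(partitions_for(1024**3, target))  # 1 GiB of data -> 16 partitions
```

Doubling the target halves the partition count, which trades scheduling overhead against peak memory per worker.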

Closing notes

The lazy FC has no lazy to_file (OGR) path — call .compute().to_file(path) to materialise first.
Dataset.zonal_stats(lazy_fc) is not yet supported — call .compute() first (tracked as a follow-on).
For a real Overture Maps walkthrough, see dask-lazy-features.ipynb.
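
Both "call .compute() first" notes follow the same pattern, which can be wrapped in a small helper. The sketch below is duck-typed so that it runs without pyramids: `ensure_eager` and `FakeLazy` are hypothetical names, and with pyramids installed `is_lazy_fc(obj)` would be the more precise check than looking for a `compute` method.

```python
def ensure_eager(obj):
    """Return obj.compute() when obj looks lazy, otherwise obj unchanged.

    Assumption: anything exposing a callable `compute` is treated as lazy.
    """
    compute = getattr(obj, "compute", None)
    return compute() if callable(compute) else obj


class FakeLazy:
    """Stand-in for a lazy collection; NOT a LazyFeatureCollection."""

    def __init__(self, rows):
        self._rows = rows

    def compute(self):
        return list(self._rows)


rows = ensure_eager(FakeLazy(range(3)))  # materialised
same = ensure_eager([1, 2, 3])           # already eager, passed through
print(rows, same)
```

A call such as `ensure_eager(fc).to_file(path)` then works for both eager and lazy inputs.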
