Skip to content

LiDAR#

LiDAR point-cloud I/O, classification, gridding, and analysis. The LAS / LAZ I/O surface soft-imports laspy — install with pip install laspy[lazrs] to enable file read/write. Everything else (gridding, ground filtering, clipping, tree detection) operates on the in-memory LasPoints record without any external dependency beyond NumPy / SciPy.

Module-level functions#

LiDAR point-cloud I/O, gridding, ground filtering, and analysis.

  • :class:LasPoints — in-memory point-cloud record (xyz + intensity + classification + return-number + CRS).
  • :func:read_las / :func:write_las — LAS / LAZ I/O via laspy.
  • :func:grid_lidar_points — bucket raw (x, y, z) arrays into a gridded DEM with min / max / mean / median aggregation.

Reading / writing LAS files requires laspy. Install with pip install laspy[lazrs] to also handle LAZ compression. The gridding helper does not require laspy and operates on raw arrays.

LasPoints dataclass #

In-memory LiDAR point cloud.

Numeric arrays are all parallel — index i selects the i-th point across every field. classification follows the ASPRS LAS standard (2 = ground, 5 = high vegetation, 6 = building, etc.).

Attributes:

Name Type Description
x ndarray

(N,) float64 array of x-coordinates.

y ndarray

(N,) float64 array of y-coordinates.

z ndarray

(N,) float64 array of z-coordinates (elevation).

intensity ndarray

(N,) uint16 array of return intensity, or empty.

classification ndarray

(N,) uint8 array of ASPRS class codes, or empty.

return_number ndarray

(N,) uint8 array of return-number-within-pulse, or empty.

crs object | None

Optional CRS object (whatever laspy.LasHeader.parse_crs returns; typically pyproj.CRS).

Source code in src/digitalrivers/lidar.py
@dataclass
class LasPoints:
    """In-memory LiDAR point cloud.

    Numeric arrays are all parallel — index `i` selects the i-th point
    across every field. `classification` follows the ASPRS LAS standard
    (2 = ground, 5 = high vegetation, 6 = building, etc.).

    Attributes:
        x: `(N,)` float64 array of x-coordinates.
        y: `(N,)` float64 array of y-coordinates.
        z: `(N,)` float64 array of z-coordinates (elevation).
        intensity: `(N,)` uint16 array of return intensity, or empty.
        classification: `(N,)` uint8 array of ASPRS class codes, or empty.
        return_number: `(N,)` uint8 array of return-number-within-pulse,
            or empty.
        crs: Optional CRS object (whatever `laspy.LasHeader.parse_crs`
            returns; typically `pyproj.CRS`).
    """

    x: np.ndarray
    y: np.ndarray
    z: np.ndarray
    intensity: np.ndarray = field(
        default_factory=lambda: np.empty(0, dtype=np.uint16),
    )
    classification: np.ndarray = field(
        default_factory=lambda: np.empty(0, dtype=np.uint8),
    )
    return_number: np.ndarray = field(
        default_factory=lambda: np.empty(0, dtype=np.uint8),
    )
    crs: object | None = None

    def __post_init__(self) -> None:
        n = len(self.x)
        if not (len(self.y) == n and len(self.z) == n):
            raise ValueError(
                f"x/y/z must have the same length; got {len(self.x)}, "
                f"{len(self.y)}, {len(self.z)}"
            )

    def __len__(self) -> int:
        """Number of points in the cloud."""
        return int(self.x.shape[0])

    def subset(self, mask: np.ndarray) -> "LasPoints":
        """Return a new `LasPoints` containing only the points where `mask` is True.

        Args:
            mask: `(N,)` bool array (same length as the point cloud).

        Returns:
            A new `LasPoints` with the selected subset across every field.
        """
        mask = np.asarray(mask, dtype=bool)
        if mask.shape != self.x.shape:
            raise ValueError(
                f"mask shape {mask.shape} != points shape {self.x.shape}"
            )
        kw = {"x": self.x[mask], "y": self.y[mask], "z": self.z[mask]}
        if self.intensity.size:
            kw["intensity"] = self.intensity[mask]
        if self.classification.size:
            kw["classification"] = self.classification[mask]
        if self.return_number.size:
            kw["return_number"] = self.return_number[mask]
        return LasPoints(crs=self.crs, **kw)

subset(mask) #

Return a new LasPoints containing only the points where mask is True.

Parameters:

Name Type Description Default
mask ndarray

(N,) bool array (same length as the point cloud).

required

Returns:

Type Description
'LasPoints'

A new LasPoints with the selected subset across every field.

Source code in src/digitalrivers/lidar.py
def subset(self, mask: np.ndarray) -> "LasPoints":
    """Return a new `LasPoints` containing only the points where `mask` is True.

    Args:
        mask: `(N,)` bool array (same length as the point cloud).

    Returns:
        A new `LasPoints` with the selected subset across every field.
    """
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != self.x.shape:
        raise ValueError(
            f"mask shape {mask.shape} != points shape {self.x.shape}"
        )
    kw = {"x": self.x[mask], "y": self.y[mask], "z": self.z[mask]}
    if self.intensity.size:
        kw["intensity"] = self.intensity[mask]
    if self.classification.size:
        kw["classification"] = self.classification[mask]
    if self.return_number.size:
        kw["return_number"] = self.return_number[mask]
    return LasPoints(crs=self.crs, **kw)

read_las(path) #

Read a LAS or LAZ file into a LasPoints record.

Parameters:

Name Type Description Default
path str

Filesystem path to a .las / .laz file.

required

Returns:

Type Description
LasPoints

LasPoints populated from the file. xyz are scaled+offset by

LasPoints

laspy so values are in the file's CRS units.

Raises:

Type Description
ImportError

If laspy is not installed.

Source code in src/digitalrivers/lidar.py
def read_las(path: str) -> LasPoints:
    """Read a LAS or LAZ file into a `LasPoints` record.

    Args:
        path: Filesystem path to a `.las` / `.laz` file.

    Returns:
        `LasPoints` populated from the file. xyz are scaled+offset by
        `laspy` so values are in the file's CRS units.

    Raises:
        ImportError: If `laspy` is not installed.
    """
    import warnings
    try:
        import laspy  # type: ignore
    except ImportError as exc:  # pragma: no cover — environment-specific
        raise ImportError(_LASPY_HINT) from exc
    f = laspy.read(path)
    # CRS parsing can fail on older LAS headers that pre-date the WKT
    # / VLR conventions laspy expects. Surface that to the caller as a
    # `UserWarning` (not a silent `crs = None`) so a downstream gridding /
    # reprojection step has something to act on.
    try:
        crs = f.header.parse_crs()
    except (ValueError, KeyError, AttributeError) as exc:
        warnings.warn(
            f"Could not parse CRS from LAS header {path!r}: {exc!r}. "
            f"Returning `crs=None`.",
            UserWarning,
            stacklevel=2,
        )
        crs = None
    intensity = (
        np.asarray(f.intensity, dtype=np.uint16)
        if hasattr(f, "intensity")
        else np.empty(0, dtype=np.uint16)
    )
    classification = (
        np.asarray(f.classification, dtype=np.uint8)
        if hasattr(f, "classification")
        else np.empty(0, dtype=np.uint8)
    )
    return_number = (
        np.asarray(f.return_number, dtype=np.uint8)
        if hasattr(f, "return_number")
        else np.empty(0, dtype=np.uint8)
    )
    return LasPoints(
        x=np.asarray(f.x, dtype=np.float64),
        y=np.asarray(f.y, dtype=np.float64),
        z=np.asarray(f.z, dtype=np.float64),
        intensity=intensity,
        classification=classification,
        return_number=return_number,
        crs=crs,
    )

classify_ground(points, *, method='zhang', cell_size=1.0, window_cells=5, slope_threshold=1.0) #

Classify each LiDAR point as ground or non-ground.

Two algorithm families are available:

  • "zhang" (Zhang 2003) — morphological tophat filter on a min-grid DEM. Points whose elevation rises more than slope_threshold above the morphological opening at their cell are non-ground. Fast; works well on relatively flat terrain. Implementation uses a single structuring-element scale rather than the original paper's multi-scale stack.
  • "axelsson" — Axelsson 2000 progressive TIN densification. Not yet implemented; raises NotImplementedError.

Parameters:

Name Type Description Default
points LasPoints

LasPoints cloud to classify.

required
method str

"zhang" (default) or "axelsson".

'zhang'
cell_size float

Cell size in map units for the intermediate min-grid. Smaller values capture fine ground detail at the cost of memory.

1.0
window_cells int

Side length of the structuring element used for the morphological opening (Zhang only). Default 5 (a 5×5 window).

5
slope_threshold float

Elevation threshold above the opening above which a point is classified as non-ground (Zhang only). Default 1.0.

1.0

Returns:

Type Description
ndarray

(N,) uint8 array of ASPRS class codes — 2 (ground) or 1

ndarray

(unclassified / non-ground), parallel to points.

Raises:

Type Description
ValueError

If method is unknown or geometry-arguments are invalid.

NotImplementedError

If method="axelsson" is requested.

Source code in src/digitalrivers/lidar.py
def classify_ground(
    points: LasPoints,
    *,
    method: str = "zhang",
    cell_size: float = 1.0,
    window_cells: int = 5,
    slope_threshold: float = 1.0,
) -> np.ndarray:
    """Classify each LiDAR point as ground or non-ground.

    Two algorithm families are available:

    * **`"zhang"`** (Zhang 2003) — morphological tophat filter on a min-grid
      DEM. Points whose elevation rises more than `slope_threshold` above
      the morphological opening at their cell are non-ground. Fast; works
      well on relatively flat terrain. Implementation uses a single
      structuring-element scale rather than the original paper's
      multi-scale stack.
    * **`"axelsson"`** — Axelsson 2000 progressive TIN densification.
      Not yet implemented; raises NotImplementedError.

    Args:
        points: `LasPoints` cloud to classify.
        method: `"zhang"` (default) or `"axelsson"`.
        cell_size: Cell size in map units for the intermediate min-grid.
            Smaller values capture fine ground detail at the cost of memory.
        window_cells: Side length of the structuring element used for the
            morphological opening (Zhang only). Default 5 (a 5×5 window).
        slope_threshold: Elevation threshold above the opening above which a
            point is classified as non-ground (Zhang only). Default 1.0.

    Returns:
        `(N,)` uint8 array of ASPRS class codes — `2` (ground) or `1`
        (unclassified / non-ground), parallel to `points`.

    Raises:
        ValueError: If `method` is unknown or geometry-arguments are
            invalid.
        NotImplementedError: If `method="axelsson"` is requested.
    """
    if method not in ("zhang", "axelsson"):
        raise ValueError(
            f"method must be 'zhang' or 'axelsson'; got {method!r}"
        )
    if method == "axelsson":
        raise NotImplementedError(
            "Axelsson 2000 TIN-progressive ground filter is not yet "
            "implemented; use method='zhang' for a morphological-tophat "
            "ground filter, or pre-classify the input cloud with PDAL "
            "or LASGround."
        )

    from scipy.ndimage import grey_opening

    if cell_size <= 0:
        raise ValueError(f"cell_size must be positive; got {cell_size!r}")
    if window_cells < 1:
        raise ValueError(
            f"window_cells must be >= 1; got {window_cells!r}"
        )
    if slope_threshold < 0:
        raise ValueError(
            f"slope_threshold must be non-negative; got {slope_threshold!r}"
        )

    xs, ys, zs = points.x, points.y, points.z
    if not len(xs):
        return np.empty(0, dtype=np.uint8)
    x_min, x_max = float(xs.min()), float(xs.max())
    y_min, y_max = float(ys.min()), float(ys.max())
    cols = max(1, int(np.ceil((x_max - x_min) / cell_size)))
    rows = max(1, int(np.ceil((y_max - y_min) / cell_size)))
    col_idx = np.clip(
        ((xs - x_min) / cell_size).astype(np.int64), 0, cols - 1
    )
    row_idx = np.clip(
        ((y_max - ys) / cell_size).astype(np.int64), 0, rows - 1
    )

    # Min-grid: lowest z per cell. Empty cells inherit the sentinel +inf so
    # the opening "spreads" the lowest neighbouring elevation across them.
    min_grid = np.full((rows, cols), np.inf, dtype=np.float64)
    np.minimum.at(min_grid, (row_idx, col_idx), zs)
    finite = np.isfinite(min_grid)
    if not finite.any():
        return np.full(len(xs), 1, dtype=np.uint8)
    # Replace +inf with the global min so the opening kernel has something
    # to work with (it would otherwise propagate inf into adjacent cells).
    z_lo = float(min_grid[finite].min())
    min_grid = np.where(finite, min_grid, z_lo)

    opened = grey_opening(min_grid, size=int(window_cells))

    # Per-point comparison: ground if z - opened_at_cell <= slope_threshold.
    cell_opened = opened[row_idx, col_idx]
    is_ground = (zs - cell_opened) <= slope_threshold
    out = np.where(is_ground, 2, 1).astype(np.uint8)
    return out

detect_trees(chm, *, min_height_m=2.0, radius_fn=None) #

Detect individual tree tops on a canopy height model.

Variable-window local-maxima search: for each candidate cell whose CHM value is >= min_height_m, scan a square window whose half-width scales with the cell's height (radius_fn(h) map units, default _default_tree_radius — ~5% of canopy height). The cell is reported as a tree top iff its CHM value is the maximum in the window.

Parameters:

Name Type Description Default
chm

pyramids Dataset of the canopy height model (typically DSM − DTM, in metres). Must be single-band and projected.

required
min_height_m float

Minimum canopy height (m) for a cell to be a tree-top candidate. Defaults to 2.0.

2.0
radius_fn

Callable mapping height_m -> window half-width in map units. Defaults to _default_tree_radius (Popescu & Wynne 2004 conifer rule of thumb: 0.5 + 0.05 * h).

None

Returns:

Type Description

geopandas.GeoDataFrame of tree-top Point geometries with columns

height_m (canopy height at the top), row, col (raster

indices), and geometry. CRS is set from the CHM's EPSG.

References

Popescu, S. C. & Wynne, R. H. (2004). "Seeing the trees in the forest: Using LIDAR and multispectral data fusion with local filtering and variable window size for estimating tree height." Photogrammetric Engineering & Remote Sensing 70(5): 589–604.

Source code in src/digitalrivers/lidar.py
def detect_trees(
    chm,
    *,
    min_height_m: float = 2.0,
    radius_fn=None,
):
    """Detect individual tree tops on a canopy height model.

    Variable-window local-maxima search: for each candidate cell whose CHM
    value is `>= min_height_m`, scan a square window whose half-width
    scales with the cell's height (`radius_fn(h)` map units, default
    `_default_tree_radius` — ~5% of canopy height). The cell is reported
    as a tree top iff its CHM value is the maximum in the window.

    Args:
        chm: pyramids `Dataset` of the canopy height model (typically
            DSM − DTM, in metres). Must be single-band and projected.
        min_height_m: Minimum canopy height (m) for a cell to be a tree-top
            candidate. Defaults to 2.0.
        radius_fn: Callable mapping `height_m` -> window half-width in map
            units. Defaults to `_default_tree_radius` (Popescu & Wynne 2004
            conifer rule of thumb: `0.5 + 0.05 * h`).

    Returns:
        `geopandas.GeoDataFrame` of tree-top Point geometries with columns
        `height_m` (canopy height at the top), `row`, `col` (raster
        indices), and `geometry`. CRS is set from the CHM's EPSG.

    References:
        Popescu, S. C. & Wynne, R. H. (2004). "Seeing the trees in the
        forest: Using LIDAR and multispectral data fusion with local
        filtering and variable window size for estimating tree height."
        *Photogrammetric Engineering & Remote Sensing* 70(5): 589–604.
    """
    import geopandas as gpd
    from shapely.geometry import Point

    if radius_fn is None:
        radius_fn = _default_tree_radius

    z = chm.read_array().astype(np.float32, copy=False)
    no_val = chm.no_data_value[0] if chm.no_data_value else None
    if no_val is not None:
        z = np.where(z == no_val, np.nan, z)
    cell_size = float(abs(chm.geotransform[1]))

    rows, cols = z.shape
    gt = chm.geotransform
    candidates = np.argwhere(z >= min_height_m)
    tops_r: list[int] = []
    tops_c: list[int] = []
    tops_h: list[float] = []
    for r, c in candidates:
        h = float(z[r, c])
        rad = max(1, int(radius_fn(h) / cell_size))
        r0 = max(0, int(r) - rad)
        r1 = min(rows, int(r) + rad + 1)
        c0 = max(0, int(c) - rad)
        c1 = min(cols, int(c) + rad + 1)
        window = z[r0:r1, c0:c1]
        if not np.isfinite(window).any():
            continue
        win_max = float(np.nanmax(window))
        if h >= win_max:
            tops_r.append(int(r))
            tops_c.append(int(c))
            tops_h.append(h)

    xs = [gt[0] + (c + 0.5) * gt[1] for c in tops_c]
    ys = [gt[3] + (r + 0.5) * gt[5] for r in tops_r]
    geometries = [Point(x, y) for x, y in zip(xs, ys)]
    return gpd.GeoDataFrame(
        {
            "height_m": tops_h,
            "row": tops_r,
            "col": tops_c,
            "geometry": geometries,
        },
        crs=chm.epsg,
    )

clip(points, polygon, *, inverse=False) #

Clip a LasPoints cloud to a polygon (or its complement).

Parameters:

Name Type Description Default
points LasPoints

Input LasPoints.

required
polygon

Shapely Polygon or MultiPolygon in the same CRS as points. All other geometry types are rejected.

required
inverse bool

If False (default), keep only points inside polygon. If True, keep only points OUTSIDE the polygon (i.e. erase).

False

Returns:

Type Description
LasPoints

A new LasPoints containing the surviving subset.

Source code in src/digitalrivers/lidar.py
def clip(
    points: LasPoints,
    polygon,
    *,
    inverse: bool = False,
) -> LasPoints:
    """Clip a `LasPoints` cloud to a polygon (or its complement).

    Args:
        points: Input `LasPoints`.
        polygon: Shapely Polygon or MultiPolygon in the same CRS as `points`.
            All other geometry types are rejected.
        inverse: If False (default), keep only points inside `polygon`. If
            True, keep only points OUTSIDE the polygon (i.e. erase).

    Returns:
        A new `LasPoints` containing the surviving subset.
    """
    import shapely
    inside = shapely.contains_xy(polygon, points.x, points.y)
    keep = inside if not inverse else ~inside
    return points.subset(keep)

merge(*pointclouds) #

Concatenate two or more LasPoints into a single cloud.

Numeric arrays are stacked field-by-field. Optional fields (intensity / classification / return_number) are preserved only when every input carries them — otherwise the field on the output is empty (size 0). The CRS is taken from the first input.

Parameters:

Name Type Description Default
*pointclouds LasPoints

Two or more LasPoints to merge.

()

Returns:

Type Description
LasPoints

A new LasPoints containing the concatenation.

Raises:

Type Description
ValueError

If fewer than one cloud is supplied.

Source code in src/digitalrivers/lidar.py
def merge(*pointclouds: LasPoints) -> LasPoints:
    """Concatenate two or more `LasPoints` into a single cloud.

    Numeric arrays are stacked field-by-field. Optional fields (intensity /
    classification / return_number) are preserved only when every input
    carries them — otherwise the field on the output is empty (size 0).
    The CRS is taken from the first input.

    Args:
        *pointclouds: Two or more `LasPoints` to merge.

    Returns:
        A new `LasPoints` containing the concatenation.

    Raises:
        ValueError: If fewer than one cloud is supplied.
    """
    if not pointclouds:
        raise ValueError("merge requires at least one point cloud")
    x = np.concatenate([p.x for p in pointclouds])
    y = np.concatenate([p.y for p in pointclouds])
    z = np.concatenate([p.z for p in pointclouds])
    kw: dict = {}
    if all(p.intensity.size for p in pointclouds):
        kw["intensity"] = np.concatenate([p.intensity for p in pointclouds])
    if all(p.classification.size for p in pointclouds):
        kw["classification"] = np.concatenate(
            [p.classification for p in pointclouds]
        )
    if all(p.return_number.size for p in pointclouds):
        kw["return_number"] = np.concatenate(
            [p.return_number for p in pointclouds]
        )
    return LasPoints(x=x, y=y, z=z, crs=pointclouds[0].crs, **kw)

filter_classes(points, classes) #

Keep only points whose ASPRS classification is in classes.

Standard ASPRS codes include 2 (ground), 3 (low vegetation), 4 (medium vegetation), 5 (high vegetation), 6 (building), 9 (water).

Parameters:

Name Type Description Default
points LasPoints

Input LasPoints. Must carry a populated classification array (i.e. read from a LAS / LAZ file).

required
classes set[int] | list[int] | tuple[int, ...]

Iterable of integer class codes to keep.

required

Returns:

Type Description
LasPoints

A new LasPoints containing only the matching subset.

Raises:

Type Description
ValueError

If points.classification is empty (the cloud carries no class codes).

Source code in src/digitalrivers/lidar.py
def filter_classes(
    points: LasPoints,
    classes: set[int] | list[int] | tuple[int, ...],
) -> LasPoints:
    """Keep only points whose ASPRS classification is in `classes`.

    Standard ASPRS codes include `2` (ground), `3` (low vegetation), `4`
    (medium vegetation), `5` (high vegetation), `6` (building), `9` (water).

    Args:
        points: Input `LasPoints`. Must carry a populated
            `classification` array (i.e. read from a LAS / LAZ file).
        classes: Iterable of integer class codes to keep.

    Returns:
        A new `LasPoints` containing only the matching subset.

    Raises:
        ValueError: If `points.classification` is empty (the cloud carries
            no class codes).
    """
    if not points.classification.size:
        raise ValueError(
            "points carries no classification data; nothing to filter"
        )
    keep = np.isin(points.classification, list(classes))
    return points.subset(keep)

write_las(points, path, *, point_format=6, version='1.4') #

Write a LasPoints cloud to a LAS or LAZ file.

The file extension determines compression: .laz uses LAZ (requires lazrs), .las is uncompressed.

Parameters:

Name Type Description Default
points LasPoints

LasPoints to write.

required
path str

Output filesystem path.

required
point_format int

ASPRS LAS point format (default 6 — supports GPS time and high-precision returns; pick 0 for legacy compatibility).

6
version str

LAS version string (default "1.4").

'1.4'

Raises:

Type Description
ImportError

If laspy is not installed.

Source code in src/digitalrivers/lidar.py
def write_las(
    points: LasPoints,
    path: str,
    *,
    point_format: int = 6,
    version: str = "1.4",
) -> None:
    """Write a `LasPoints` cloud to a LAS or LAZ file.

    The file extension determines compression: `.laz` uses LAZ
    (requires `lazrs`), `.las` is uncompressed.

    Args:
        points: `LasPoints` to write.
        path: Output filesystem path.
        point_format: ASPRS LAS point format (default 6 — supports GPS time
            and high-precision returns; pick 0 for legacy compatibility).
        version: LAS version string (default `"1.4"`).

    Raises:
        ImportError: If `laspy` is not installed.
    """
    try:
        import laspy  # type: ignore
    except ImportError as exc:  # pragma: no cover — environment-specific
        raise ImportError(_LASPY_HINT) from exc
    header = laspy.LasHeader(point_format=point_format, version=version)
    out = laspy.LasData(header)
    out.x = points.x
    out.y = points.y
    out.z = points.z
    if points.intensity.size:
        out.intensity = points.intensity
    if points.classification.size:
        out.classification = points.classification
    if points.return_number.size:
        out.return_number = points.return_number
    out.write(path)

grid_lidar_points(xs, ys, zs, cell_size, bounds=None, aggregate='min', epsg=4326, *, idw_k=8, idw_power=2.0, rbf_kernel='thin_plate_spline', rbf_smoothing=0.0) #

Grid a LiDAR point cloud to a DEM.

Pragmatic LiDAR-to-DEM step that operates on raw (x, y, z) arrays. Useful when the caller has read LAS / LAZ externally (via read_las) and wants a gridded surface.

The aggregate parameter selects either a block-aggregation method or a spatial-interpolation method:

Block aggregation (one value per cell, based on points that land inside the cell):

  • "min" (default) — canonical bare-earth choice for first-return LiDAR.
  • "max" — canopy / DSM choice.
  • "mean" / "median" — smoothed surfaces.
  • "count" — point-density raster.

Spatial interpolation (one value per cell centre, computed from nearby points regardless of cell membership):

  • "idw" — inverse-distance-weighted mean of the K nearest points.
  • "nn" — nearest-neighbour assignment.
  • "tin" — barycentric interpolation on the Delaunay triangulation.
  • "rbf" — radial-basis-function interpolation (scipy.interpolate. RBFInterpolator).

Parameters:

Name Type Description Default
xs / ys / zs

1-D arrays of point coordinates.

required
cell_size float

output cell side length in map units (must match the CRS).

required
bounds

(x_min, y_min, x_max, y_max) to clip the grid to. If None, the input points' bounding box is used.

None
aggregate str

One of "min", "max", "mean", "median", "count", "idw", "nn", "tin", "rbf".

'min'
epsg int

EPSG code of the input coordinates.

4326
idw_k int

Neighbour count for aggregate="idw". Defaults to 8.

8
idw_power float

Distance exponent for IDW weights (1 / d**power). Defaults to 2.0.

2.0
rbf_kernel str

Kernel name passed to scipy.interpolate.RBFInterpolator for aggregate="rbf". Defaults to "thin_plate_spline".

'thin_plate_spline'
rbf_smoothing float

Smoothing parameter for the RBF kernel. Defaults to 0.0 (exact interpolation).

0.0

Returns:

Type Description

A pyramids Dataset of the gridded surface.

Raises:

Type Description
ValueError

For mismatched input lengths or unknown aggregate.

Examples:

  • Bucket four points into a 2x1 grid with min aggregation:

    import numpy as np from digitalrivers.lidar import grid_lidar_points xs = np.array([0.1, 0.2, 1.1]) ys = np.array([0.1, 0.2, 0.1]) zs = np.array([5.0, 2.0, 4.0]) ds = grid_lidar_points( ... xs, ys, zs, cell_size=1.0, bounds=(0.0, 0.0, 2.0, 1.0), ... aggregate="min", epsg=3857, ... ) ds.read_array().tolist() [[2.0, 4.0]]

  • Mean aggregation averages every point that lands in a cell:

    import numpy as np from digitalrivers.lidar import grid_lidar_points xs = np.array([0.1, 0.2]) ys = np.array([0.1, 0.2]) zs = np.array([4.0, 6.0]) ds = grid_lidar_points( ... xs, ys, zs, cell_size=1.0, bounds=(0.0, 0.0, 1.0, 1.0), ... aggregate="mean", epsg=3857, ... ) float(ds.read_array()[0, 0]) 5.0

  • Empty cells receive the dataset's no-data sentinel (-9999.0):

    import numpy as np from digitalrivers.lidar import grid_lidar_points ds = grid_lidar_points( ... np.array([0.5]), np.array([0.5]), np.array([3.0]), ... cell_size=1.0, bounds=(0.0, 0.0, 2.0, 2.0), ... aggregate="min", ... ) float(ds.no_data_value[0]) -9999.0 int((ds.read_array() == -9999.0).sum()) 3

Source code in src/digitalrivers/lidar.py
def grid_lidar_points(
    xs,
    ys,
    zs,
    cell_size: float,
    bounds=None,
    aggregate: str = "min",
    epsg: int = 4326,
    *,
    idw_k: int = 8,
    idw_power: float = 2.0,
    rbf_kernel: str = "thin_plate_spline",
    rbf_smoothing: float = 0.0,
):
    """Grid a LiDAR point cloud to a DEM.

    Pragmatic LiDAR-to-DEM step that operates on raw `(x, y, z)` arrays.
    Useful when the caller has read LAS / LAZ externally (via `read_las`)
    and wants a gridded surface.

    The `aggregate` parameter selects either a block-aggregation method or
    a spatial-interpolation method:

    **Block aggregation** (one value per cell, based on points that land
    inside the cell):

    * `"min"` (default) — canonical bare-earth choice for first-return LiDAR.
    * `"max"` — canopy / DSM choice.
    * `"mean"` / `"median"` — smoothed surfaces.
    * `"count"` — point-density raster.

    **Spatial interpolation** (one value per cell centre, computed from
    nearby points regardless of cell membership):

    * `"idw"` — inverse-distance-weighted mean of the K nearest points.
    * `"nn"` — nearest-neighbour assignment.
    * `"tin"` — barycentric interpolation on the Delaunay triangulation.
    * `"rbf"` — radial-basis-function interpolation (`scipy.interpolate.
      RBFInterpolator`).

    Args:
        xs / ys / zs: 1-D arrays of point coordinates.
        cell_size: output cell side length in map units (must match the
            CRS).
        bounds: `(x_min, y_min, x_max, y_max)` to clip the grid to. If
            `None`, the input points' bounding box is used.
        aggregate: One of `"min"`, `"max"`, `"mean"`, `"median"`,
            `"count"`, `"idw"`, `"nn"`, `"tin"`, `"rbf"`.
        epsg: EPSG code of the input coordinates.
        idw_k: Neighbour count for `aggregate="idw"`. Defaults to 8.
        idw_power: Distance exponent for IDW weights `(1 / d**power)`.
            Defaults to 2.0.
        rbf_kernel: Kernel name passed to `scipy.interpolate.RBFInterpolator`
            for `aggregate="rbf"`. Defaults to `"thin_plate_spline"`.
        rbf_smoothing: Smoothing parameter for the RBF kernel. Defaults
            to 0.0 (exact interpolation).

    Returns:
        A pyramids `Dataset` of the gridded surface.

    Raises:
        ValueError: For mismatched input lengths or unknown `aggregate`.

    Examples:
        - Bucket four points into a 2x1 grid with min aggregation:

            >>> import numpy as np
            >>> from digitalrivers.lidar import grid_lidar_points
            >>> xs = np.array([0.1, 0.2, 1.1])
            >>> ys = np.array([0.1, 0.2, 0.1])
            >>> zs = np.array([5.0, 2.0, 4.0])
            >>> ds = grid_lidar_points(
            ...     xs, ys, zs, cell_size=1.0, bounds=(0.0, 0.0, 2.0, 1.0),
            ...     aggregate="min", epsg=3857,
            ... )
            >>> ds.read_array().tolist()
            [[2.0, 4.0]]

        - Mean aggregation averages every point that lands in a cell:

            >>> import numpy as np
            >>> from digitalrivers.lidar import grid_lidar_points
            >>> xs = np.array([0.1, 0.2])
            >>> ys = np.array([0.1, 0.2])
            >>> zs = np.array([4.0, 6.0])
            >>> ds = grid_lidar_points(
            ...     xs, ys, zs, cell_size=1.0, bounds=(0.0, 0.0, 1.0, 1.0),
            ...     aggregate="mean", epsg=3857,
            ... )
            >>> float(ds.read_array()[0, 0])
            5.0

        - Empty cells receive the dataset's no-data sentinel (-9999.0):

            >>> import numpy as np
            >>> from digitalrivers.lidar import grid_lidar_points
            >>> ds = grid_lidar_points(
            ...     np.array([0.5]), np.array([0.5]), np.array([3.0]),
            ...     cell_size=1.0, bounds=(0.0, 0.0, 2.0, 2.0),
            ...     aggregate="min",
            ... )
            >>> float(ds.no_data_value[0])
            -9999.0
            >>> int((ds.read_array() == -9999.0).sum())
            3
    """
    import numpy as np
    from pyramids.dataset import Dataset

    xs = np.asarray(xs, dtype=np.float64)
    ys = np.asarray(ys, dtype=np.float64)
    zs = np.asarray(zs, dtype=np.float64)
    if not (len(xs) == len(ys) == len(zs)):
        raise ValueError(
            f"xs, ys, zs must have the same length; got {len(xs)}, "
            f"{len(ys)}, {len(zs)}"
        )
    valid_aggregates = {
        "min", "max", "mean", "median", "count",
        "idw", "nn", "tin", "rbf",
    }
    if aggregate not in valid_aggregates:
        raise ValueError(
            f"aggregate must be one of {sorted(valid_aggregates)}; "
            f"got {aggregate!r}"
        )
    if bounds is None:
        x_min, y_min = float(xs.min()), float(ys.min())
        x_max, y_max = float(xs.max()), float(ys.max())
    else:
        x_min, y_min, x_max, y_max = bounds

    cols = int(np.ceil((x_max - x_min) / cell_size))
    rows = int(np.ceil((y_max - y_min) / cell_size))

    nodata = -9999.0

    # Spatial-interpolation paths evaluate at cell centres and exit early —
    # they don't use the per-cell binning of the block-aggregation paths.
    if aggregate in ("idw", "nn", "tin", "rbf"):
        # Cell-centre coordinates in the output grid.
        cx = x_min + (np.arange(cols) + 0.5) * cell_size
        cy = y_max - (np.arange(rows) + 0.5) * cell_size
        grid_x, grid_y = np.meshgrid(cx, cy)
        xy_pts = np.column_stack([xs, ys])
        target = np.column_stack([grid_x.ravel(), grid_y.ravel()])
        if aggregate == "idw":
            from scipy.spatial import cKDTree
            tree = cKDTree(xy_pts)
            k = min(int(idw_k), len(xs))
            dists, idxs = tree.query(target, k=k)
            if k == 1:
                dists = dists[:, None]
                idxs = idxs[:, None]
            with np.errstate(divide="ignore"):
                # Exact-hit points pin the result; weight them as infinity
                # so they dominate the weighted average. Replace 1/0 below.
                weights = 1.0 / np.power(dists, idw_power)
            # Cells with an exact hit (distance 0) take that point's z.
            exact = (dists == 0).any(axis=1)
            with np.errstate(invalid="ignore"):
                z_interp = (weights * zs[idxs]).sum(axis=1) / weights.sum(axis=1)
            if exact.any():
                # For exact hits, take the z at the matching nearest point.
                z_interp[exact] = zs[idxs[exact, 0]]
            out = z_interp.reshape(rows, cols)
        elif aggregate == "nn":
            from scipy.spatial import cKDTree
            tree = cKDTree(xy_pts)
            _, idxs = tree.query(target, k=1)
            out = zs[idxs].reshape(rows, cols)
        elif aggregate == "tin":
            from scipy.interpolate import LinearNDInterpolator
            interp = LinearNDInterpolator(xy_pts, zs, fill_value=nodata)
            out = interp(target).reshape(rows, cols)
        else:  # rbf
            from scipy.interpolate import RBFInterpolator
            interp = RBFInterpolator(
                xy_pts, zs, kernel=rbf_kernel, smoothing=rbf_smoothing,
            )
            out = interp(target).reshape(rows, cols)
        out = np.where(np.isfinite(out), out, nodata).astype(
            np.float32, copy=False,
        )
        geo = (x_min, cell_size, 0.0, y_max, 0.0, -cell_size)
        return Dataset.create_from_array(
            out, geo=geo, epsg=epsg, no_data_value=nodata,
        )

    col_idx = np.clip(((xs - x_min) / cell_size).astype(np.int64), 0, cols - 1)
    row_idx = np.clip(
        ((y_max - ys) / cell_size).astype(np.int64), 0, rows - 1
    )

    # `min` / `max` use `np.minimum.at` / `np.maximum.at`;
    # `mean` uses `np.add.at` for an O(N_points) reduction;
    # `count` uses `np.add.at` with weight 1 — same kernel as the mean
    # denominator; `median` still requires per-cell bucketing because
    # there is no closed-form running median in NumPy.
    if aggregate == "min":
        out = np.full((rows, cols), np.inf, dtype=np.float64)
        np.minimum.at(out, (row_idx, col_idx), zs)
        out[~np.isfinite(out)] = nodata
    elif aggregate == "max":
        out = np.full((rows, cols), -np.inf, dtype=np.float64)
        np.maximum.at(out, (row_idx, col_idx), zs)
        out[~np.isfinite(out)] = nodata
    elif aggregate == "mean":
        sums = np.zeros((rows, cols), dtype=np.float64)
        counts = np.zeros((rows, cols), dtype=np.int64)
        np.add.at(sums, (row_idx, col_idx), zs)
        np.add.at(counts, (row_idx, col_idx), 1)
        with np.errstate(invalid="ignore", divide="ignore"):
            out = np.where(counts > 0, sums / counts, nodata)
    elif aggregate == "count":
        counts = np.zeros((rows, cols), dtype=np.int64)
        np.add.at(counts, (row_idx, col_idx), 1)
        out = counts.astype(np.float64)
    else:  # median — per-cell bucketing
        buckets: dict[tuple[int, int], list[float]] = {}
        for r, c, z in zip(row_idx, col_idx, zs):
            buckets.setdefault((int(r), int(c)), []).append(float(z))
        out = np.full((rows, cols), nodata, dtype=np.float64)
        for (r, c), vals in buckets.items():
            out[r, c] = float(np.median(np.asarray(vals, dtype=np.float64)))

    geo = (x_min, cell_size, 0.0, y_max, 0.0, -cell_size)
    return Dataset.create_from_array(
        out.astype(np.float32, copy=False),
        geo=geo, epsg=epsg, no_data_value=nodata,
    )

Surface map#

Section Functions / classes
Record class LasPoints (xyz + intensity + classification + return_number + crs)
I/O — W-15 read_las(path), write_las(points, path, point_format=6, version="1.4")
Ground filter — W-16 classify_ground(points, method="zhang"/"axelsson", ...) (Zhang 2003 tophat ships; Axelsson 2000 TIN-progressive deferred)
Gridding — W-17 / P34 grid_lidar_points(xs, ys, zs, cell_size, aggregate=...)min / max / mean / median / count block aggregation plus idw / nn / tin / rbf interpolation
Vector ops — W-18 clip(points, polygon, inverse=False), merge(*pointclouds), filter_classes(points, classes)
Forestry — W-19 detect_trees(chm, min_height_m=2.0, radius_fn=...) — variable-window local-maxima on a canopy height model