Feature Subpackage

The pyramids.feature subpackage is the vector-data counterpart of pyramids.dataset. It ships a single user-facing class, FeatureCollection, plus two helper modules with pure functions for geometry manipulation and CRS handling.

Module Layout

classDiagram
    class GeoDataFrame {
        <<geopandas>>
    }

    class FeatureCollection {
        +from_features(data, crs)
        +from_records(records, orient, geometry, crs)
        +iter_features(path, layer, bbox, chunksize, tile_strategy, as_dict)
        +read_file(path, layer, bbox, columns, where)
        +read_parquet(path, columns, bbox)
        +to_parquet(path, compression)
        +to_file(path, driver, layer, mode, creation_options)
        +list_layers(path)
        +list_layers_cache_clear()
        +schema
        +epsg
        +top_left_corner
        +column
        +with_coordinates()
        +with_centroid()
        +concat(other)
        +plot(column, basemap, **kwargs)
        +create_polygon(coords)
        +polygon_wkt(coords)
        +create_points(coords)
        +point_collection(coords, crs)
        +get_epsg_from_prj(prj)
        +reproject_coordinates(x, y, from_crs, to_crs, precision)
        +__enter__()
        +__exit__(exc_type, exc, tb)
        +close()
    }

    class geometry {
        <<module>>
        +create_polygon(coords)
        +polygon_wkt(coords)
        +create_points(coords)
        +point_collection(coords, crs)
        +get_coords(row, geom_col, coord_type)
        +get_xy_coords(geometry, coord_type)
        +get_point_coords(geometry, coord_type)
        +get_line_coords(geometry, coord_type)
        +get_poly_coords(geometry, coord_type)
        +explode_gdf(gdf, geometry)
        +multi_geom_handler(multi_geometry, coord_type, geom_type)
        +geometry_collection_coords(geom, coord_type)
    }

    class crs {
        <<module>>
        +create_sr_from_proj(prj, string_type)
        +get_epsg_from_prj(prj)
        +reproject_coordinates(x, y, from_crs, to_crs, precision)
    }

    class _ogr {
        <<private>>
        +gdf_to_datasource(gdf)
        +datasource_to_gdf(ds)
    }

    GeoDataFrame <|-- FeatureCollection
    FeatureCollection ..> geometry : delegates
    FeatureCollection ..> crs : delegates
    FeatureCollection ..> _ogr : "OGR bridge\n(internal)"
  • FeatureCollection — the public class, a direct subclass of geopandas.GeoDataFrame.
  • geometry — shape factories and coordinate-extraction helpers.
  • crs — CRS / EPSG / reprojection helpers.
  • _ogr — private OGR bridge (OGR DataSource never leaves the subpackage).

When to reach for which

| Task | Entry point |
| --- | --- |
| Read a vector file (Shapefile / GeoJSON / GPKG / Parquet / zipped / cloud) | FeatureCollection.read_file / read_parquet |
| Stream a large file in chunks | FeatureCollection.iter_features |
| Build from Python data (records or columnar dict) | FeatureCollection.from_records |
| Wrap an existing GeoDataFrame | FeatureCollection(gdf) or FeatureCollection.from_features(gdf) |
| Inspect layers / schema without reading | FeatureCollection.list_layers, .schema |
| Attach per-vertex or centroid columns | .with_coordinates(), .with_centroid() |
| Concatenate two FCs safely (CRS-checked) | .concat(other) |
| Build raw geometries | pyramids.feature.geometry.create_polygon / create_points |
| Reproject coordinate arrays | pyramids.base.crs.reproject_coordinates |
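
The "records or columnar dict" row refers to two input shapes: a list of per-row dicts, or one dict of equal-length column lists. The sketch below shows how the two shapes relate, using only the standard library. It is an illustration, not the pyramids implementation: `records_to_columns` is a hypothetical helper, and the WKT strings stand in for the shapely geometries that `from_records` expects.

```python
from typing import Any, Dict, Iterable, List


def records_to_columns(records: Iterable[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn per-row dicts into one columnar dict (column name -> list of values)."""
    rows = list(records)
    if not rows:
        return {}
    columns = list(rows[0])
    for i, row in enumerate(rows):
        # Every row must carry the same keys, otherwise columns would be ragged.
        if set(row) != set(columns):
            raise ValueError(f"row {i} has keys {sorted(row)}, expected {sorted(columns)}")
    return {name: [row[name] for row in rows] for name in columns}


# Placeholder geometries (WKT strings); the real API takes shapely objects.
rows = [
    {"id": 1, "geometry": "POINT (0 0)"},
    {"id": 2, "geometry": "POINT (1 1)"},
]
columnar = records_to_columns(rows)
print(columnar)
```

The row-oriented form is the natural fit for API responses that arrive one item at a time, while the columnar form suits data that is already held as parallel lists.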

Lazy / Dask reads

For files too large to load eagerly — multi-GB GeoParquet, cloud-hosted vector tables, planet-scale datasets like Overture Maps — pyramids offers a dask-backed path:

from pyramids.feature import FeatureCollection

lfc = FeatureCollection.read_parquet(
    "s3://overturemaps-us-west-2/release/2024-07-22.0/theme=places/type=place",
    backend="dask",
    columns=["id", "names", "geometry"],
    bbox=(2.0, 48.8, 2.5, 49.0),
)
# `zones` is assumed to be a polygon GeoDataFrame / FeatureCollection defined elsewhere.
lfc.spatial_shuffle().sjoin(zones).compute()

The backend="dask" branch returns a LazyFeatureCollection (a subclass of dask_geopandas.GeoDataFrame) whose partition-aware ops (to_crs, clip, sjoin, spatial_shuffle) run lazily.

See Lazy vector reads for the full guide: the spatial_shuffle → sjoin pruning workflow, compute vs persist, to_parquet, compute_total_bounds, and how to wire a distributed scheduler with pyramids.configure_lazy_vector.
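
The idea behind bbox-based partition pruning can be illustrated without dask. The sketch below is a plain-Python assumption about how such pruning works in principle, not the pyramids or dask-geopandas implementation. Each partition is described by hypothetical bounds `(minx, miny, maxx, maxy)`, and only partitions whose bounds intersect the query bbox need to be read. The names `bboxes_intersect` and `prune_partitions`, and the partition bounds themselves, are made up for illustration.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """Return True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def prune_partitions(partition_bounds: Dict[int, BBox], query: BBox) -> List[int]:
    """Keep only the partition ids whose bounds intersect the query bbox."""
    return [
        pid
        for pid, bounds in partition_bounds.items()
        if bboxes_intersect(bounds, query)
    ]


# Hypothetical per-partition bounds, e.g. as they might look after a spatial shuffle.
partition_bounds = {
    0: (-10.0, 35.0, 0.0, 45.0),
    1: (0.0, 45.0, 5.0, 52.0),
    2: (5.0, 45.0, 15.0, 55.0),
}
query_bbox = (2.0, 48.8, 2.5, 49.0)  # same bbox as the read_parquet call above
kept = prune_partitions(partition_bounds, query_bbox)
print(kept)  # [1]
```

The tighter and less overlapping the partition bounds are, the more partitions a query like this can skip, which is the motivation for shuffling spatially before a join.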

Install: pip install 'pyramids-gis[parquet-lazy]'.

Build a one-row FC from a bbox: from_bbox

FeatureCollection.from_bbox((W, S, E, N), epsg=…) is the shared primitive behind Dataset.crop(bbox=…), Dataset.read_array(bbox=…), and DatasetCollection.crop(bbox=…). It returns a single-row FC whose only geometry is the rectangular polygon — convenient when you want to hand the same mask to multiple downstream operations, or when you need the polygon for some other geopandas / shapely call.

from pyramids.feature import FeatureCollection

mask = FeatureCollection.from_bbox((6.8, 50.3, 7.2, 50.6), epsg=4326)
mask.to_file("aoi.geojson")

epsg is required (a bbox without a CRS is ambiguous); the bbox must satisfy west < east and south < north.
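
The ordering rules above can be sketched as a small standalone check. This is a standard-library illustration that follows the validation order in the class source (length, numeric conversion, then `west < east` and `south < north`); `validate_bbox` and `bbox_ring` are hypothetical names, and the vertex order of the ring is an arbitrary choice for this sketch rather than what shapely's `box` produces.

```python
from typing import List, Sequence, Tuple


def validate_bbox(bbox: Sequence[float]) -> Tuple[float, float, float, float]:
    """Check a (west, south, east, north) bbox and return it as floats."""
    seq = list(bbox)
    if len(seq) != 4:
        raise ValueError(f"bbox must have exactly 4 elements; got {len(seq)}")
    try:
        w, s, e, n = (float(v) for v in seq)
    except (TypeError, ValueError) as exc:
        raise TypeError(f"bbox elements must be numbers; got {seq!r}") from exc
    if not w < e:
        raise ValueError(f"bbox must satisfy west < east; got west={w}, east={e}")
    if not s < n:
        raise ValueError(f"bbox must satisfy south < north; got south={s}, north={n}")
    return w, s, e, n


def bbox_ring(bbox: Sequence[float]) -> List[Tuple[float, float]]:
    """Return the closed ring of corner coordinates for the rectangle."""
    w, s, e, n = validate_bbox(bbox)
    return [(w, s), (e, s), (e, n), (w, n), (w, s)]


print(bbox_ring((6.8, 50.3, 7.2, 50.6)))
```

A swapped pair such as `(7.2, 50.3, 6.8, 50.6)` fails the `west < east` check, which catches the common mistake of passing the corners in the wrong order.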

FeatureCollection Class

pyramids.feature.FeatureCollection

Bases: GeoDataFrame

A geopandas.GeoDataFrame with pyramids-specific GIS methods.

FeatureCollection is a GeoDataFrame (isinstance(fc, GeoDataFrame) is True), so every geopandas method is available directly. Pyramids adds rasterization, Dataset interop, vertex extraction, and CRS helpers on top.

The OGR/GDAL backend is internal only; see pyramids.feature._ogr.
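
One subclassing detail documented in the class source is how `_metadata` is built: the parent's names come first, the pyramids cache names follow, and `dict.fromkeys` removes duplicates while keeping order. The sketch below reproduces only that merge step with plain lists. `parent_metadata` is a stand-in for `GeoDataFrame._metadata`, and `merge_metadata` and `future_parent` are hypothetical names for illustration.

```python
from typing import List

# Stand-in for GeoDataFrame._metadata; the real list lives in geopandas.
parent_metadata = ["_geometry_column_name"]
own_metadata = ["_epsg_cache_crs", "_epsg_cache_value"]


def merge_metadata(parent: List[str], own: List[str]) -> List[str]:
    """Parent names first, then subclass names, with duplicates dropped."""
    return list(dict.fromkeys([*parent, *own]))


merged = merge_metadata(parent_metadata, own_metadata)
print(merged)  # ['_geometry_column_name', '_epsg_cache_crs', '_epsg_cache_value']

# Hypothetical future parent that already lists one of the subclass names.
future_parent = ["_geometry_column_name", "_epsg_cache_crs"]
merged_future = merge_metadata(future_parent, own_metadata)
print(merged_future)  # ['_geometry_column_name', '_epsg_cache_crs', '_epsg_cache_value']
```

Replacing the parent's list instead of extending it would drop `_geometry_column_name`, which is the failure mode the source comments warn about.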

Source code in src/pyramids/feature/collection.py
class FeatureCollection(GeoDataFrame):
    """A :class:`geopandas.GeoDataFrame` with pyramids-specific GIS methods.

    `FeatureCollection` *is a* `GeoDataFrame` (`isinstance(fc,
    GeoDataFrame)` is `True`), so every geopandas method is
    available directly. Pyramids adds rasterization, Dataset interop,
    vertex extraction, and CRS helpers on top.

    The OGR/GDAL backend is internal only; see
    :mod:`pyramids.feature._ogr`.
    """

    @property
    def _constructor(self):
        """Return the type pandas uses when constructing new frames."""
        return FeatureCollection

    # merge with GeoDataFrame._metadata instead of replacing it.
    # The parent class lists `_geometry_column_name` (the name of the
    # active geometry column); overriding _metadata with just our own
    # entries drops that attribute on pickle / copy / concat, and the
    # restored object can no longer find its geometry column. Always
    # splat the parent's list first.
    # dedupe via `dict.fromkeys` so that if a future geopandas
    # release adds one of our own names to its own `_metadata` list,
    # the pyramids subclass does not carry a duplicate entry. Python
    # preserves insertion order in dicts since 3.7, so the parent's
    # ordering is preserved.
    _metadata: list[str] = list(
        dict.fromkeys(
            [
                *GeoDataFrame._metadata,
                "_epsg_cache_crs",
                "_epsg_cache_value",
            ]
        )
    )
    """Instance attributes pandas must preserve across copy/slice/pickle.

    Holds:

    * `GeoDataFrame._metadata` (currently `_geometry_column_name`)
      — required for pickle round-trips to remember which column is
      the active geometry column.
    * `_epsg_cache_crs` / `_epsg_cache_value` — the EPSG
      cache.

    The list is wrapped in `list(dict.fromkeys(...))` so that a
    future geopandas release adding one of our own names to its own
    `_metadata` list does not produce a duplicate entry. `dict`
    preserves insertion order since Python 3.7, so the parent's
    ordering is preserved.
    """

    def __init__(self, data: Any = None, *args: Any, **kwargs: Any) -> None:
        """Construct a FeatureCollection.

        Accepts anything :class:`geopandas.GeoDataFrame` accepts.
        Rejects `ogr.DataSource` / `gdal.Dataset` with a clear error.
        """
        if isinstance(data, (ogr.DataSource, gdal.Dataset)):
            raise TypeError(
                "FeatureCollection no longer accepts ogr.DataSource or "
                "gdal.Dataset objects. OGR is an internal implementation "
                "detail. Use FeatureCollection.read_file(path) to load a "
                "file, or pass a GeoDataFrame."
            )
        super().__init__(data, *args, **kwargs)

    def __enter__(self) -> FeatureCollection:
        """Enter a context-managed block. Returns `self`.

        Returns:
            FeatureCollection: `self` — the exact same instance, so
            `with ... as fc:` binds `fc` to this collection.

        Examples:
            - Use as a context manager and access rows inside the block:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> gdf = gpd.GeoDataFrame(
                ...     {"id": [1, 2]},
                ...     geometry=[Point(0, 0), Point(1, 1)],
                ...     crs="EPSG:4326",
                ... )
                >>> with FeatureCollection(gdf) as fc:
                ...     n = len(fc)
                >>> n
                2

                ```
            - Exceptions raised inside the block still propagate:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> try:
                ...     with fc:
                ...         raise RuntimeError("boom")
                ... except RuntimeError as err:
                ...     print(err)
                boom

                ```
        """
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        """Exit the context-managed block. Calls :meth:`close`.

        Args:
            exc_type: Exception class if the block raised, else `None`.
            exc: Exception instance if the block raised, else `None`.
            tb: Traceback for the raised exception, else `None`.

        Returns:
            bool: Always `False` — exceptions from inside the `with`
            block propagate to the caller rather than being swallowed.

        Examples:
            - The clean-exit path returns `False` so nothing is swallowed:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.__exit__(None, None, None)
                False

                ```
            - A `with` block that finishes normally just releases the FC:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> gdf = gpd.GeoDataFrame(
                ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ... )
                >>> with FeatureCollection(gdf) as fc:
                ...     pass
                >>> len(fc)
                1

                ```
        """
        self.close()
        return False

    def close(self) -> None:
        """Release resources held by this FeatureCollection.

        No-op today (the OGR bridge is self-cleaning). Exists so future
        resource-holding features have an idiomatic release point.

        Returns:
            None: This method does not return a value.

        Examples:
            - `close()` is idempotent — calling it repeatedly is safe:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.close()
                >>> fc.close()
                >>> len(fc)
                1

                ```
            - The collection remains usable after `close` (no-op today):
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"v": [7]}, geometry=[Point(2, 3)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.close()
                >>> fc.epsg
                4326

                ```
        """
        return None

    @classmethod
    def from_features(
        cls,
        features: Iterable[Any],
        *,
        crs: Any = None,
        columns: list[str] | None = None,
    ) -> FeatureCollection:
        """Build a FeatureCollection from feature-shaped inputs.

        Delegates to :meth:`geopandas.GeoDataFrame.from_features` and
        wraps the result. Accepts any of the shapes that method
        accepts:

        * a list (or iterator) of GeoJSON feature dicts of the form
          `{"type": "Feature", "geometry": {...}, "properties": {...}}`,
        * any object exposing `__geo_interface__` (shapely
          geometries, fiona records, custom feature classes), or
        * a bare `FeatureCollection` dict (`{"type":
          "FeatureCollection", "features": [...]}`).

        Args:
            features (Iterable):
                Feature dicts of the form
                `{"type": "Feature", "geometry": {...}, "properties": {...}}`,
                or any `__geo_interface__` provider. Also accepts a
                bare `FeatureCollection` dict.
            crs:
                CRS to attach to the result (EPSG int, `"EPSG:4326"`,
                WKT, Proj, or a :class:`pyproj.CRS`). `None` leaves
                the CRS unset.
            columns (list[str] | None):
                Explicit column order for the output. When `None`,
                geopandas infers columns from the first feature.

        Returns:
            FeatureCollection: A new FC backed by the supplied features.

        Raises:
            ValueError: If `features` is empty or exhausted before any
                feature is consumed. An empty GeoDataFrame from
                `from_features` has no `geometry` column, which
                breaks downstream pyramids methods that assume the
                column exists. Fail fast instead.

        Examples:
            - Build from a list of feature dicts:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> feats = [
                ...     {"type": "Feature",
                ...      "geometry": {"type": "Point", "coordinates": [0, 0]},
                ...      "properties": {"name": "a"}},
                ...     {"type": "Feature",
                ...      "geometry": {"type": "Point", "coordinates": [1, 1]},
                ...      "properties": {"name": "b"}},
                ... ]
                >>> fc = FeatureCollection.from_features(feats, crs=4326)
                >>> len(fc)
                2
                >>> fc.epsg
                4326

                ```
        """
        # materialise an iterator so we can detect the empty case
        # before handing off to geopandas. `geopandas.from_features([])`
        # returns a GeoDataFrame with no `geometry` column, which
        # breaks every pyramids op that assumes the column exists.
        features_list = list(features)
        if not features_list:
            raise ValueError(
                "from_features requires at least one feature. An empty "
                "iterable would produce a GeoDataFrame with no geometry "
                "column, which breaks downstream pyramids methods."
            )
        gdf = gpd.GeoDataFrame.from_features(features_list, crs=crs, columns=columns)
        return cls(gdf)

    @classmethod
    def from_bbox(
        cls,
        bbox: tuple[float, float, float, float] | list[float],
        *,
        epsg: Any,
    ) -> FeatureCollection:
        """Build a one-row FeatureCollection from a geographic bounding box.

        The bbox is the canonical ``(west, south, east, north)`` quadruple in
        the CRS named by ``epsg``. The result is a single-row FC whose only
        geometry is a rectangular Polygon — handy for cropping a raster or
        windowed-reading it without writing out the polygon vertices by hand:

        .. code-block:: python

            mask = FeatureCollection.from_bbox((31.0, 30.0, 31.1, 30.1), epsg=4326)
            cropped = dataset.crop(mask)

        Most callers do not need to build this themselves — :meth:`Dataset.crop`
        and :meth:`Dataset.read_array` (via :meth:`pyramids.dataset.engines.io.IO.read_array`)
        accept the bbox/``epsg`` pair directly and call this helper internally.

        Args:
            bbox: A 4-element ``(west, south, east, north)`` tuple / list of
                numbers. Must satisfy ``west < east`` and ``south < north``.
            epsg: CRS for the bbox coordinates — anything ``geopandas`` accepts
                for ``crs=`` (EPSG int such as ``4326``, ``"EPSG:4326"`` string,
                WKT, Proj, or a :class:`pyproj.CRS`). Required (a bbox without
                a CRS is ambiguous).

        Returns:
            FeatureCollection: A one-row FC carrying the rectangular polygon,
            in the supplied CRS.

        Raises:
            ValueError: ``bbox`` is not a 4-element sequence, or violates
                ``west < east`` / ``south < north``, or ``epsg`` is ``None``.
            TypeError: ``bbox`` elements are not numbers.

        Examples:
            - Build a one-row FC from a bbox and inspect it:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection.from_bbox((31.0, 30.0, 31.1, 30.1), epsg=4326)
                >>> len(fc)
                1
                >>> tuple(float(v) for v in fc.total_bounds)
                (31.0, 30.0, 31.1, 30.1)
                >>> fc.crs.to_epsg()
                4326

                ```
            - Use it as a mask to crop a raster:
                ```python
                >>> import numpy as np
                >>> from pyramids.dataset import Dataset
                >>> from pyramids.feature import FeatureCollection
                >>> arr = np.arange(100, dtype="int16").reshape(10, 10)
                >>> ds = Dataset.create_from_array(
                ...     arr, top_left_corner=(0, 0), cell_size=0.05, epsg=4326,
                ... )
                >>> fc = FeatureCollection.from_bbox((0.1, -0.2, 0.2, -0.1), epsg=4326)
                >>> ds.crop(mask=fc).shape
                (1, 2, 2)

                ```
            - ``epsg=None`` is rejected — a bbox without a CRS is ambiguous:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> try:
                ...     FeatureCollection.from_bbox((0, 0, 1, 1), epsg=None)
                ... except ValueError as exc:
                ...     print("epsg" in str(exc))
                True

                ```

        See Also:
            - :meth:`pyramids.dataset.engines.spatial.Spatial.crop`: accepts
              ``bbox=`` / ``epsg=`` directly and routes through this helper.
            - :meth:`pyramids.dataset.engines.io.IO.read_array`: same.
        """
        if epsg is None:
            raise ValueError(
                "from_bbox requires an explicit epsg= for the bbox CRS; "
                "a bbox without a CRS is ambiguous"
            )
        try:
            seq = list(bbox)
        except TypeError as exc:
            raise ValueError(
                f"bbox must be a 4-element (west, south, east, north) sequence; "
                f"got {bbox!r}"
            ) from exc
        if len(seq) != 4:
            raise ValueError(
                f"bbox must have exactly 4 elements (west, south, east, north); "
                f"got {len(seq)}: {seq!r}"
            )
        try:
            w, s, e, n = (float(v) for v in seq)
        except (TypeError, ValueError) as exc:
            raise TypeError(f"bbox elements must be numbers; got {seq!r}") from exc
        if not (w < e):
            raise ValueError(f"bbox must satisfy west < east; got west={w}, east={e}")
        if not (s < n):
            raise ValueError(
                f"bbox must satisfy south < north; got south={s}, north={n}"
            )
        return cls(geometry=[box(w, s, e, n)], crs=epsg)

    @classmethod
    def from_records(
        cls,
        records: Any,
        *,
        geometry: str = "geometry",
        crs: Any = None,
        orient: str = "records",
    ) -> FeatureCollection:
        """Build a FeatureCollection from dict records.

        Two input orientations are accepted (C26 added the second):

        * `orient="records"` (default) — an iterable of per-row dicts,
          each of the form `{column: value, ..., geometry: <shapely>}`.
          The dict's keys become column names; the key named by
          `geometry` must hold a shapely geometry.
        * `orient="list"` — a single columnar dict mapping each
          column name to a list of values of equal length, for
          example `{"id": [1, 2], "geometry": [pt_a, pt_b]}`.

        Useful for ingesting rows from an API response that doesn't
        emit GeoJSON but already has shapely geoms.

        Args:
            records:
                Per-row iterable of dicts when `orient="records"`, or a
                single columnar dict when `orient="list"`.
            geometry (str):
                Name of the column / key holding the shapely geometry.
                Default `"geometry"`.
            crs:
                CRS to attach (same forms as :meth:`from_features`).
            orient (str):
                `"records"` (default) or `"list"` — matches the
                pandas `from_dict`/`from_records` conventions.

        Returns:
            FeatureCollection: A new FC with one row per record.

        Raises:
            FeatureError: If a record is missing the `geometry`
                column.
            ValueError: If `orient` is not one of the supported
                values.

        Examples:
            - Per-row records with the default geometry key:
                ```python
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> recs = [
                ...     {"id": 1, "geometry": Point(0, 0)},
                ...     {"id": 2, "geometry": Point(1, 1)},
                ... ]
                >>> fc = FeatureCollection.from_records(recs, crs=4326)
                >>> len(fc)
                2
                >>> fc.epsg
                4326

                ```
            - Custom geometry key via the `geometry=` kwarg:
                ```python
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> recs = [
                ...     {"id": 1, "geom": Point(0, 0)},
                ...     {"id": 2, "geom": Point(1, 1)},
                ... ]
                >>> fc = FeatureCollection.from_records(
                ...     recs, geometry="geom", crs=4326,
                ... )
                >>> fc.geometry.name
                'geom'

                ```
            - Columnar dict via `orient="list"`:
                ```python
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> cols = {"id": [1, 2], "geometry": [Point(0, 0), Point(1, 1)]}
                >>> fc = FeatureCollection.from_records(
                ...     cols, orient="list", crs=4326,
                ... )
                >>> list(fc["id"])
                [1, 2]

                ```
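            - The two orientations describe the same table. A plain-Python
              sketch of that relationship (`records_to_columns` is a
              hypothetical helper, not part of pyramids, and tuples stand in
              for shapely geometries):
                ```python
                # Hypothetical helper: pivot per-row dicts (orient="records")
                # into one columnar dict (orient="list"). Assumes every row
                # carries the same keys, so the lists end up equal in length.
                def records_to_columns(records):
                    columns = {}
                    for row in records:
                        for key, value in row.items():
                            columns.setdefault(key, []).append(value)
                    return columns

                # Tuples stand in for shapely geometries in this sketch.
                rows = [
                    {"id": 1, "geometry": (0, 0)},
                    {"id": 2, "geometry": (1, 1)},
                ]
                cols = records_to_columns(rows)
                print(cols)
                ```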
        """

        # empty-input branches both build a single-column frame
        # whose column name matches the `geometry=` kwarg, so
        # `GeoDataFrame(..., geometry=…)` sets it as the active
        # geometry column and the returned FC has
        # `geometry.name == geometry`.
        def _empty_fc() -> FeatureCollection:
            return cls(gpd.GeoDataFrame({geometry: []}, geometry=geometry, crs=crs))

        if orient == "records":
            records_list = list(records)
            if not records_list:
                return _empty_fc()
            df = pd.DataFrame.from_records(records_list)
        elif orient == "list":
            # columnar dict of equal-length lists. Straight into
            # `pd.DataFrame` which accepts this shape natively and
            # raises `ValueError` on mismatched lengths (propagated
            # to the caller as-is — the pandas message is already clear).
            if not isinstance(records, dict):
                raise ValueError(
                    f"orient='list' expects a dict of column → list; "
                    f"got {type(records).__name__}."
                )
            df = pd.DataFrame(records)
            if len(df) == 0:
                return _empty_fc()
        else:
            raise ValueError(f"orient must be 'records' or 'list'; got {orient!r}.")
        if geometry not in df.columns:
            raise FeatureError(
                f"records missing required geometry column {geometry!r}; "
                f"columns present: {list(df.columns)}"
            )
        return cls(gpd.GeoDataFrame(df, geometry=geometry, crs=crs))

    _VALID_TILE_STRATEGIES: tuple[str, ...] = (
        "auto",
        "rtree",
        "row_group",
        "none",
    )

    @classmethod
    def iter_features(
        cls,
        path: str | Path,
        *,
        layer: str | int | None = None,
        bbox: tuple[float, float, float, float] | None = None,
        where: str | None = None,
        chunksize: int | None = None,
        tile_strategy: str = "auto",
        include_index: bool = False,
    ) -> Any:
        """Stream features from `path` without materializing the full file.

        Two orthogonal knobs control the stream:

        * **Chunk shape**. `chunksize=None` yields one GeoJSON-style
          dict per row (fiona idiom). `chunksize=N` yields
          :class:`FeatureCollection` batches of up to N rows each so
          batched pipelines get a DataFrame-shaped payload.
        * **Tile strategy**. Controls whether the `bbox`
          filter is pushed into the format's spatial index (rtree on
          GPKG, row-group statistics on Parquet, …) or applied after
          a full scan. Pass one of:

          - `"auto"` (default) — let pyogrio pick. For a GPKG,
            pyogrio queries the `rtree_<layer>_geom` companion
            table automatically. For a Parquet file, pyogrio /
            pyarrow push the bbox down to the row-group statistics
            and skip non-matching groups. For formats without a
            spatial index (GeoJSON, Shapefile without a `.qix`)
            this falls back to a full scan in the driver.
          - `"rtree"` — same as `"auto"`; kept as an explicit
            name so pipeline code can document intent.
          - `"row_group"` — same as `"auto"`; explicit name for
            the Parquet case.
          - `"none"` — disable index pushdown; read whole chunks
            from the driver and apply the bbox filter in Python.
            Useful when the on-disk spatial index is stale or
            suspected wrong; also exercises the "slow path" in
            tests.

        `bbox` / `where` compose with any tile_strategy. Paths run
        through :func:`pyramids._io._parse_path` so cloud URLs and
        archive paths work the same way as in :meth:`read_file`.

        Args:
            path (str | Path): File path, URL, archive path.
            layer (str | int | None): Layer selector for multi-layer
                formats.
            bbox: `(minx, miny, maxx, maxy)` filter.
            where (str | None): OGR SQL predicate.
            chunksize (int | None): `None` yields dicts, an `int`
                yields `FeatureCollection` chunks.
            tile_strategy (str): One of `"auto"`, `"rtree"`,
                `"row_group"`, `"none"`. Default `"auto"`.
            include_index (bool): When `True`, each yielded dict gets
                an additional `"id"` key whose value is the
                0-based file-row index of that feature. The chunked
                form (`chunksize=N`) attaches the same index as a
                `"_row_index"` column on the yielded FC. The indices
                stay aligned with the on-disk rows even when a
                Python-side bbox filter (`tile_strategy="none"`)
                drops some rows — only the surviving features are
                yielded, and their ids match the positions they had
                in the source file. Defaults to `False` for
                back-compat with the fiona idiom.

        Yields:
            dict | FeatureCollection: Per-feature dicts when
            `chunksize` is `None`; FeatureCollection chunks
            otherwise.

        Raises:
            ValueError: If `chunksize` is given but `< 1`, or if
                `tile_strategy` is not one of the accepted values.

        Examples:
            - Stream features one at a time as GeoJSON-style dicts:
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> path = d / "pts.geojson"
                >>> gdf = gpd.GeoDataFrame(
                ...     {"id": [1, 2, 3]},
                ...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
                ...     crs="EPSG:4326",
                ... )
                >>> gdf.to_file(path, driver="GeoJSON")
                >>> feats = list(FeatureCollection.iter_features(path))
                >>> len(feats)
                3
                >>> feats[0]["properties"]["id"]
                1

                ```
            - Stream in `chunksize=2` batches as FeatureCollection chunks:
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> path = d / "pts.geojson"
                >>> gdf = gpd.GeoDataFrame(
                ...     {"id": [1, 2, 3]},
                ...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
                ...     crs="EPSG:4326",
                ... )
                >>> gdf.to_file(path, driver="GeoJSON")
                >>> chunks = list(
                ...     FeatureCollection.iter_features(path, chunksize=2)
                ... )
                >>> [len(c) for c in chunks]
                [2, 1]

                ```
            - Invalid `chunksize` raises `ValueError`:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> gen = FeatureCollection.iter_features("anywhere", chunksize=0)
                >>> next(gen)
                Traceback (most recent call last):
                    ...
                ValueError: chunksize must be >= 1 when supplied; got 0.

                ```
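            - How the file is walked in `skip_features` / `max_features`
              windows, and how row indices stay aligned after a Python-side
              filter, as a plain-Python sketch (`iter_windows` and
              `surviving_indices` are hypothetical helpers, not pyramids API):
                ```python
                # Hypothetical sketch: reproduce the window and index
                # bookkeeping on plain Python values, without reading a file.
                def iter_windows(total, batch_size):
                    # Yield (skip_features, max_features) pairs covering `total` rows.
                    for start in range(0, total, batch_size):
                        yield start, batch_size

                def surviving_indices(start, keep_mask):
                    # Absolute row positions of the rows whose mask entry is truthy.
                    return [start + i for i, keep in enumerate(keep_mask) if keep]

                # Five rows read two at a time: the last window asks for two
                # rows, but only one remains in the file.
                windows = list(iter_windows(total=5, batch_size=2))

                # Second window (rows 2 and 3); the bbox mask keeps only row 3.
                kept = surviving_indices(start=2, keep_mask=[False, True])
                print(windows, kept)
                ```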
        """
        if chunksize is not None and chunksize < 1:
            raise ValueError(f"chunksize must be >= 1 when supplied; got {chunksize}.")
        if tile_strategy not in cls._VALID_TILE_STRATEGIES:
            raise ValueError(
                f"tile_strategy must be one of "
                f"{cls._VALID_TILE_STRATEGIES}; got {tile_strategy!r}."
            )

        import pyogrio

        resolved = str(_pyramids_io._parse_path(path))

        # Determine how many features are in the layer so we can
        # iterate in fixed-size batches via skip_features / max_features.
        # pyogrio's read_info is O(1) per call.
        info_kwargs: dict[str, Any] = {}
        if layer is not None:
            info_kwargs["layer"] = layer
        info = pyogrio.read_info(resolved, **info_kwargs)
        total = int(info["features"])

        if chunksize is None:
            batch_size = _DEFAULT_ITER_BATCH_SIZE
        else:
            batch_size = int(chunksize)

        # D-M3: pin the engine to pyogrio. `skip_features` /
        # `max_features` are pyogrio-specific (geopandas' fiona
        # engine silently ignores them, which would turn every chunk
        # into a full scan). Pinning the engine makes the contract
        # explicit and fails fast if pyogrio is absent.
        read_kwargs: dict[str, Any] = {"engine": "pyogrio"}
        if layer is not None:
            read_kwargs["layer"] = layer
        if where is not None:
            read_kwargs["where"] = where

        # when tile_strategy is "auto"/"rtree"/"row_group",
        # forward the bbox to pyogrio which transparently uses the
        # format's spatial index. When "none", hold the bbox back
        # and apply it in Python after each chunk loads.
        pushdown_bbox = bbox if tile_strategy != "none" else None
        python_bbox = bbox if tile_strategy == "none" else None
        if pushdown_bbox is not None:
            read_kwargs["bbox"] = pushdown_bbox

        for start in range(0, total, batch_size):
            gdf_chunk = gpd.read_file(
                resolved,
                skip_features=start,
                max_features=batch_size,
                **read_kwargs,
            )
            # remember the absolute row indices before any
            # bbox-based masking so callers can map yielded features
            # back to their source rows even after a Python-side filter
            # has dropped some of them.
            if include_index:
                row_indices = list(range(start, start + len(gdf_chunk)))
            if python_bbox is not None and len(gdf_chunk) > 0:
                xmin, ymin, xmax, ymax = python_bbox
                mask = gdf_chunk.intersects(box(xmin, ymin, xmax, ymax))
                if include_index:
                    row_indices = [ri for ri, keep in zip(row_indices, mask) if keep]
                gdf_chunk = gdf_chunk[mask]
            if chunksize is None:
                iterator = gdf_chunk.iterfeatures(na="null")
                if include_index:
                    for ri, feat in zip(row_indices, iterator):
                        feat["id"] = ri
                        yield feat
                else:
                    for feat in iterator:
                        yield feat
            else:
                chunk_fc = cls(gdf_chunk)
                if include_index:
                    chunk_fc["_row_index"] = row_indices
                yield chunk_fc

    @classmethod
    def read_file(
        cls,
        path: str | Path,
        *,
        layer: str | int | None = None,
        bbox: tuple[float, float, float, float] | Any = None,
        mask: Any = None,
        rows: slice | int | None = None,
        columns: list[str] | None = None,
        where: str | None = None,
        backend: str = "pandas",
        npartitions: int | None = None,
        chunksize: int | None = None,
        **kwargs: Any,
    ) -> FeatureCollection | LazyFeatureCollection:
        """Read a vector file into a FeatureCollection.

        `path` is first routed through
        :func:`pyramids._io._parse_path`, which handles:

        * Cloud-URL rewriting (`s3://`, `gs://`, `az://`,
          `abfs://`, `http(s)://`, `file://` → GDAL `/vsi*/`
          form). This rewriting is verified end-to-end through an HTTP test.
          For AWS / GCS / Azure credentials either set the standard
          environment variables (`AWS_ACCESS_KEY_ID`,
          `AWS_SECRET_ACCESS_KEY`, `GOOGLE_APPLICATION_CREDENTIALS`,
          `AZURE_STORAGE_CONNECTION_STRING`, …) or scope them via
          :class:`pyramids.base.remote.CloudConfig` as a context
          manager around the `read_file` call.
        * Compressed-archive dispatch for `.zip`, `.tar`, `.tar.gz`,
          `.gz` on **local** paths — the returned path is a
          `/vsizip/`, `/vsitar/` or `/vsigzip/` string that
          :func:`geopandas.read_file` (via GDAL's virtual filesystem)
          can open directly. You can either pass just the archive
          path (first contained file wins) or
          `archive.zip/inner.geojson` to target a specific member.
          Cloud + archive chaining (`http://host/x.zip`) is not
          automatic today — if you need it, stage the archive
          locally first or use `CloudConfig` with an explicit
          `/vsizip//vsicurl/...` path.

        Filter kwargs are pushed down to fiona/pyogrio so the
        dataset never fully materializes when only a subset is needed.

        Args:
            path (str | Path):
                File path, URL, archive path, or
                `archive.ext/inner-file` form.
            layer (str | int | None):
                Layer name or index for multi-layer formats
                (GeoPackage, GDB, KML, …). `None` reads the first /
                default layer.
            bbox:
                `(minx, miny, maxx, maxy)` tuple, or a
                `GeoDataFrame` / `GeoSeries` / shapely geometry
                whose total bounds are used. Only features
                intersecting the bbox are loaded.
            mask:
                A shapely geometry (or mapping / GeoSeries /
                GeoDataFrame) whose geometries are used as a mask —
                only features intersecting the mask are loaded. Finer
                than `bbox` (actual geometry intersection, not just
                envelope). Mutually exclusive with `bbox`.
            rows (slice | int | None):
                `int` — read at most N rows. `slice` — read the
                given range of rows. Useful for sampling.
            columns (list[str] | None):
                Restrict loaded attribute columns. Geometry is
                always loaded. `None` loads every column.
            where (str | None):
                OGR SQL `WHERE`-clause predicate pushed down to the
                driver (e.g. `"population > 10000"`). Avoids loading
                non-matching features.
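            backend (str):
                `"pandas"` (default) reads eagerly and returns a
                `FeatureCollection`. `"dask"` reads lazily through
                `dask_geopandas` and returns a `LazyFeatureCollection`;
                it rejects `layer`, `bbox`, `mask`, `rows`, `columns`
                and `where`.
            npartitions (int | None):
                Partition count for `backend="dask"`. When neither
                `npartitions` nor `chunksize` is given, a default is
                derived from the file size.
            chunksize (int | None):
                Partitioning hint for `backend="dask"`, resolved
                together with `npartitions`.
            **kwargs:
                Forwarded to :func:`geopandas.read_file` verbatim for
                engine-specific options (`engine="pyogrio"`,
                `use_arrow=True`, driver-specific open options).
                Only used with `backend="pandas"`.

        Returns:
            FeatureCollection | LazyFeatureCollection: The (possibly
            filtered) features wrapped as a FeatureCollection, or a
            LazyFeatureCollection when `backend="dask"`.

        Raises:
            ValueError: If `backend` is neither `"pandas"` nor
                `"dask"`, or if a filter kwarg is combined with
                `backend="dask"`.
            ImportError: If `backend="dask"` is requested but
                `dask-geopandas` is not installed.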

        Examples:
            - Load a GeoJSON file:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection.read_file("tests/data/coello-gauges.geojson")
                >>> len(fc) > 0
                True

                ```
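            - Unset filters are dropped before the call is forwarded, and
              any supplied filter is refused with `backend="dask"`. A
              plain-Python sketch of that rule (`forwardable_filters` is a
              hypothetical helper, not pyramids API):
                ```python
                # Hypothetical helper mirroring the dispatch rule only; it
                # does not read any file.
                def forwardable_filters(backend, **filters):
                    # Keep only the filters the caller actually supplied.
                    supplied = {k: v for k, v in filters.items() if v is not None}
                    if backend == "dask":
                        if supplied:
                            raise ValueError(
                                f"backend='dask' does not support {sorted(supplied)}"
                            )
                        return {}
                    if backend != "pandas":
                        raise ValueError(f"unknown backend {backend!r}")
                    return supplied

                # Eager read: `where=None` is dropped, `bbox` is forwarded.
                eager = forwardable_filters("pandas", bbox=(0, 0, 1, 1), where=None)

                # Lazy read with a filter is refused instead of ignored.
                try:
                    forwardable_filters("dask", where="population > 10000")
                    rejected = False
                except ValueError:
                    rejected = True
                print(eager, rejected)
                ```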
        """
        resolved = _pyramids_io._parse_path(path)
        if backend == "dask":
            # dask_geopandas.read_file does NOT forward pyogrio
            # filter kwargs (bbox / mask / rows / columns / where) —
            # silently dropping them was the bug. Raise a clear
            # ValueError instead so users know to either pre-filter
            # or call .compute() and filter eagerly.
            unsupported = {
                "bbox": bbox,
                "mask": mask,
                "rows": rows,
                "columns": columns,
                "where": where,
                "layer": layer,
            }
            supplied = [k for k, v in unsupported.items() if v is not None]
            if supplied:
                raise ValueError(
                    f"backend='dask' does not support filter kwargs "
                    f"{supplied}. dask_geopandas.read_file has no "
                    "pushdown story for these. Either omit them and "
                    "filter post-load via .clip / .loc / .compute, or "
                    "switch to read_parquet(backend='dask', filters=...)"
                )
            try:
                import dask_geopandas
            except ImportError as exc:
                raise ImportError(
                    "backend='dask' requires the optional "
                    "'dask-geopandas' dependency. Install with one of:\n"
                    "  - PyPI:        pip install 'pyramids-gis[parquet-lazy]'\n"
                    "  - conda-forge: conda install -c conda-forge pyramids-parquet-lazy"
                ) from exc
            # default npartitions from file size when neither
            # kwarg was supplied; one-partition fallback defeats the
            # point of going lazy.
            partition_kwargs = _resolve_lazy_partitioning(
                resolved,
                npartitions,
                chunksize,
            )
            # wrap the lazy return as a LazyFeatureCollection so the
            # dask branch stays inside the pyramids type system.
            from pyramids.feature._lazy_collection import LazyFeatureCollection

            dask_gdf = dask_geopandas.read_file(resolved, **partition_kwargs)
            return LazyFeatureCollection.from_dask_gdf(dask_gdf)
        if backend != "pandas":
            raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
        # Only pass kwargs that were actually supplied — passing the
        # defaults (None) is fine for some geopandas engines but
        # confuses others. Build a clean kwargs dict.
        passthrough: dict[str, Any] = {}
        if layer is not None:
            passthrough["layer"] = layer
        if bbox is not None:
            passthrough["bbox"] = bbox
        if mask is not None:
            passthrough["mask"] = mask
        if rows is not None:
            passthrough["rows"] = rows
        if columns is not None:
            passthrough["columns"] = columns
        if where is not None:
            passthrough["where"] = where
        passthrough.update(kwargs)
        gdf = gpd.read_file(resolved, **passthrough)
        return cls(gdf)

    @property
    def epsg(self) -> int | None:
        """EPSG code of this FeatureCollection's CRS (cached).

        The value is cached per CRS-object identity so repeated access
        on hot paths skips the `pyproj.CRS.to_epsg` call. The cache
        auto-invalidates whenever `self.crs` is replaced.

        An identity miss falls back to equality. If `self.crs` has
        been reassigned to a different CRS object that nevertheless
        compares equal to the cached one (e.g. `fc.crs = pyproj.CRS(
        "EPSG:4326")` on a frame already in EPSG:4326), we adopt the
        new object as the cache key and skip the `.to_epsg()` call.
        Only when the value really differs do we recompute.

        The equality fallback is cheaper than a fresh
        `.to_epsg()` (which re-parses the CRS) but it is not free:
        `pyproj.CRS.__eq__` compares the two CRS definitions in full. If a
        future pandas/geopandas release stops returning the same
        `self.crs` object identity across accesses, the fallback
        runs on every `fc.epsg` and adds up on hot loops. Switch
        the cache key to `self.crs.to_wkt()` if a profile ever
        shows this dominating.

        Returns:
            int | None: The integer EPSG code if the CRS is registered
            in the EPSG authority; `None` when the FC has no CRS set
            or when its CRS cannot be mapped to a single EPSG code.

        Examples:
            - Frame built with WGS84 reports EPSG 4326:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.epsg
                4326

                ```
            - A frame without a CRS returns `None`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame({"id": [1]}, geometry=[Point(0, 0)])
                ... )
                >>> fc.epsg is None
                True

                ```
            - Reprojecting to Web Mercator updates the cached code:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc = fc.to_crs(3857)
                >>> fc.epsg
                3857

                ```
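            - The identity-then-equality lookup described above, as a
              plain-Python sketch (`FakeCRS` and `EpsgCache` are hypothetical
              stand-ins, not pyramids or pyproj API):
                ```python
                class FakeCRS:
                    # Stand-in for a CRS object that counts to_epsg() calls.
                    def __init__(self, code):
                        self.code = code
                        self.to_epsg_calls = 0

                    def __eq__(self, other):
                        return isinstance(other, FakeCRS) and other.code == self.code

                    def to_epsg(self):
                        self.to_epsg_calls += 1
                        return self.code

                class EpsgCache:
                    def __init__(self):
                        self.key = None
                        self.value = None

                    def lookup(self, crs):
                        if crs is self.key:  # identity hit: cheapest path
                            return self.value
                        if self.key is not None and crs is not None and self.key == crs:
                            self.key = crs  # equality hit: adopt the new object
                            return self.value
                        # Real miss: recompute and remember the new key.
                        self.value = None if crs is None else crs.to_epsg()
                        self.key = crs
                        return self.value

                cache = EpsgCache()
                a, b, c = FakeCRS(4326), FakeCRS(4326), FakeCRS(3857)
                results = [cache.lookup(x) for x in (a, a, b, c)]
                calls = (a.to_epsg_calls, b.to_epsg_calls, c.to_epsg_calls)
                print(results, calls)
                ```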
        """
        crs = self.crs
        cached_crs = getattr(self, "_epsg_cache_crs", None)
        if cached_crs is crs:
            return getattr(self, "_epsg_cache_value", None)
        # try equality before falling back to a fresh to_epsg() call.
        # pyproj.CRS comparison is cheaper than a full re-parse, and the
        # common "reassign an equivalent CRS" case (e.g. set_crs chain)
        # should stay in the fast path.
        if cached_crs is not None and crs is not None:
            try:
                equivalent = cached_crs == crs
            except (TypeError, ValueError):
                equivalent = False
            if equivalent:
                object.__setattr__(self, "_epsg_cache_crs", crs)
                return getattr(self, "_epsg_cache_value", None)
        if crs is None:
            value: int | None = None
        else:
            code = crs.to_epsg()
            value = int(code) if code is not None else None
        object.__setattr__(self, "_epsg_cache_crs", crs)
        object.__setattr__(self, "_epsg_cache_value", value)
        return value

    @property
    def top_left_corner(self) -> list[Number]:
        """Top-left corner `[xmin, ymax]` of the total bounds.

        Returns:
            list[Number]: Two-element list `[xmin, ymax]` — the
            minimum x-coordinate paired with the maximum y-coordinate
            of the union of all geometry bounds.

        Examples:
            - Two points span a unit square — the top-left is `[0, 1]`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(0, 0), Point(1, 1)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.top_left_corner
                [0.0, 1.0]

                ```
            - Offset points yield the offset top-left corner:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(10, 20), Point(15, 30)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.top_left_corner
                [10.0, 30.0]

                ```
        """
        bounds = self.total_bounds.tolist()
        return [bounds[0], bounds[3]]

    @property
    def column(self) -> list[str]:
        """Deprecated alias for :attr:`columns` returning a `list[str]`.

        Returns:
            list[str]: Column names in their current order, including
            the active geometry column.

        Examples:
            - A frame with an `id` field reports both columns:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.column
                ['id', 'geometry']

                ```
            - Multiple attribute columns appear in insertion order:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"name": ["a"], "pop": [100]},
                ...         geometry=[Point(0, 0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.column
                ['name', 'pop', 'geometry']

                ```
        """
        return self.columns.tolist()

    def __str__(self) -> str:
        """Return a short, pyramids-branded summary of the collection."""
        n = len(self)
        cols = self.columns.tolist()
        epsg = self.epsg
        return f"FeatureCollection({n} features, columns={cols}, epsg={epsg})"

    def __repr__(self) -> str:
        """Return a pyramids-branded repr."""
        return (
            f"FeatureCollection(n_features={len(self)}, "
            f"columns={self.columns.tolist()}, epsg={self.epsg})"
        )

    @property
    def schema(self) -> dict:
        """Fiona-style schema: geometry type + field-type dict.

        Returns a dict shaped like fiona's `schema` attribute so
        callers migrating from `fiona.open(path).schema` can consume
        this without rewriting. The dict has three keys:

        * `"geometry"`: single string (`"Point"`, `"Polygon"`,
          …) when every row has the same geom type, otherwise
          `"Unknown"`.
        * `"properties"`: `{column_name: dtype_string}` for every
          non-geometry column.
        * `"crs"`: the :attr:`crs` as a :class:`pyproj.CRS` object,
          or `None` when the FC has no CRS set. fiona itself exposes
          the CRS separately, as `fiona.open(path).crs`; it is
          bundled here so one dict describes the whole collection.

        Empty FeatureCollections (`len(self) == 0`) report
        `"Unknown"` for the geometry type.

        Returns:
            dict: Three-key dict with `"geometry"`, `"properties"`,
            and `"crs"`.

        Examples:
            - Homogeneous point collection reports `"Point"`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(0, 0), Point(1, 1)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> schema = fc.schema
                >>> schema["geometry"]
                'Point'
                >>> schema["properties"]
                {'id': 'int64'}
                >>> schema["crs"].to_epsg()
                4326

                ```
            - Mixed geometry types collapse to `"Unknown"`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point, LineString
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(0, 0), LineString([(0, 0), (1, 1)])],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.schema["geometry"]
                'Unknown'

                ```
            - Frames without a CRS return `crs=None`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame({"id": [1]}, geometry=[Point(0, 0)])
                ... )
                >>> fc.schema["crs"] is None
                True

                ```
        """
        geom_types = {g.geom_type for g in self.geometry if g is not None}
        if len(geom_types) == 1:
            (geom_type,) = geom_types
        else:
            geom_type = "Unknown"
        properties = {
            col: str(dt) for col, dt in self.dtypes.items() if col != "geometry"
        }
        return {
            "geometry": geom_type,
            "properties": properties,
            "crs": self.crs,
        }

    @classmethod
    def list_layers(cls, path: str | Path) -> list[str]:
        """List every vector-layer name in `path`.

        Routes through :func:`pyramids._io._parse_path` so the same
        cloud-URL / archive rewriting that :meth:`read_file` uses
        applies here too. Uses :func:`pyogrio.list_layers` under the
        hood (geopandas' default engine).

        Results are memoised behind a 128-entry LRU cache keyed on
        the resolved `str` path. Re-calling `list_layers` on the
        same cloud URL or local path in a loop costs one hash
        lookup instead of one datasource open. Call
        :meth:`list_layers_cache_clear` to invalidate after an
        out-of-band write.
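
        As a rough, stand-alone sketch of this caching pattern (the
        `_probe` helper, the layer names, and the counter are
        hypothetical stand-ins, not the module's actual helper):

        ```python
        import functools

        calls = {"opens": 0}  # counts simulated datasource opens


        @functools.lru_cache(maxsize=128)
        def _probe(resolved: str) -> tuple:
            # Hypothetical stand-in for the real datasource open.
            calls["opens"] += 1
            return ("layer_a", "layer_b")


        def list_layers(path) -> list:
            # Key the cache on the str form; hand back a fresh list each call.
            return list(_probe(str(path)))


        first = list_layers("data.gpkg")   # opens the (simulated) datasource
        second = list_layers("data.gpkg")  # served from the cache
        _probe.cache_clear()               # the invalidation step
        third = list_layers("data.gpkg")   # opens again after the clear
        ```

        Returning a new list on every call keeps callers from mutating
        the cached value by accident.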

        Args:
            path (str | Path):
                File path, URL, or archive path. Single-layer formats
                like GeoJSON return one name; multi-layer formats
                (GPKG, GDB, KML) return every layer.

        Returns:
            list[str]: Layer names in the order the driver reports them.

        Raises:
            FileNotFoundError: If `path` is a local filesystem path
                that does not exist. Cloud URLs and `/vsi*` paths
                skip this check and defer to the underlying driver.
                Previously all failures surfaced as an opaque
                `VectorDriverError("Failed to open datasource")`.

        Examples:
            - A single-layer GeoJSON returns one name derived from the filename:
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> path = d / "pts.geojson"
                >>> gdf = gpd.GeoDataFrame(
                ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ... )
                >>> gdf.to_file(path, driver="GeoJSON")
                >>> FeatureCollection.list_layers(path)
                ['pts']

                ```
            - A missing local path raises `FileNotFoundError`:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> FeatureCollection.list_layers("does/not/exist.geojson")
                Traceback (most recent call last):
                    ...
                FileNotFoundError: list_layers: no file at 'does/not/exist.geojson'.

                ```
        """
        # pre-check local-path existence so the caller sees
        # a `FileNotFoundError` naming the path instead of a generic
        # driver-open failure. Defer to `base.remote.is_remote` as
        # the single source of truth for which schemes are remote —
        # the previous hardcoded prefix tuple would silently treat any
        # future scheme as local and raise a misleading error.
        path_str = str(path)
        if not is_remote(path_str):
            local = Path(path_str)
            if not local.exists():
                raise FileNotFoundError(f"list_layers: no file at {path_str!r}.")

        resolved = str(_pyramids_io._parse_path(path))
        return list(_list_layers_cached(resolved))

    @classmethod
    def list_layers_cache_clear(cls) -> None:
        """Clear the LRU cache backing :meth:`list_layers`.

        Call this after writing a new layer to an existing multi-layer
        file (e.g. a GPKG) if you then want :meth:`list_layers` to see
        the new layer. Otherwise the 128-entry LRU cache is self-
        managing and callers do not need to touch it.

        Returns:
            None: This method does not return a value.

        Examples:
            - Clearing an empty cache is a safe no-op:
                ```python
                >>> from pyramids.feature import FeatureCollection
                >>> FeatureCollection.list_layers_cache_clear()
                >>> FeatureCollection.list_layers_cache_clear()

                ```
            - After an out-of-band write, clear the cache so the next
              `list_layers` call re-reads the updated file:
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> path = d / "pts.geojson"
                >>> gpd.GeoDataFrame(
                ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ... ).to_file(path, driver="GeoJSON")
                >>> _ = FeatureCollection.list_layers(path)
                >>> FeatureCollection.list_layers_cache_clear()
                >>> FeatureCollection.list_layers(path)
                ['pts']

                ```
        """
        _list_layers_cached.cache_clear()

    @classmethod
    def open_arrow(
        cls,
        path: str | Path,
        *,
        layer: str | int | None = None,
        columns: list[str] | None = None,
        bbox: tuple[float, float, float, float] | None = None,
        where: str | None = None,
        batch_size: int | None = None,
    ) -> Any:
        """Open a vector file as a streaming :class:`pyarrow.RecordBatchReader`.

        Thin wrapper over :func:`pyogrio.raw.open_arrow` that surfaces
        the underlying Arrow RecordBatch iterator. Rows are yielded in
        batches, so callers can iterate through multi-GB datasets
        without materializing the whole table in memory — useful for
        building custom dask partitioners.

        Args:
            path: Vector file path (Shapefile, GPKG, FlatGeobuf,
                GeoJSON, GeoParquet, ...). Routed through
                :func:`pyramids._io._parse_path` so cloud URLs work.
            layer: Layer name or index for multi-layer formats.
            columns: Attribute columns to load (`geometry` is
                always included).
            bbox: `(minx, miny, maxx, maxy)` filter.
            where: OGR SQL `WHERE` predicate pushed down to the
                driver.
            batch_size: Requested RecordBatch size in rows. `None`
                uses the driver default.

        Returns:
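            Any: The object produced by :func:`pyogrio.raw.open_arrow`,
            returned unchanged. In pyogrio this is a context manager
            that yields a `(meta, reader)` pair, so use it in a
            `with` block and iterate `reader` for row-batch
            consumption. Depending on the pyogrio version, `reader`
            is a :class:`pyarrow.RecordBatchReader` or a generic
            Arrow stream object.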

        Raises:
            ImportError: If :mod:`pyogrio` is not installed.
        """
        try:
            from pyogrio.raw import open_arrow
        except ImportError as exc:
            raise ImportError(
                "open_arrow requires the optional 'pyogrio' dependency. "
                "Install with one of:\n"
                "  - PyPI:        pip install pyogrio\n"
                "  - conda-forge: conda install -c conda-forge pyogrio"
            ) from exc
        resolved = _pyramids_io._parse_path(path)
        kwargs: dict[str, Any] = {}
        if layer is not None:
            kwargs["layer"] = layer
        if columns is not None:
            kwargs["columns"] = columns
        if bbox is not None:
            kwargs["bbox"] = bbox
        if where is not None:
            kwargs["where"] = where
        if batch_size is not None:
            kwargs["batch_size"] = batch_size
        return open_arrow(resolved, **kwargs)

    @classmethod
    def read_parquet(
        cls,
        path: str | Path,
        *,
        columns: list[str] | None = None,
        bbox: tuple[float, float, float, float] | None = None,
        backend: str = "pandas",
        split_row_groups: bool | None = None,
        filters: list | None = None,
        blocksize: int | str | None = None,
        storage_options: dict | None = None,
        **kwargs: Any,
    ) -> FeatureCollection | LazyFeatureCollection:
        """Read a GeoParquet file into a FeatureCollection.

        GeoParquet is a cloud-native columnar vector format (OGC-
        adopted December 2024) — faster to scan than GeoJSON, smaller
        than Shapefile, and partitioned in a way that suits distributed
        compute. This method is a thin wrapper around
        :func:`geopandas.read_parquet`; the path is first routed
        through :func:`pyramids._io._parse_path` so cloud URLs
        (`s3://`, `gs://`, `http(s)://`, …) resolve the same way
        they do in :meth:`read_file`.

        Requires the optional :mod:`pyarrow` dependency. Install with one of:

        - PyPI: ``pip install 'pyramids-gis[parquet]'``
        - conda-forge: ``conda install -c conda-forge pyramids-parquet``

        Args:
            path (str | Path):
                Local path, cloud URL, or any form
                :func:`pyramids._io._parse_path` accepts.
            columns (list[str] | None):
                Project a subset of columns — Parquet's columnar
                layout makes this a true I/O win, unlike row-oriented
                formats. `geometry` is always loaded. `None`
                loads every column.
            bbox (tuple[float, float, float, float] | None):
                `(minx, miny, maxx, maxy)` spatial filter.
                Forwarded to :func:`geopandas.read_parquet` which uses
                the file's GeoParquet spatial-index metadata when
                present to skip non-matching row groups — a true I/O
                win on large files. `None` (default) loads every
                feature.
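            backend (str):
                `"pandas"` (default) reads eagerly via
                :func:`geopandas.read_parquet`. `"dask"` reads lazily
                via :func:`dask_geopandas.read_parquet` and returns a
                `LazyFeatureCollection`; `bbox` is not forwarded on
                this path.
            split_row_groups (bool | None):
                Forwarded to :func:`dask_geopandas.read_parquet` when
                `backend="dask"`; ignored by the pandas backend.
            filters (list | None):
                Forwarded to :func:`dask_geopandas.read_parquet` when
                `backend="dask"`; ignored by the pandas backend.
            blocksize (int | str | None):
                Forwarded to :func:`dask_geopandas.read_parquet` when
                `backend="dask"`; ignored by the pandas backend.
            storage_options (dict | None):
                fsspec storage options, forwarded to the reader of
                the selected backend.
            **kwargs:
                Forwarded to the reader of the selected backend
                (:func:`geopandas.read_parquet` or
                :func:`dask_geopandas.read_parquet`).

        Returns:
            FeatureCollection | LazyFeatureCollection: The file's
            features wrapped as a FeatureCollection, or as a
            LazyFeatureCollection when `backend="dask"`.

        Raises:
            ImportError: If :mod:`pyarrow` is not installed, with a
                pyramids-branded message pointing at the
                `[parquet]` optional-dependency extra. With
                `backend="dask"`, also raised when `dask-geopandas`
                is missing, pointing at the `[parquet-lazy]` extra.
            ValueError: If `backend` is neither `"pandas"` nor
                `"dask"`.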

        Examples:
            - Round-trip a small FC through GeoParquet (requires pyarrow):
                ```python
                >>> import tempfile  # doctest: +SKIP
                >>> from pathlib import Path  # doctest: +SKIP
                >>> import geopandas as gpd  # doctest: +SKIP
                >>> from shapely.geometry import Point  # doctest: +SKIP
                >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
                >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
                >>> path = d / "pts.parquet"  # doctest: +SKIP
                >>> gpd.GeoDataFrame(
                ...     {"id": [1, 2]},
                ...     geometry=[Point(0, 0), Point(1, 1)],
                ...     crs="EPSG:4326",
                ... ).to_parquet(path)  # doctest: +SKIP
                >>> fc = FeatureCollection.read_parquet(path)  # doctest: +SKIP
                >>> len(fc)  # doctest: +SKIP
                2
                >>> fc.epsg  # doctest: +SKIP
                4326

                ```
            - Project a subset of columns to speed up I/O on wide files:
                ```python
                >>> fc = FeatureCollection.read_parquet(  # doctest: +SKIP
                ...     "s3://bucket/big.parquet",
                ...     columns=["id", "geometry"],
                ... )
                >>> fc.column  # doctest: +SKIP
                ['id', 'geometry']

                ```
            - A missing pyarrow dependency raises a branded `ImportError`:
                ```python
                >>> FeatureCollection.read_parquet("x.parquet")  # doctest: +SKIP
                Traceback (most recent call last):
                    ...
                ImportError: GeoParquet support requires the optional 'pyarrow'...

                ```
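            - A stand-alone sketch of an optional-dependency guard in the
              spirit of the branded `ImportError` above. `require_optional`
              and the demo module name are hypothetical; how
              `_require_pyarrow` is actually implemented is not shown here:
                ```python
                import importlib.util


                def require_optional(module: str, extra: str) -> None:
                    """Raise a descriptive ImportError when `module` is missing."""
                    if importlib.util.find_spec(module) is None:
                        raise ImportError(
                            f"This feature requires the optional {module!r} "
                            f"dependency. Install with: "
                            f"pip install 'pyramids-gis[{extra}]'"
                        )


                # A stdlib module is always importable, so this passes silently.
                require_optional("json", "parquet")

                try:
                    require_optional("no_such_module_for_demo", "parquet")
                except ImportError as exc:
                    message = str(exc)  # names the module and the extra
                ```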
        """
        resolved = _pyramids_io._parse_path(path)
        if backend == "dask":
            # check deps in order of specificity — the backend
            # request is the more specific signal, so the
            # dask-geopandas hint beats the generic pyarrow one.
            # When both are missing, the dask-geopandas error names
            # the extra that installs both ([parquet-lazy]).
            try:
                import dask_geopandas
            except ImportError as exc:
                raise ImportError(
                    "backend='dask' requires the optional "
                    "'dask-geopandas' dependency. Install with one of:\n"
                    "  - PyPI:        pip install 'pyramids-gis[parquet-lazy]'\n"
                    "  - conda-forge: conda install -c conda-forge pyramids-parquet-lazy"
                ) from exc
            dask_kwargs: dict[str, Any] = {}
            if columns is not None:
                dask_kwargs["columns"] = columns
            if split_row_groups is not None:
                dask_kwargs["split_row_groups"] = split_row_groups
            if filters is not None:
                dask_kwargs["filters"] = filters
            if blocksize is not None:
                dask_kwargs["blocksize"] = blocksize
            if storage_options is not None:
                dask_kwargs["storage_options"] = storage_options
            dask_kwargs.update(kwargs)
            # dask_geopandas is installed → assert pyarrow too, so
            # the user gets the pyramids-branded hint (not the
            # upstream message dask_geopandas would emit when it tries
            # to read). `[parquet-lazy]` pulls both.
            _require_pyarrow()
            # wrap the lazy return as a LazyFeatureCollection so the
            # dask branch stays inside the pyramids type system.
            from pyramids.feature._lazy_collection import LazyFeatureCollection

            dask_gdf = dask_geopandas.read_parquet(resolved, **dask_kwargs)
            return LazyFeatureCollection.from_dask_gdf(dask_gdf)
        if backend != "pandas":
            raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
        _require_pyarrow()
        # geopandas 1.x forwards **kwargs straight into
        # `pyarrow.parquet.read_table`, which has never accepted the
        # pandas-style `engine=` kwarg. `_require_pyarrow()` above
        # already hard-guarantees the pyarrow backend, so no injection
        # is needed here. If geopandas ever reintroduces a fastparquet
        # path it will be opt-in via a new kwarg, not a silent switch.
        passthrough: dict[str, Any] = {}
        passthrough.update(kwargs)
        if columns is not None:
            passthrough["columns"] = columns
        if bbox is not None:
            passthrough["bbox"] = bbox
        if storage_options is not None:
            passthrough["storage_options"] = storage_options
        gdf = gpd.read_parquet(resolved, **passthrough)
        return cls(gdf)

    def to_parquet(
        self,
        path: str | Path,
        *,
        compression: str = "snappy",
        index: bool | None = None,
        **kwargs: Any,
    ) -> None:
        """Write this FeatureCollection to GeoParquet.

        Thin wrapper around :meth:`geopandas.GeoDataFrame.to_parquet`
        that defaults :param:`compression` to `"snappy"` — the
        format-standard tradeoff between speed and size.

        Requires the optional :mod:`pyarrow` dependency. Install with one of:

        - PyPI: ``pip install 'pyramids-gis[parquet]'``
        - conda-forge: ``conda install -c conda-forge pyramids-parquet``

        Args:
            path (str | Path):
                Destination file path.
            compression (str):
                Parquet compression codec: `"snappy"` (default),
                `"gzip"`, `"brotli"`, `"lz4"`, `"zstd"`, or
                `"none"`. `"snappy"` is also the default of
                :meth:`geopandas.GeoDataFrame.to_parquet`.
            index (bool | None):
                Whether to include the pandas index as a column.
                `None` (default) uses geopandas' default behavior:
                preserve a non-default index, drop the default
                `RangeIndex`.
            **kwargs:
                Forwarded to :meth:`geopandas.GeoDataFrame.to_parquet`.

        Raises:
            ImportError: If :mod:`pyarrow` is not installed, with a
                pyramids-branded message pointing at the
                `[parquet]` optional-dependency extra.

        Examples:
            - Write a FeatureCollection with the default snappy codec:
                ```python
                >>> import tempfile  # doctest: +SKIP
                >>> from pathlib import Path  # doctest: +SKIP
                >>> import geopandas as gpd  # doctest: +SKIP
                >>> from shapely.geometry import Point  # doctest: +SKIP
                >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
                >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(0, 0), Point(1, 1)],
                ...         crs="EPSG:4326",
                ...     )
                ... )  # doctest: +SKIP
                >>> path = d / "out.parquet"  # doctest: +SKIP
                >>> fc.to_parquet(path)  # doctest: +SKIP
                >>> path.exists()  # doctest: +SKIP
                True

                ```
            - Pick a different codec (e.g. zstd for better compression):
                ```python
                >>> import tempfile  # doctest: +SKIP
                >>> from pathlib import Path  # doctest: +SKIP
                >>> import geopandas as gpd  # doctest: +SKIP
                >>> from shapely.geometry import Point  # doctest: +SKIP
                >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
                >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )  # doctest: +SKIP
                >>> fc.to_parquet(d / "out.parquet", compression="zstd")  # doctest: +SKIP

                ```
        """
        _require_pyarrow()
        super().to_parquet(path, compression=compression, index=index, **kwargs)

    def to_file(
        self,
        path: str | Path,
        driver: str = "geojson",
        *,
        layer: str | None = None,
        mode: str = "w",
        **creation_options: Any,
    ) -> None:
        """Write this FeatureCollection to a vector file.

        `layer`, `mode`, and arbitrary driver creation
        options are now first-class kwargs. Previously callers had to
        rely on implicit `**kwargs` forwarding, which hurt
        discoverability.

        Args:
            path (str | Path):
                Destination file path.
            driver (str):
                Driver alias (e.g. `"geojson"`, `"gpkg"`) or
                literal GDAL driver name (`"GeoJSON"`, `"GPKG"`,
                `"ESRI Shapefile"`). Resolved via :class:`Catalog`.
            layer (str | None):
                Layer name for multi-layer drivers (GPKG, GDB, …).
                Writing two layers into the same GPKG is the canonical
                use case. `None` defers to the driver default.
            mode (str):
                `"w"` (default) overwrites; `"a"` appends to an
                existing layer. Append support depends on the driver
                — GPKG and Shapefile accept it, GeoJSON does not.
            **creation_options:
                Driver-specific creation options, forwarded to the
                underlying engine (pyogrio / fiona). Examples:

                * GPKG: `SPATIAL_INDEX="YES"`, `FID="id"`.
                * Shapefile: `ENCODING="UTF-8"`.
                * GeoJSON: `COORDINATE_PRECISION=6`, `RFC7946="YES"`.

                Keys are case-preserving and passed verbatim to the
                driver; consult the GDAL driver docs for the full
                list.

                pyogrio (the default geopandas engine on 1.0+)
                raises :class:`ValueError` with the message
                `"unrecognized option '<name>' for driver '<driver>'"`
                when a supplied option is neither in the driver's
                dataset nor its layer creation-option list. This
                surfaces typos (`SPATIAL_INDX` vs `SPATIAL_INDEX`)
                at write-time rather than silently producing a
                different file. Some drivers may still accept options
                that pyogrio does not list — verify against the
                driver's docs when in doubt.

        Raises:
            ValueError: If `mode` isn't `"w"` or `"a"`, or if a
                supplied creation option is not recognised by the
                driver (raised by pyogrio — see the `**creation_options`
                note above).

        Examples:
            - Round-trip a small FC through GeoJSON (the default driver):
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(0, 0), Point(1, 1)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> path = d / "out.geojson"
                >>> fc.to_file(path)
                >>> path.exists()
                True
                >>> FeatureCollection.read_file(path).column
                ['id', 'geometry']

                ```
            - Write to GeoPackage with a named layer:
                ```python
                >>> import tempfile
                >>> from pathlib import Path
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> d = Path(tempfile.mkdtemp())
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> path = d / "out.gpkg"
                >>> fc.to_file(path, driver="gpkg", layer="rivers")
                >>> FeatureCollection.list_layers(path)
                ['rivers']

                ```
            - Invalid `mode` raises `ValueError` before touching the file:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
                ...     )
                ... )
                >>> fc.to_file("ignored.geojson", mode="x")
                Traceback (most recent call last):
                    ...
                ValueError: mode must be 'w' (write) or 'a' (append); got 'x'.

                ```
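            - A stand-alone sketch of the keyword merge order used when
              writing: defaults first, caller-supplied options last, so a
              caller's `engine=` wins. `build_write_kwargs` is a
              hypothetical helper written for illustration only:
                ```python
                def build_write_kwargs(driver, mode="w", layer=None, **creation_options):
                    # Validate the mode before anything else happens.
                    if mode not in ("w", "a"):
                        raise ValueError(
                            f"mode must be 'w' (write) or 'a' (append); got {mode!r}."
                        )
                    merged = {"driver": driver, "mode": mode, "engine": "pyogrio"}
                    if layer is not None:
                        merged["layer"] = layer
                    # Applied last, so caller-supplied keys override the defaults.
                    merged.update(creation_options)
                    return merged


                defaults = build_write_kwargs("GPKG", layer="rivers", SPATIAL_INDEX="YES")
                overridden = build_write_kwargs("GeoJSON", engine="fiona")
                ```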
        """
        if mode not in ("w", "a"):
            raise ValueError(f"mode must be 'w' (write) or 'a' (append); got {mode!r}.")
        try:
            resolved = CATALOG.get_gdal_name(driver) or driver
        except AttributeError:
            resolved = driver

        # pin the engine to pyogrio to match :meth:`read_file` and
        # :meth:`iter_features`. Callers who want fiona for some reason
        # can override via `engine="fiona"` in creation_options, but
        # the default gets the fast path and the pyogrio-specific
        # unknown-option validation.
        passthrough: dict[str, Any] = {
            "driver": resolved,
            "mode": mode,
            "engine": "pyogrio",
        }
        if layer is not None:
            passthrough["layer"] = layer
        passthrough.update(creation_options)
        super().to_file(path, **passthrough)

    # FeatureCollection.to_dataset was moved to
    # Dataset.from_features(features, ...) to break the circular import
    # that used to force a CLAUDE.md-violating inline
    # `from pyramids.dataset import Dataset` inside the method body.
    # Callers should migrate:
    #     fc.to_dataset(dataset=ds, column_name="pop")
    #         → Dataset.from_features(fc, template=ds, column_name="pop")
    #     fc.to_dataset(cell_size=10)
    #         → Dataset.from_features(fc, cell_size=10)

    def explode(self, geometry: str = "multipolygon") -> FeatureCollection:
        """Explode multi-geometry rows into per-row single geometries.

        Returns a new ``FeatureCollection`` where every row whose geometry
        type matches ``geometry`` is split so each child geometry becomes
        its own row. The current frame is not mutated.

        Args:
            geometry (str): The geometry type to explode (case-insensitive).
                Defaults to ``"multipolygon"``.

        Returns:
            FeatureCollection: A new collection with the same CRS as
            ``self`` and exploded geometries.

        Examples:
            - Explode a frame mixing one MultiPolygon with a Polygon:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Polygon, MultiPolygon
                >>> from pyramids.feature import FeatureCollection
                >>> gdf = gpd.GeoDataFrame(
                ...     {
                ...         "name": ["a", "b"],
                ...         "geometry": [
                ...             MultiPolygon([
                ...                 Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
                ...                 Polygon([(5, 5), (7, 5), (7, 7), (5, 7)]),
                ...             ]),
                ...             Polygon([(10, 10), (11, 10), (11, 11), (10, 11)]),
                ...         ],
                ...     },
                ...     crs="EPSG:4326",
                ... )
                >>> fc = FeatureCollection(gdf)
                >>> result = fc.explode("multipolygon")
                >>> len(result)
                3
                >>> [g.geom_type for g in result.geometry]
                ['Polygon', 'Polygon', 'Polygon']

                ```
        """
        return FeatureCollection(_geom.explode_gdf(self, geometry=geometry))

    def with_coordinates(self) -> FeatureCollection:
        """Return a new FeatureCollection with per-vertex `x` and `y` columns.

        Non-mutating replacement for the old `xy()` method
        (which has been deleted). Matches pandas / geopandas
        convention — data-transformation methods return a new object.
        The `with_` prefix follows the stdlib/pandas pattern for
        "return a copy with this change applied" (e.g.
        :meth:`pathlib.Path.with_suffix`).

        Explodes MultiPolygon and GeometryCollection geometries into
        their parts first, then attaches `x` and `y` columns
        containing the coordinate sequences of each row.

        Returns:
            FeatureCollection: A new FeatureCollection (`self` is
            not modified) with the original columns plus `x` and
            `y` per-vertex coordinate lists.

        Examples:
            - A Point FC gets scalar `x` / `y` per row:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(1.0, 2.0), Point(3.0, 4.0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> out = fc.with_coordinates()
                >>> list(out["x"])
                [1.0, 3.0]
                >>> list(out["y"])
                [2.0, 4.0]

                ```
            - The input FC is not mutated:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0.0, 0.0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> _ = fc.with_coordinates()
                >>> "x" in fc.columns
                False

                ```
        """
        gdf = _geom.explode_gdf(
            gpd.GeoDataFrame(self, copy=True), geometry="multipolygon"
        )
        gdf = _geom.explode_gdf(gdf, geometry="geometrycollection")

        fc = FeatureCollection(gdf)
        fc["x"] = fc.apply(
            _geom.get_coords, geom_col="geometry", coord_type="x", axis=1
        )
        fc["y"] = fc.apply(
            _geom.get_coords, geom_col="geometry", coord_type="y", axis=1
        )
        fc.reset_index(drop=True, inplace=True)
        return fc

    def plot(
        self,
        column: str | None = None,
        basemap: bool | str | None = None,
        **kwargs: Any,
    ) -> Any:
        """Plot features, optionally on a web-tile basemap.

        Delegates to :meth:`geopandas.GeoDataFrame.plot` and, when
        `basemap` is truthy, adds an OSM (or named provider) tile
        layer underneath.

        Raises:
            CRSError: If `basemap` is requested but the FC has no CRS.
        """
        ax = super().plot(column=column, **kwargs)

        if basemap:
            if self.epsg is None:
                raise CRSError(
                    "FeatureCollection must have a CRS (epsg) to use basemap."
                )
            source = basemap if isinstance(basemap, str) else None
            add_basemap(ax, crs=self.epsg, source=source)

        return ax

    def concat(self, other: GeoDataFrame) -> FeatureCollection:
        """Concatenate another GeoDataFrame onto this FeatureCollection.

        Mirrors :func:`pandas.concat`: returns a new
        `FeatureCollection` and never mutates `self`. There is no
        `inplace` kwarg, following `pd.concat`, which has never had
        one.

        Equivalent to `pd.concat([fc, other])` which also works
        directly and returns a `FeatureCollection` via the
        `_constructor` hook.

        A CRS mismatch between `self` and `other` raises
        :class:`pyramids.base._errors.CRSError`. The old behaviour
        silently adopted `self`'s CRS — which corrupted the
        `other` rows' coordinates if the two frames were in
        different CRSes. Callers that want to force-concat across
        CRSes must `other.to_crs(self.crs)` first. An
        unset-on-one-side case (one CRS is `None`) is permitted so
        you can seed a CRS by concatenating a CRS-carrying frame
        onto a freshly-constructed empty FC.

        Args:
            other (GeoDataFrame): The rows to append.

        Returns:
            FeatureCollection: A new FC containing `self`'s rows
            followed by `other`'s rows, with `self`'s CRS (or
            `other`'s CRS when `self` has none) and a freshly-reset
            index.

        Raises:
            CRSError: If both frames carry a CRS and the two CRSes
                do not match.

        Examples:
            - Concatenate two single-row FCs on matching CRS:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> a = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> b = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [2]}, geometry=[Point(1, 1)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> out = a.concat(b)
                >>> len(out)
                2
                >>> list(out["id"])
                [1, 2]
                >>> out.crs.to_epsg()
                4326

                ```
            - CRS mismatch raises `CRSError`:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> a = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1]}, geometry=[Point(0, 0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> b = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [2]}, geometry=[Point(1, 1)],
                ...         crs="EPSG:3857",
                ...     )
                ... )
                >>> a.concat(b)
                Traceback (most recent call last):
                    ...
                pyramids.base._errors.CRSError: concat: CRS mismatch...

                ```
        """
        # validate CRS agreement up front.
        if self.crs is not None and other.crs is not None:
            if self.crs != other.crs:
                raise CRSError(
                    f"concat: CRS mismatch — self.crs = {self.crs!r}, "
                    f"other.crs = {other.crs!r}. Reproject one side "
                    f"— `other.to_crs(self.crs)` OR "
                    f"`self.to_crs(other.crs)` — before "
                    f"concatenating, or strip one CRS with "
                    f".set_crs(None, allow_override=True)."
                )
        combined = gpd.GeoDataFrame(pd.concat([self, other]))
        combined.index = list(range(len(combined)))
        combined.crs = self.crs if self.crs is not None else other.crs
        return FeatureCollection(combined)

    def with_centroid(self) -> FeatureCollection:
        """Return a new FC with per-feature center-point columns attached.

        Non-mutating replacement for the old `center_point()`
        method (which has been deleted). The `with_` prefix mirrors
        stdlib / pandas conventions for "return a copy with this
        change applied".

        Computes the average x/y of each feature's coordinates (as
        returned by :meth:`with_coordinates`) and attaches three
        columns: `avg_x`, `avg_y` and `center_point` (shapely
        `Point`).

        Feeding a degenerate or empty geometry (for example an
        empty `Point`, or a `Polygon` whose ring has zero area)
        produces `(NaN, NaN)` averages. The method emits a single
        `GeometryWarning` listing the row positions whose `avg_x` /
        `avg_y` could not be computed, so downstream code can guard
        against the NaN centroids instead of silently consuming them.
        The `center_point` value at those rows is an empty
        `shapely.Point` (`Point.is_empty is True`) rather than a
        `(NaN, NaN)` point.

        Returns:
            FeatureCollection: A new FeatureCollection (`self` is
            not modified) with `x`, `y`, `avg_x`, `avg_y`,
            `center_point` columns added.

        Examples:
            - Compute centroids for a 2-polygon FC:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Polygon
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[
                ...             Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
                ...             Polygon([(4, 4), (6, 4), (6, 6), (4, 6)]),
                ...         ],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> out = fc.with_centroid()
                >>> [(p.x, p.y) for p in out["center_point"]]
                [(0.8, 0.8), (4.8, 4.8)]

                ```
            - A Point FC is a no-op for the coordinate lists (each row
              is already a single vertex); the centroid equals the point:
                ```python
                >>> import geopandas as gpd
                >>> from shapely.geometry import Point
                >>> from pyramids.feature import FeatureCollection
                >>> fc = FeatureCollection(
                ...     gpd.GeoDataFrame(
                ...         {"id": [1, 2]},
                ...         geometry=[Point(3.0, 4.0), Point(7.0, 8.0)],
                ...         crs="EPSG:4326",
                ...     )
                ... )
                >>> out = fc.with_centroid()
                >>> [(p.x, p.y) for p in out["center_point"]]
                [(3.0, 4.0), (7.0, 8.0)]

                ```
        """
        fc = self.with_coordinates()
        for i, row_i in fc.iterrows():
            fc.loc[i, "avg_x"] = np.mean(row_i["x"])
            fc.loc[i, "avg_y"] = np.mean(row_i["y"])

        # detect rows whose averaged coordinate could not be
        # computed (empty geometry, all-NaN rings, etc.). Emit a single
        # summary warning and substitute an empty Point so the column
        # does not expose a `(NaN, NaN)` Point that would then crash
        # downstream reprojections.
        avg_x = fc["avg_x"].to_numpy()
        avg_y = fc["avg_y"].to_numpy()
        bad_mask = np.isnan(avg_x) | np.isnan(avg_y)
        if bad_mask.any():
            bad_idx = [int(i) for i, is_bad in enumerate(bad_mask) if is_bad]
            warnings.warn(
                f"with_centroid: {len(bad_idx)} row(s) yielded NaN centroids "
                f"(rows {bad_idx}). Their `center_point` is an empty "
                f"shapely.Point. Drop or repair those rows before running "
                f"a method that requires a valid centroid (e.g. reproject, "
                f"distance).",
                GeometryWarning,
                stacklevel=2,
            )

        # single-pass build. The previous implementation built a
        # throwaway `coords_list` (with NaN placeholders for the bad
        # rows), called `create_points` on it, then iterated the
        # result a second time to substitute empty Points for the bad
        # rows. Skip both intermediates — write the final column value
        # directly.
        cleaned: list[Any] = [
            Point() if bad else Point(ax, ay)
            for ax, ay, bad in zip(avg_x.tolist(), avg_y.tolist(), bad_mask.tolist())
        ]
        fc["center_point"] = cleaned
        return fc
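
The `(0.8, 0.8)` result in the polygon example above is consistent with averaging the exterior-ring vertices, including the repeated closing vertex, rather than computing an area-weighted centroid (which would be `(1.0, 1.0)` for that square). The sketch below reproduces the arithmetic in plain Python. `vertex_average` is a hypothetical helper, not part of pyramids, and it assumes the ring is stored closed (first vertex repeated at the end).

```python
import math


def vertex_average(ring):
    """Average the x and y values of a ring's vertices (hypothetical helper)."""
    if not ring:
        # No vertices: mirror the NaN result described for empty geometries.
        return (math.nan, math.nan)
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# A closed ring repeats its first vertex, so the square contributes 5 vertices.
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
avg = vertex_average(square)
print(avg)  # (0.8, 0.8)
```

If a true geometric centroid is needed, shapely's `geometry.centroid` attribute is the usual choice.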

epsg property #

EPSG code of this FeatureCollection's CRS (cached).

The value is cached per CRS-object identity so repeated access on hot paths skips the pyproj.CRS.to_epsg call. The cache auto-invalidates whenever self.crs is replaced.

An identity miss falls back to equality. If self.crs has been reassigned to a different CRS object that nevertheless compares equal to the cached one (e.g. fc.crs = pyproj.CRS("EPSG:4326") on a frame already in EPSG:4326), the new object is adopted as the cache key and the .to_epsg() call is skipped. The code is recomputed only when the CRS really differs.

The equality fallback is cheaper than a fresh .to_epsg() (which re-parses the CRS) but it is not free: pyproj.CRS.__eq__ does a WKT2 string comparison. If a future pandas/geopandas release stops returning the same self.crs object across accesses, the fallback will run on every fc.epsg access and add up in hot loops. Switch the cache key to self.crs.to_wkt() if a profile ever shows this dominating.

Returns:

Type Description
int | None The integer EPSG code if the CRS is registered in the EPSG authority; None when the FC has no CRS set or when its CRS cannot be mapped to a single EPSG code.

Examples:

  • Frame built with WGS84 reports EPSG 4326:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ...     )
    ... )
    >>> fc.epsg
    4326
    
  • A frame without a CRS returns None:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame({"id": [1]}, geometry=[Point(0, 0)])
    ... )
    >>> fc.epsg is None
    True
    
  • Reprojecting to Web Mercator updates the cached code:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ...     )
    ... )
    >>> fc = fc.to_crs(3857)
    >>> fc.epsg
    3857
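
The identity-then-equality lookup described above can be sketched without pyproj. This is a minimal illustration, not the pyramids implementation: `FakeCRS` is a hypothetical stand-in for `pyproj.CRS` that counts `to_epsg()` calls, and `EpsgCache` is a hypothetical name for the caching logic.

```python
class FakeCRS:
    """Stand-in for pyproj.CRS (hypothetical); counts to_epsg() calls."""

    calls = 0

    def __init__(self, code):
        self.code = code

    def __eq__(self, other):
        return isinstance(other, FakeCRS) and self.code == other.code

    def to_epsg(self):
        FakeCRS.calls += 1
        return self.code


class EpsgCache:
    """Cache keyed on CRS object identity, with an equality fallback."""

    def __init__(self):
        self._key = None    # CRS object the cached value belongs to
        self._value = None  # cached EPSG code

    def get(self, crs):
        if crs is None:
            return None
        if crs is self._key:
            return self._value  # identity hit: cheapest path
        if self._key is not None and crs == self._key:
            self._key = crs     # equal CRS: adopt the new object as the key
            return self._value
        # The CRS really changed (or nothing is cached yet): recompute.
        self._key, self._value = crs, crs.to_epsg()
        return self._value


cache = EpsgCache()
first = FakeCRS(4326)
cache.get(first)          # miss: calls to_epsg()
cache.get(first)          # identity hit
cache.get(FakeCRS(4326))  # equality fallback, no to_epsg() call
cache.get(FakeCRS(3857))  # different CRS: recompute
print(FakeCRS.calls)  # 2
```

Only the first lookup and the genuine CRS change reach `to_epsg()`; the other two are served from the cache.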
    

top_left_corner property #

Top-left corner [xmin, ymax] of the total bounds.

Returns:

Type Description
list[Number] Two-element list [xmin, ymax]: the minimum x-coordinate paired with the maximum y-coordinate of the union of all geometry bounds.

Examples:

  • Two points span a unit square — the top-left is [0, 1]:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1, 2]},
    ...         geometry=[Point(0, 0), Point(1, 1)],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> fc.top_left_corner
    [0.0, 1.0]
    
  • Offset points yield the offset top-left corner:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1, 2]},
    ...         geometry=[Point(10, 20), Point(15, 30)],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> fc.top_left_corner
    [10.0, 30.0]
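
As a rough sketch of the same calculation outside geopandas, the corner can be derived from per-geometry bounds given as `(xmin, ymin, xmax, ymax)` tuples. `top_left_from_bounds` is a hypothetical helper used only for illustration.

```python
def top_left_from_bounds(bounds):
    """Return [xmin, ymax] over (xmin, ymin, xmax, ymax) tuples (hypothetical helper)."""
    xmin = min(b[0] for b in bounds)  # westernmost edge of any geometry
    ymax = max(b[3] for b in bounds)  # northernmost edge of any geometry
    return [float(xmin), float(ymax)]


# Bounds of Point(10, 20) and Point(15, 30): a point's min and max coincide.
point_bounds = [(10, 20, 10, 20), (15, 30, 15, 30)]
corner = top_left_from_bounds(point_bounds)
print(corner)  # [10.0, 30.0]
```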
    

column property #

Deprecated alias for :attr:columns returning a list[str].

Returns:

Type Description
list[str] Column names in their current order, including the active geometry column.

Examples:

  • A frame with an id field reports both columns:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ...     )
    ... )
    >>> fc.column
    ['id', 'geometry']
    
  • Multiple attribute columns appear in insertion order:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"name": ["a"], "pop": [100]},
    ...         geometry=[Point(0, 0)],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> fc.column
    ['name', 'pop', 'geometry']
    

schema property #

Fiona-style schema: geometry type + field-type dict.

Returns a dict shaped like fiona's schema attribute so callers migrating from fiona.open(path).schema can consume this without rewriting. The dict has three keys:

  • "geometry": single string ("Point", "Polygon", …) when every row has the same geom type, otherwise "Unknown".
  • "properties": {column_name: dtype_string} for every non-geometry column.
  • "crs": the :attr:crs as a :class:pyproj.CRS object, or None when the FC has no CRS set. Note that fiona itself exposes the CRS on the collection (fiona.open(path).crs) rather than inside schema, so callers migrating from fiona should read it from this key instead.

Empty FeatureCollections (len(self) == 0) report "Unknown" for the geometry type.

Returns:

Name Type Description
dict dict Three-key dict with "geometry", "properties", and "crs".

Examples:

  • Homogeneous point collection reports "Point":
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1, 2]},
    ...         geometry=[Point(0, 0), Point(1, 1)],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> schema = fc.schema
    >>> schema["geometry"]
    'Point'
    >>> schema["properties"]
    {'id': 'int64'}
    >>> schema["crs"].to_epsg()
    4326
    
  • Mixed geometry types collapse to "Unknown":
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point, LineString
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1, 2]},
    ...         geometry=[Point(0, 0), LineString([(0, 0), (1, 1)])],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> fc.schema["geometry"]
    'Unknown'
    
  • Frames without a CRS return crs=None:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame({"id": [1]}, geometry=[Point(0, 0)])
    ... )
    >>> fc.schema["crs"] is None
    True
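
The rule for the "geometry" key (one shared type, otherwise "Unknown") can be sketched as follows. `schema_geometry` is a hypothetical helper that takes the per-row geometry type names as plain strings; it is an illustration of the rule, not the pyramids code.

```python
def schema_geometry(geom_types):
    """Collapse per-row geometry type names into one schema label (hypothetical helper)."""
    unique = set(geom_types)
    # Exactly one distinct type: report it. Empty or mixed input: "Unknown".
    return unique.pop() if len(unique) == 1 else "Unknown"


print(schema_geometry(["Point", "Point"]))       # Point
print(schema_geometry(["Point", "LineString"]))  # Unknown
print(schema_geometry([]))                       # Unknown
```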
    

__init__(data=None, *args, **kwargs) #

Construct a FeatureCollection.

Accepts anything :class:geopandas.GeoDataFrame accepts. Rejects ogr.DataSource / gdal.Dataset with a clear TypeError.

Source code in src/pyramids/feature/collection.py
def __init__(self, data: Any = None, *args: Any, **kwargs: Any) -> None:
    """Construct a FeatureCollection.

    Accepts anything :class:`geopandas.GeoDataFrame` accepts.
    Rejects `ogr.DataSource` / `gdal.Dataset` with a clear
    `TypeError`.
    """
    if isinstance(data, (ogr.DataSource, gdal.Dataset)):
        raise TypeError(
            "FeatureCollection no longer accepts ogr.DataSource or "
            "gdal.Dataset objects. OGR is an internal implementation "
            "detail. Use FeatureCollection.read_file(path) to load a "
            "file, or pass a GeoDataFrame."
        )
    super().__init__(data, *args, **kwargs)

__enter__() #

Enter a context-managed block. Returns self.

Returns:

Name Type Description
FeatureCollection FeatureCollection self, the exact same instance, so with ... as fc: binds fc to this collection.

Examples:

  • Use as a context manager and access rows inside the block:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> gdf = gpd.GeoDataFrame(
    ...     {"id": [1, 2]},
    ...     geometry=[Point(0, 0), Point(1, 1)],
    ...     crs="EPSG:4326",
    ... )
    >>> with FeatureCollection(gdf) as fc:
    ...     n = len(fc)
    >>> n
    2
    
  • Exceptions raised inside the block still propagate:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ...     )
    ... )
    >>> try:
    ...     with fc:
    ...         raise RuntimeError("boom")
    ... except RuntimeError as err:
    ...     print(err)
    boom
    
Source code in src/pyramids/feature/collection.py
def __enter__(self) -> FeatureCollection:
    """Enter a context-managed block. Returns `self`.

    Returns:
        FeatureCollection: `self` — the exact same instance, so
        `with... as fc:` binds `fc` to this collection.

    Examples:
        - Use as a context manager and access rows inside the block:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> gdf = gpd.GeoDataFrame(
            ...     {"id": [1, 2]},
            ...     geometry=[Point(0, 0), Point(1, 1)],
            ...     crs="EPSG:4326",
            ... )
            >>> with FeatureCollection(gdf) as fc:
            ...     n = len(fc)
            >>> n
            2

            ```
        - Exceptions raised inside the block still propagate:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )
            >>> try:
            ...     with fc:
            ...         raise RuntimeError("boom")
            ... except RuntimeError as err:
            ...     print(err)
            boom

            ```
    """
    return self

__exit__(exc_type, exc, tb) #

Exit the context-managed block. Calls :meth:close.

Parameters:

Name Type Description Default
exc_type - Exception class if the block raised, else None. required
exc - Exception instance if the block raised, else None. required
tb - Traceback for the raised exception, else None. required

Returns:

Name Type Description
bool bool Always False, so exceptions from inside the with block propagate to the caller rather than being swallowed.

Examples:

  • The clean-exit path returns False so nothing is swallowed:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ...     )
    ... )
    >>> fc.__exit__(None, None, None)
    False
    
  • A with block that finishes normally just releases the FC:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> gdf = gpd.GeoDataFrame(
    ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ... )
    >>> with FeatureCollection(gdf) as fc:
    ...     pass
    >>> len(fc)
    1
    
Source code in src/pyramids/feature/collection.py
def __exit__(self, exc_type, exc, tb) -> bool:
    """Exit the context-managed block. Calls :meth:`close`.

    Args:
        exc_type: Exception class if the block raised, else `None`.
        exc: Exception instance if the block raised, else `None`.
        tb: Traceback for the raised exception, else `None`.

    Returns:
        bool: Always `False` — exceptions from inside the `with`
        block propagate to the caller rather than being swallowed.

    Examples:
        - The clean-exit path returns `False` so nothing is swallowed:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )
            >>> fc.__exit__(None, None, None)
            False

            ```
        - A `with` block that finishes normally just releases the FC:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> gdf = gpd.GeoDataFrame(
            ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ... )
            >>> with FeatureCollection(gdf) as fc:
            ...     pass
            >>> len(fc)
            1

            ```
    """
    self.close()
    return False
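
The same protocol can be seen in isolation with a small stand-alone class. `Resource` is a hypothetical example class, not part of pyramids; it follows the pattern above of calling `close()` in `__exit__` and returning `False` so the exception is not suppressed.

```python
class Resource:
    """Minimal context manager (hypothetical) whose __exit__ returns False."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # falsy return: never suppress the exception

    def close(self):
        self.closed = True


res = Resource()
caught = None
try:
    with res:
        raise RuntimeError("boom")
except RuntimeError as err:
    caught = str(err)

print(caught, res.closed)  # boom True
```

The cleanup runs even though the block raised, and the caller still sees the original exception.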

close() #

Release resources held by this FeatureCollection.

No-op today (the OGR bridge is self-cleaning). Exists so future resource-holding features have an idiomatic release point.

Returns:

Name Type Description
None None This method does not return a value.

Examples:

  • close() is idempotent — calling it repeatedly is safe:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ...     )
    ... )
    >>> fc.close()
    >>> fc.close()
    >>> len(fc)
    1
    
  • The collection remains usable after close (no-op today):
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"v": [7]}, geometry=[Point(2, 3)], crs="EPSG:4326",
    ...     )
    ... )
    >>> fc.close()
    >>> fc.epsg
    4326
    
Source code in src/pyramids/feature/collection.py
def close(self) -> None:
    """Release resources held by this FeatureCollection.

    No-op today (the OGR bridge is self-cleaning). Exists so future
    resource-holding features have an idiomatic release point.

    Returns:
        None: This method does not return a value.

    Examples:
        - `close()` is idempotent — calling it repeatedly is safe:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )
            >>> fc.close()
            >>> fc.close()
            >>> len(fc)
            1

            ```
        - The collection remains usable after `close` (no-op today):
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"v": [7]}, geometry=[Point(2, 3)], crs="EPSG:4326",
            ...     )
            ... )
            >>> fc.close()
            >>> fc.epsg
            4326

            ```
    """
    return None

from_features(features, *, crs=None, columns=None) classmethod #

Build a FeatureCollection from feature-shaped inputs.

Delegates to :meth:geopandas.GeoDataFrame.from_features and wraps the result. Accepts any of the shapes that method accepts:

  • a list (or iterator) of GeoJSON feature dicts of the form {"type": "Feature", "geometry": {...}, "properties": {...}},
  • any object exposing __geo_interface__ (shapely geometries, fiona records, custom feature classes), or
  • a bare FeatureCollection dict ({"type": "FeatureCollection", "features": [...]}).

Parameters:

Name Type Description Default
features Iterable Feature dicts of the form {"type": "Feature", "geometry": {...}, "properties": {...}}, or any __geo_interface__ provider. Also accepts a bare FeatureCollection dict. required
crs Any CRS to attach to the result (EPSG int, "EPSG:4326", WKT, Proj, or a :class:pyproj.CRS). None leaves the CRS unset. None
columns list[str] | None Explicit column order for the output. When None, geopandas infers columns from the first feature. None

Returns:

Name Type Description
FeatureCollection FeatureCollection A new FC backed by the supplied features.

Raises:

Type Description
ValueError If features is empty or exhausted before any feature is consumed. An empty GeoDataFrame from from_features has no geometry column, which breaks downstream pyramids methods that assume the column exists. Fail fast instead.

Examples:

  • Build from a list of feature dicts:
    >>> from pyramids.feature import FeatureCollection
    >>> feats = [
    ...     {"type": "Feature",
    ...      "geometry": {"type": "Point", "coordinates": [0, 0]},
    ...      "properties": {"name": "a"}},
    ...     {"type": "Feature",
    ...      "geometry": {"type": "Point", "coordinates": [1, 1]},
    ...      "properties": {"name": "b"}},
    ... ]
    >>> fc = FeatureCollection.from_features(feats, crs=4326)
    >>> len(fc)
    2
    >>> fc.epsg
    4326
    
Source code in src/pyramids/feature/collection.py
@classmethod
def from_features(
    cls,
    features: Iterable[Any],
    *,
    crs: Any = None,
    columns: list[str] | None = None,
) -> FeatureCollection:
    """Build a FeatureCollection from feature-shaped inputs.

    Delegates to :meth:`geopandas.GeoDataFrame.from_features` and
    wraps the result. Accepts any of the shapes that method
    accepts:

    * a list (or iterator) of GeoJSON feature dicts of the form
      `{"type": "Feature", "geometry": {...}, "properties": {...}}`,
    * any object exposing `__geo_interface__` (shapely
      geometries, fiona records, custom feature classes), or
    * a bare `FeatureCollection` dict (`{"type":
      "FeatureCollection", "features": [...]}`).

    Args:
        features (Iterable):
            Feature dicts of the form
            `{"type": "Feature", "geometry": {...}, "properties": {...}}`,
            or any `__geo_interface__` provider. Also accepts a
            bare `FeatureCollection` dict.
        crs:
            CRS to attach to the result (EPSG int, `"EPSG:4326"`,
            WKT, Proj, or a :class:`pyproj.CRS`). `None` leaves
            the CRS unset.
        columns (list[str] | None):
            Explicit column order for the output. When `None`,
            geopandas infers columns from the first feature.

    Returns:
        FeatureCollection: A new FC backed by the supplied features.

    Raises:
        ValueError: If `features` is empty or exhausted before any
            feature is consumed. An empty GeoDataFrame from
            `from_features` has no `geometry` column, which
            breaks downstream pyramids methods that assume the
            column exists. Fail fast instead.

    Examples:
        - Build from a list of feature dicts:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> feats = [
            ...     {"type": "Feature",
            ...      "geometry": {"type": "Point", "coordinates": [0, 0]},
            ...      "properties": {"name": "a"}},
            ...     {"type": "Feature",
            ...      "geometry": {"type": "Point", "coordinates": [1, 1]},
            ...      "properties": {"name": "b"}},
            ... ]
            >>> fc = FeatureCollection.from_features(feats, crs=4326)
            >>> len(fc)
            2
            >>> fc.epsg
            4326

            ```
    """
    # materialise an iterator so we can detect the empty case
    # before handing off to geopandas. `geopandas.from_features([])`
    # returns a GeoDataFrame with no `geometry` column, which
    # breaks every pyramids op that assumes the column exists.
    features_list = list(features)
    if not features_list:
        raise ValueError(
            "from_features requires at least one feature. An empty "
            "iterable would produce a GeoDataFrame with no geometry "
            "column, which breaks downstream pyramids methods."
        )
    gdf = gpd.GeoDataFrame.from_features(features_list, crs=crs, columns=columns)
    return cls(gdf)
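
The materialise-then-check step matters mostly for generators, which cannot be tested for emptiness without consuming them. The sketch below isolates that guard in plain Python; `require_features` and `point_features` are hypothetical helpers written for illustration.

```python
def require_features(features):
    """Materialise an iterable and reject the empty case (hypothetical helper)."""
    items = list(features)  # consumes a generator exactly once
    if not items:
        raise ValueError("from_features requires at least one feature.")
    return items


def point_features(n):
    """Yield n GeoJSON-style point feature dicts."""
    for i in range(n):
        yield {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [i, i]},
            "properties": {"name": str(i)},
        }


kept = require_features(point_features(2))
print(len(kept))  # 2

try:
    require_features(point_features(0))
except ValueError as exc:
    message = str(exc)
    print(message)
```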

from_bbox(bbox, *, epsg) classmethod #

Build a one-row FeatureCollection from a geographic bounding box.

The bbox is the canonical (west, south, east, north) quadruple in the CRS named by epsg. The result is a single-row FC whose only geometry is a rectangular Polygon — handy for cropping a raster or windowed-reading it without writing out the polygon vertices by hand:

    mask = FeatureCollection.from_bbox((31.0, 30.0, 31.1, 30.1), epsg=4326)
    cropped = dataset.crop(mask)

Most callers do not need to build this themselves — :meth:Dataset.crop and :meth:Dataset.read_array (via :meth:pyramids.dataset.engines.io.IO.read_array) accept the bbox/epsg pair directly and call this helper internally.

Parameters:

Name Type Description Default
bbox tuple[float, float, float, float] | list[float] A 4-element (west, south, east, north) tuple / list of numbers. Must satisfy west < east and south < north. required
epsg Any CRS for the bbox coordinates: anything geopandas accepts for crs= (EPSG int such as 4326, "EPSG:4326" string, WKT, Proj, or a :class:pyproj.CRS). Required (a bbox without a CRS is ambiguous). required

Returns:

Name Type Description
FeatureCollection FeatureCollection A one-row FC carrying the rectangular polygon, in the supplied CRS.

Raises:

Type Description
ValueError bbox is not a 4-element sequence, or violates west < east / south < north, or epsg is None.
TypeError bbox elements are not numbers.

Examples:

  • Build a one-row FC from a bbox and inspect it:
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection.from_bbox((31.0, 30.0, 31.1, 30.1), epsg=4326)
    >>> len(fc)
    1
    >>> tuple(float(v) for v in fc.total_bounds)
    (31.0, 30.0, 31.1, 30.1)
    >>> fc.crs.to_epsg()
    4326
    
  • Use it as a mask to crop a raster:
    >>> import numpy as np
    >>> from pyramids.dataset import Dataset
    >>> from pyramids.feature import FeatureCollection
    >>> arr = np.arange(100, dtype="int16").reshape(10, 10)
    >>> ds = Dataset.create_from_array(
    ...     arr, top_left_corner=(0, 0), cell_size=0.05, epsg=4326,
    ... )
    >>> fc = FeatureCollection.from_bbox((0.1, -0.2, 0.2, -0.1), epsg=4326)
    >>> ds.crop(mask=fc).shape
    (1, 2, 2)
    
  • epsg=None is rejected — a bbox without a CRS is ambiguous:
    >>> from pyramids.feature import FeatureCollection
    >>> try:
    ...     FeatureCollection.from_bbox((0, 0, 1, 1), epsg=None)
    ... except ValueError as exc:
    ...     print("epsg" in str(exc))
    True
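
The validation rules listed under Raises can be sketched in plain Python. `validate_bbox` is a hypothetical helper; the order of the checks and the error messages are assumptions, and the real method may differ. It returns the closed rectangle ring that a one-row polygon would be built from.

```python
from numbers import Real


def validate_bbox(bbox, epsg):
    """Check a (west, south, east, north) bbox and return its ring (hypothetical helper)."""
    if epsg is None:
        raise ValueError("epsg is required: a bbox without a CRS is ambiguous")
    if not isinstance(bbox, (tuple, list)) or len(bbox) != 4:
        raise ValueError("bbox must be a 4-element (west, south, east, north) sequence")
    # bool is a Real subclass in Python, so exclude it explicitly.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in bbox):
        raise TypeError("bbox elements must be numbers")
    west, south, east, north = bbox
    if not (west < east and south < north):
        raise ValueError("bbox must satisfy west < east and south < north")
    # Closed ring of the rectangle, counter-clockwise from the south-west corner.
    return [(west, south), (east, south), (east, north), (west, north), (west, south)]


ring = validate_bbox((31.0, 30.0, 31.1, 30.1), epsg=4326)
print(len(ring))  # 5
```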
    
See Also
  • :meth:pyramids.dataset.engines.spatial.Spatial.crop: accepts bbox= / epsg= directly and routes through this helper.
  • :meth:pyramids.dataset.engines.io.IO.read_array: same.
Source code in src/pyramids/feature/collection.py
@classmethod
def from_bbox(
    cls,
    bbox: tuple[float, float, float, float] | list[float],
    *,
    epsg: Any,
) -> FeatureCollection:
    """Build a one-row FeatureCollection from a geographic bounding box.

    The bbox is the canonical ``(west, south, east, north)`` quadruple in
    the CRS named by ``epsg``. The result is a single-row FC whose only
    geometry is a rectangular Polygon — handy for cropping a raster or
    windowed-reading it without writing out the polygon vertices by hand:

    .. code-block:: python

        mask = FeatureCollection.from_bbox((31.0, 30.0, 31.1, 30.1), epsg=4326)
        cropped = dataset.crop(mask)

    Most callers do not need to build this themselves — :meth:`Dataset.crop`
    and :meth:`Dataset.read_array` (via :meth:`pyramids.dataset.engines.io.IO.read_array`)
    accept the bbox/``epsg`` pair directly and call this helper internally.

    Args:
        bbox: A 4-element ``(west, south, east, north)`` tuple / list of
            numbers. Must satisfy ``west < east`` and ``south < north``.
        epsg: CRS for the bbox coordinates — anything ``geopandas`` accepts
            for ``crs=`` (EPSG int such as ``4326``, ``"EPSG:4326"`` string,
            WKT, Proj, or a :class:`pyproj.CRS`). Required (a bbox without
            a CRS is ambiguous).

    Returns:
        FeatureCollection: A one-row FC carrying the rectangular polygon,
        in the supplied CRS.

    Raises:
        ValueError: ``bbox`` is not a 4-element sequence, or violates
            ``west < east`` / ``south < north``, or ``epsg`` is ``None``.
        TypeError: ``bbox`` elements are not numbers.

    Examples:
        - Build a one-row FC from a bbox and inspect it:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection.from_bbox((31.0, 30.0, 31.1, 30.1), epsg=4326)
            >>> len(fc)
            1
            >>> tuple(float(v) for v in fc.total_bounds)
            (31.0, 30.0, 31.1, 30.1)
            >>> fc.crs.to_epsg()
            4326

            ```
        - Use it as a mask to crop a raster:
            ```python
            >>> import numpy as np
            >>> from pyramids.dataset import Dataset
            >>> from pyramids.feature import FeatureCollection
            >>> arr = np.arange(100, dtype="int16").reshape(10, 10)
            >>> ds = Dataset.create_from_array(
            ...     arr, top_left_corner=(0, 0), cell_size=0.05, epsg=4326,
            ... )
            >>> fc = FeatureCollection.from_bbox((0.1, -0.2, 0.2, -0.1), epsg=4326)
            >>> ds.crop(mask=fc).shape
            (1, 2, 2)

            ```
        - ``epsg=None`` is rejected — a bbox without a CRS is ambiguous:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> try:
            ...     FeatureCollection.from_bbox((0, 0, 1, 1), epsg=None)
            ... except ValueError as exc:
            ...     print("epsg" in str(exc))
            True

            ```

    See Also:
        - :meth:`pyramids.dataset.engines.spatial.Spatial.crop`: accepts
          ``bbox=`` / ``epsg=`` directly and routes through this helper.
        - :meth:`pyramids.dataset.engines.io.IO.read_array`: same.
    """
    if epsg is None:
        raise ValueError(
            "from_bbox requires an explicit epsg= for the bbox CRS; "
            "a bbox without a CRS is ambiguous"
        )
    try:
        seq = list(bbox)
    except TypeError as exc:
        raise ValueError(
            f"bbox must be a 4-element (west, south, east, north) sequence; "
            f"got {bbox!r}"
        ) from exc
    if len(seq) != 4:
        raise ValueError(
            f"bbox must have exactly 4 elements (west, south, east, north); "
            f"got {len(seq)}: {seq!r}"
        )
    try:
        w, s, e, n = (float(v) for v in seq)
    except (TypeError, ValueError) as exc:
        raise TypeError(f"bbox elements must be numbers; got {seq!r}") from exc
    if not (w < e):
        raise ValueError(f"bbox must satisfy west < east; got west={w}, east={e}")
    if not (s < n):
        raise ValueError(
            f"bbox must satisfy south < north; got south={s}, north={n}"
        )
    return cls(geometry=[box(w, s, e, n)], crs=epsg)

from_records(records, *, geometry='geometry', crs=None, orient='records') classmethod #

Build a FeatureCollection from dict records.

Two input orientations are accepted (C26 added the second):

  • orient="records" (default) — an iterable of per-row dicts, each of the form {column: value,..., geometry: <shapely>}. The dict's keys become column names; the key named by geometry must hold a shapely geometry.
  • orient="list" — a single columnar dict mapping each column name to a list of values of equal length, for example {"id": [1, 2], "geometry": [pt_a, pt_b]}.

Useful for ingesting rows from an API response that doesn't emit GeoJSON but already has shapely geoms.
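
The two orientations carry the same information in different shapes. As an illustration only, the stdlib sketch below converts per-row records into the columnar form. `records_to_columns` is a hypothetical helper, not part of pyramids, and WKT-like strings stand in for shapely geometries. Filling a missing key with `None` is an assumption of this sketch.

```python
from typing import Any, Iterable


def records_to_columns(records: Iterable[dict[str, Any]]) -> dict[str, list[Any]]:
    """Turn orient="records" style input into the orient="list" shape."""
    rows = list(records)
    # Collect column names in first-seen order across all rows.
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for key in row:
            columns.setdefault(key, [])
    # Fill every column so that all lists end up with equal length.
    for row in rows:
        for key, values in columns.items():
            # Assumption of this sketch: a missing key becomes None.
            values.append(row.get(key))
    return columns


recs = [
    {"id": 1, "geometry": "POINT (0 0)"},
    {"id": 2, "geometry": "POINT (1 1)"},
]
cols = records_to_columns(recs)
print(cols)
```

Either shape can then be handed to `from_records` with the matching `orient` value.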

Parameters:

Name Type Description Default
records Any

Per-row iterable of dicts when orient="records", or a single columnar dict when orient="list".

required
geometry str

Name of the column / key holding the shapely geometry. Default "geometry".

'geometry'
crs Any

CRS to attach (same forms as :meth:from_features).

None
orient str

"records" (default) or "list" — matches the pandas from_dict/from_records conventions.

'records'

Returns:

Name Type Description
FeatureCollection FeatureCollection

A new FC with one row per record.

Raises:

Type Description
FeatureError

If a record is missing the geometry column.

ValueError

If orient is not one of the supported values.

Examples:

  • Per-row records with the default geometry key:
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> recs = [
    ...     {"id": 1, "geometry": Point(0, 0)},
    ...     {"id": 2, "geometry": Point(1, 1)},
    ... ]
    >>> fc = FeatureCollection.from_records(recs, crs=4326)
    >>> len(fc)
    2
    >>> fc.epsg
    4326
    
  • Custom geometry key via the geometry= kwarg:
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> recs = [
    ...     {"id": 1, "geom": Point(0, 0)},
    ...     {"id": 2, "geom": Point(1, 1)},
    ... ]
    >>> fc = FeatureCollection.from_records(
    ...     recs, geometry="geom", crs=4326,
    ... )
    >>> fc.geometry.name
    'geom'
    
  • Columnar dict via orient="list":
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> cols = {"id": [1, 2], "geometry": [Point(0, 0), Point(1, 1)]}
    >>> fc = FeatureCollection.from_records(
    ...     cols, orient="list", crs=4326,
    ... )
    >>> list(fc["id"])
    [1, 2]
    
Source code in src/pyramids/feature/collection.py
@classmethod
def from_records(
    cls,
    records: Any,
    *,
    geometry: str = "geometry",
    crs: Any = None,
    orient: str = "records",
) -> FeatureCollection:
    """Build a FeatureCollection from dict records.

    Two input orientations are accepted (C26 added the second):

    * `orient="records"` (default) — an iterable of per-row dicts,
      each of the form `{column: value,..., geometry: <shapely>}`.
      The dict's keys become column names; the key named by
      `geometry` must hold a shapely geometry.
    * `orient="list"` — a single columnar dict mapping each
      column name to a list of values of equal length, for
      example `{"id": [1, 2], "geometry": [pt_a, pt_b]}`.

    Useful for ingesting rows from an API response that doesn't
    emit GeoJSON but already has shapely geoms.

    Args:
        records:
            Per-row iterable of dicts when `orient="records"`, or a
            single columnar dict when `orient="list"`.
        geometry (str):
            Name of the column / key holding the shapely geometry.
            Default `"geometry"`.
        crs:
            CRS to attach (same forms as :meth:`from_features`).
        orient (str):
            `"records"` (default) or `"list"` — matches the
            pandas `from_dict`/`from_records` conventions.

    Returns:
        FeatureCollection: A new FC with one row per record.

    Raises:
        FeatureError: If a record is missing the `geometry`
            column.
        ValueError: If `orient` is not one of the supported
            values.

    Examples:
        - Per-row records with the default geometry key:
            ```python
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> recs = [
            ...     {"id": 1, "geometry": Point(0, 0)},
            ...     {"id": 2, "geometry": Point(1, 1)},
            ... ]
            >>> fc = FeatureCollection.from_records(recs, crs=4326)
            >>> len(fc)
            2
            >>> fc.epsg
            4326

            ```
        - Custom geometry key via the `geometry=` kwarg:
            ```python
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> recs = [
            ...     {"id": 1, "geom": Point(0, 0)},
            ...     {"id": 2, "geom": Point(1, 1)},
            ... ]
            >>> fc = FeatureCollection.from_records(
            ...     recs, geometry="geom", crs=4326,
            ... )
            >>> fc.geometry.name
            'geom'

            ```
        - Columnar dict via `orient="list"`:
            ```python
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> cols = {"id": [1, 2], "geometry": [Point(0, 0), Point(1, 1)]}
            >>> fc = FeatureCollection.from_records(
            ...     cols, orient="list", crs=4326,
            ... )
            >>> list(fc["id"])
            [1, 2]

            ```
    """

    # empty-input branches both build a single-column frame
    # whose column name matches the `geometry=` kwarg, so
    # `GeoDataFrame(..., geometry=…)` sets it as the active
    # geometry column and the returned FC has
    # `geometry.name == geometry`.
    def _empty_fc() -> FeatureCollection:
        return cls(gpd.GeoDataFrame({geometry: []}, geometry=geometry, crs=crs))

    if orient == "records":
        records_list = list(records)
        if not records_list:
            return _empty_fc()
        df = pd.DataFrame.from_records(records_list)
    elif orient == "list":
        # columnar dict of equal-length lists. Straight into
        # `pd.DataFrame` which accepts this shape natively and
        # raises `ValueError` on mismatched lengths (propagated
        # to the caller as-is — the pandas message is already clear).
        if not isinstance(records, dict):
            raise ValueError(
                f"orient='list' expects a dict of column → list; "
                f"got {type(records).__name__}."
            )
        df = pd.DataFrame(records)
        if len(df) == 0:
            return _empty_fc()
    else:
        raise ValueError(f"orient must be 'records' or 'list'; got {orient!r}.")
    if geometry not in df.columns:
        raise FeatureError(
            f"records missing required geometry column {geometry!r}; "
            f"columns present: {list(df.columns)}"
        )
    return cls(gpd.GeoDataFrame(df, geometry=geometry, crs=crs))

iter_features(path, *, layer=None, bbox=None, where=None, chunksize=None, tile_strategy='auto', include_index=False) classmethod #

Stream features from path without materializing the full file.

Two orthogonal knobs:

  • Chunk shape. chunksize=None yields one GeoJSON-style dict per row (fiona idiom). chunksize=N yields :class:FeatureCollection batches of up to N rows each so batched pipelines get a DataFrame-shaped payload.
  • Tile strategy. Controls whether the bbox filter is pushed into the format's spatial index (rtree on GPKG, row-group statistics on Parquet, …) or applied after a full scan. Pass one of:

  • "auto" (default) — let pyogrio pick. For a GPKG, pyogrio queries the rtree_<layer>_geom companion table automatically. For a Parquet file, pyogrio / pyarrow push the bbox down to the row-group statistics and skip non-matching groups. For formats without a spatial index (GeoJSON, Shapefile without a .qix) this falls back to a full scan in the driver.

  • "rtree" — same as "auto"; kept as an explicit name so pipeline code can document intent.
  • "row_group" — same as "auto"; explicit name for the Parquet case.
  • "none" — disable index pushdown; read whole chunks from the driver and apply the bbox filter in Python. Useful when the on-disk spatial index is stale or suspected wrong; also exercises the "slow path" in tests.

bbox / where compose with any tile_strategy. Paths run through :func:pyramids._io._parse_path so cloud URLs and archive paths work the same way as in :meth:read_file.
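
How the strategy decides where the bbox filter runs can be sketched in a few lines. `route_bbox` is a hypothetical helper that follows the routing visible in the source listing: every strategy except `"none"` forwards the bbox to the driver, while `"none"` holds it back for a Python-side filter. The tuple of valid names is assumed from the values documented above.

```python
from typing import Optional

BBox = tuple[float, float, float, float]

# Assumed from the documented values; the real constant lives on the class.
VALID_TILE_STRATEGIES = ("auto", "rtree", "row_group", "none")


def route_bbox(
    bbox: Optional[BBox], tile_strategy: str = "auto"
) -> tuple[Optional[BBox], Optional[BBox]]:
    """Return (pushdown_bbox, python_bbox) for the given strategy."""
    if tile_strategy not in VALID_TILE_STRATEGIES:
        raise ValueError(f"unknown tile_strategy {tile_strategy!r}")
    if tile_strategy == "none":
        # Hold the bbox back; it is applied after each chunk is loaded.
        return None, bbox
    # "auto", "rtree" and "row_group" all push the bbox down to the driver.
    return bbox, None
```

Exactly one of the two returned slots is ever populated, which is why the three pushdown names behave identically.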

Parameters:

Name Type Description Default
path str | Path

File path, URL, or archive path.

required
layer str | int | None

Layer selector for multi-layer formats.

None
bbox tuple[float, float, float, float] | None

(minx, miny, maxx, maxy) filter.

None
where str | None

OGR SQL predicate.

None
chunksize int | None

None yields dicts, an int yields FeatureCollection chunks.

None
tile_strategy str

One of "auto", "rtree", "row_group", "none". Default "auto".

'auto'
include_index bool

When True, each yielded dict gets an additional "id" key whose value is the 0-based file-row index of that feature. The chunked form (chunksize=N) attaches the same index as a "_row_index" column on the yielded FC. The indices stay aligned with the on-disk rows even when a Python-side bbox filter (tile_strategy="none") drops some rows — only the surviving features are yielded, and their ids match the positions they had in the source file. Defaults to False for back-compat with the fiona idiom.

False
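
The index-alignment behaviour of `include_index` is easiest to see on toy data. The sketch below is stdlib-only and illustrative: plain points stand in for features, `iter_with_index` is a hypothetical name, and a point-in-rectangle test replaces the geometry intersection that the real method delegates to geopandas.

```python
from typing import Iterator

Point = tuple[float, float]
BBox = tuple[float, float, float, float]


def iter_with_index(
    points: list[Point], bbox: BBox, batch_size: int
) -> Iterator[tuple[int, Point]]:
    """Yield (source_row_index, point) for points inside bbox, batch by batch."""
    xmin, ymin, xmax, ymax = bbox
    for start in range(0, len(points), batch_size):
        batch = points[start : start + batch_size]
        # Record absolute row indices before filtering the batch.
        row_indices = range(start, start + len(batch))
        for ri, (x, y) in zip(row_indices, batch):
            if xmin <= x <= xmax and ymin <= y <= ymax:
                yield ri, (x, y)


pts = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (9.0, 9.0), (2.0, 2.0)]
kept = list(iter_with_index(pts, bbox=(0.5, 0.5, 2.5, 2.5), batch_size=2))
print(kept)  # [(2, (1.0, 1.0)), (4, (2.0, 2.0))]
```

The surviving points keep indices 2 and 4, their positions in the source list, regardless of the batch size.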

Yields:

Type Description
Any

dict | FeatureCollection: Per-feature dicts when chunksize is None; FeatureCollection chunks otherwise.

Raises:

Type Description
ValueError

If chunksize is given but < 1, or if tile_strategy is not one of the accepted values.

Examples:

  • Stream features one at a time as GeoJSON-style dicts:
    >>> import tempfile
    >>> from pathlib import Path
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> d = Path(tempfile.mkdtemp())
    >>> path = d / "pts.geojson"
    >>> gdf = gpd.GeoDataFrame(
    ...     {"id": [1, 2, 3]},
    ...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
    ...     crs="EPSG:4326",
    ... )
    >>> gdf.to_file(path, driver="GeoJSON")
    >>> feats = list(FeatureCollection.iter_features(path))
    >>> len(feats)
    3
    >>> feats[0]["properties"]["id"]
    1
    
  • Stream in chunksize=2 batches as FeatureCollection chunks:
    >>> import tempfile
    >>> from pathlib import Path
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> d = Path(tempfile.mkdtemp())
    >>> path = d / "pts.geojson"
    >>> gdf = gpd.GeoDataFrame(
    ...     {"id": [1, 2, 3]},
    ...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
    ...     crs="EPSG:4326",
    ... )
    >>> gdf.to_file(path, driver="GeoJSON")
    >>> chunks = list(
    ...     FeatureCollection.iter_features(path, chunksize=2)
    ... )
    >>> [len(c) for c in chunks]
    [2, 1]
    
  • Invalid chunksize raises ValueError:
    >>> from pyramids.feature import FeatureCollection
    >>> gen = FeatureCollection.iter_features("anywhere", chunksize=0)
    >>> next(gen)
    Traceback (most recent call last):
        ...
    ValueError: chunksize must be >= 1 when supplied; got 0.
    
Source code in src/pyramids/feature/collection.py
@classmethod
def iter_features(
    cls,
    path: str | Path,
    *,
    layer: str | int | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    where: str | None = None,
    chunksize: int | None = None,
    tile_strategy: str = "auto",
    include_index: bool = False,
) -> Any:
    """Stream features from `path` without materializing the full file.

    Two orthogonal knobs:

    * **Chunk shape**. `chunksize=None` yields one GeoJSON-style
      dict per row (fiona idiom). `chunksize=N` yields
      :class:`FeatureCollection` batches of up to N rows each so
      batched pipelines get a DataFrame-shaped payload.
    * **Tile strategy**. Controls whether the `bbox`
      filter is pushed into the format's spatial index (rtree on
      GPKG, row-group statistics on Parquet, …) or applied after
      a full scan. Pass one of:

      - `"auto"` (default) — let pyogrio pick. For a GPKG,
        pyogrio queries the `rtree_<layer>_geom` companion
        table automatically. For a Parquet file, pyogrio /
        pyarrow push the bbox down to the row-group statistics
        and skip non-matching groups. For formats without a
        spatial index (GeoJSON, Shapefile without a `.qix`)
        this falls back to a full scan in the driver.
      - `"rtree"` — same as `"auto"`; kept as an explicit
        name so pipeline code can document intent.
      - `"row_group"` — same as `"auto"`; explicit name for
        the Parquet case.
      - `"none"` — disable index pushdown; read whole chunks
        from the driver and apply the bbox filter in Python.
        Useful when the on-disk spatial index is stale or
        suspected wrong; also exercises the "slow path" in
        tests.

    `bbox` / `where` compose with any tile_strategy. Paths run
    through :func:`pyramids._io._parse_path` so cloud URLs and
    archive paths work the same way as in :meth:`read_file`.

    Args:
        path (str | Path): File path, URL, archive path.
        layer (str | int | None): Layer selector for multi-layer
            formats.
        bbox: `(minx, miny, maxx, maxy)` filter.
        where (str | None): OGR SQL predicate.
        chunksize (int | None): `None` yields dicts, an `int`
            yields `FeatureCollection` chunks.
        tile_strategy (str): One of `"auto"`, `"rtree"`,
            `"row_group"`, `"none"`. Default `"auto"`.
        include_index (bool): When `True`, each yielded dict gets
            an additional `"id"` key whose value is the
            0-based file-row index of that feature. The chunked
            form (`chunksize=N`) attaches the same index as a
            `"_row_index"` column on the yielded FC. The indices
            stay aligned with the on-disk rows even when a
            Python-side bbox filter (`tile_strategy="none"`)
            drops some rows — only the surviving features are
            yielded, and their ids match the positions they had
            in the source file. Defaults to `False` for
            back-compat with the fiona idiom.

    Yields:
        dict | FeatureCollection: Per-feature dicts when
        `chunksize` is `None`; FeatureCollection chunks
        otherwise.

    Raises:
        ValueError: If `chunksize` is given but `< 1`, or if
            `tile_strategy` is not one of the accepted values.

    Examples:
        - Stream features one at a time as GeoJSON-style dicts:
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> path = d / "pts.geojson"
            >>> gdf = gpd.GeoDataFrame(
            ...     {"id": [1, 2, 3]},
            ...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
            ...     crs="EPSG:4326",
            ... )
            >>> gdf.to_file(path, driver="GeoJSON")
            >>> feats = list(FeatureCollection.iter_features(path))
            >>> len(feats)
            3
            >>> feats[0]["properties"]["id"]
            1

            ```
        - Stream in `chunksize=2` batches as FeatureCollection chunks:
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> path = d / "pts.geojson"
            >>> gdf = gpd.GeoDataFrame(
            ...     {"id": [1, 2, 3]},
            ...     geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
            ...     crs="EPSG:4326",
            ... )
            >>> gdf.to_file(path, driver="GeoJSON")
            >>> chunks = list(
            ...     FeatureCollection.iter_features(path, chunksize=2)
            ... )
            >>> [len(c) for c in chunks]
            [2, 1]

            ```
        - Invalid `chunksize` raises `ValueError`:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> gen = FeatureCollection.iter_features("anywhere", chunksize=0)
            >>> next(gen)
            Traceback (most recent call last):
                ...
            ValueError: chunksize must be >= 1 when supplied; got 0.

            ```
    """
    if chunksize is not None and chunksize < 1:
        raise ValueError(f"chunksize must be >= 1 when supplied; got {chunksize}.")
    if tile_strategy not in cls._VALID_TILE_STRATEGIES:
        raise ValueError(
            f"tile_strategy must be one of "
            f"{cls._VALID_TILE_STRATEGIES}; got {tile_strategy!r}."
        )

    import pyogrio

    resolved = str(_pyramids_io._parse_path(path))

    # Determine how many features are in the layer so we can
    # iterate in fixed-size batches via skip_features / max_features.
    # pyogrio's read_info is O(1) per call.
    info_kwargs: dict[str, Any] = {}
    if layer is not None:
        info_kwargs["layer"] = layer
    info = pyogrio.read_info(resolved, **info_kwargs)
    total = int(info["features"])

    if chunksize is None:
        batch_size = _DEFAULT_ITER_BATCH_SIZE
    else:
        batch_size = int(chunksize)

    # D-M3: pin the engine to pyogrio. `skip_features` /
    # `max_features` are pyogrio-specific (geopandas' fiona
    # engine silently ignores them, which would turn every chunk
    # into a full scan). Pinning the engine makes the contract
    # explicit and fails fast if pyogrio is absent.
    read_kwargs: dict[str, Any] = {"engine": "pyogrio"}
    if layer is not None:
        read_kwargs["layer"] = layer
    if where is not None:
        read_kwargs["where"] = where

    # when tile_strategy is "auto"/"rtree"/"row_group",
    # forward the bbox to pyogrio which transparently uses the
    # format's spatial index. When "none", hold the bbox back
    # and apply it in Python after each chunk loads.
    pushdown_bbox = bbox if tile_strategy != "none" else None
    python_bbox = bbox if tile_strategy == "none" else None
    if pushdown_bbox is not None:
        read_kwargs["bbox"] = pushdown_bbox

    for start in range(0, total, batch_size):
        gdf_chunk = gpd.read_file(
            resolved,
            skip_features=start,
            max_features=batch_size,
            **read_kwargs,
        )
        # remember the absolute row indices before any
        # bbox-based masking so callers can map yielded features
        # back to their source rows even after a Python-side filter
        # has dropped some of them.
        if include_index:
            row_indices = list(range(start, start + len(gdf_chunk)))
        if python_bbox is not None and len(gdf_chunk) > 0:
            xmin, ymin, xmax, ymax = python_bbox
            mask = gdf_chunk.intersects(box(xmin, ymin, xmax, ymax))
            if include_index:
                row_indices = [ri for ri, keep in zip(row_indices, mask) if keep]
            gdf_chunk = gdf_chunk[mask]
        if chunksize is None:
            iterator = gdf_chunk.iterfeatures(na="null")
            if include_index:
                for ri, feat in zip(row_indices, iterator):
                    feat["id"] = ri
                    yield feat
            else:
                for feat in iterator:
                    yield feat
        else:
            chunk_fc = cls(gdf_chunk)
            if include_index:
                chunk_fc["_row_index"] = row_indices
            yield chunk_fc

read_file(path, *, layer=None, bbox=None, mask=None, rows=None, columns=None, where=None, backend='pandas', npartitions=None, chunksize=None, **kwargs) classmethod #

Read a vector file into a FeatureCollection.

path is first routed through :func:pyramids._io._parse_path, which handles:

  • Cloud-URL rewriting (s3://, gs://, az://, abfs://, http(s)://, file:// → GDAL /vsi*/ form). The rewriting is verified end-to-end through an HTTP test. For AWS / GCS / Azure credentials, either set the standard environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, GOOGLE_APPLICATION_CREDENTIALS, AZURE_STORAGE_CONNECTION_STRING, …) or scope them via :class:pyramids.base.remote.CloudConfig as a context manager around the read_file call.
  • Compressed-archive dispatch for .zip, .tar, .tar.gz, .gz on local paths — the returned path is a /vsizip/, /vsitar/ or /vsigzip/ string that :func:geopandas.read_file (via GDAL's virtual filesystem) can open directly. You can either pass just the archive path (first contained file wins) or archive.zip/inner.geojson to target a specific member. Cloud + archive chaining (http://host/x.zip) is not automatic today — if you need it, stage the archive locally first or use CloudConfig with an explicit /vsizip//vsicurl/... path.
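
For orientation, the sketch below shows the kind of rewrite that the cloud-URL bullet describes. It is not the pyramids implementation: `to_vsi` is a hypothetical helper, and its prefix table covers only the well-known GDAL handlers `/vsis3/`, `/vsigs/`, `/vsiaz/` and `/vsicurl/`. How `_parse_path` treats `abfs://`, `file://` and archives is not reproduced here.

```python
def to_vsi(url: str) -> str:
    """Rewrite a cloud URL to a GDAL virtual-filesystem path (illustrative only)."""
    bucket_schemes = {"s3://": "/vsis3/", "gs://": "/vsigs/", "az://": "/vsiaz/"}
    for scheme, prefix in bucket_schemes.items():
        if url.startswith(scheme):
            # Bucket-style handlers drop the scheme and keep bucket/key.
            return prefix + url[len(scheme):]
    if url.startswith(("http://", "https://")):
        # /vsicurl/ keeps the full URL, scheme included.
        return "/vsicurl/" + url
    # Anything else is treated as a local path and returned unchanged.
    return url


print(to_vsi("s3://my-bucket/data/roads.gpkg"))
```

The bucket name and file names in the usage line are made up for the example.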

Filter kwargs are pushed down to fiona/pyogrio so the dataset never fully materializes when only a subset is needed.

Parameters:

Name Type Description Default
path str | Path

File path, URL, archive path, or archive.ext/inner-file form.

required
layer str | int | None

Layer name or index for multi-layer formats (GeoPackage, GDB, KML, …). None reads the first / default layer.

None
bbox tuple[float, float, float, float] | Any

(minx, miny, maxx, maxy) tuple, or a GeoDataFrame / GeoSeries / shapely geometry whose total bounds are used. Only features intersecting the bbox are loaded.

None
mask Any

A shapely geometry (or mapping / GeoSeries / GeoDataFrame) whose geometries are used as a mask — only features intersecting the mask are loaded. Finer than bbox (actual geometry intersection, not just envelope). Mutually exclusive with bbox.

None
rows slice | int | None

int — read at most N rows. slice — read the given range of rows. Useful for sampling.

None
columns list[str] | None

Restrict loaded attribute columns. Geometry is always loaded. None loads every column.

None
where str | None

OGR SQL WHERE-clause predicate pushed down to the driver (e.g. "population > 10000"). Avoids loading non-matching features.

None
backend str

"pandas" (default) reads eagerly into a FeatureCollection. "dask" reads lazily through dask_geopandas and returns a LazyFeatureCollection; it requires the optional dask-geopandas dependency and raises ValueError if any of layer, bbox, mask, rows, columns, or where is supplied. Any other value raises ValueError.

'pandas'
npartitions int | None

Number of partitions for backend="dask". When neither npartitions nor chunksize is supplied, a default is derived from the file size.

None
chunksize int | None

Partition-size hint for backend="dask"; an alternative to npartitions.

None
**kwargs Any

Forwarded to :func:geopandas.read_file verbatim for engine-specific options (engine="pyogrio", use_arrow=True, driver-specific open options).

{}
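
The forwarding rule for the filter arguments can be sketched as follows: only filters that were actually supplied are forwarded, engine-specific options are merged last, and the dask backend rejects any filter. `build_read_kwargs` is a hypothetical helper that follows the source listing for `read_file`; it opens no file, and partitioning options for the dask branch are out of scope for the sketch.

```python
from typing import Any, Optional

FILTER_NAMES = ("layer", "bbox", "mask", "rows", "columns", "where")


def build_read_kwargs(
    backend: str = "pandas",
    extra: Optional[dict[str, Any]] = None,
    **filters: Any,
) -> dict[str, Any]:
    """Collect the keyword arguments that would reach the reader."""
    unknown = set(filters) - set(FILTER_NAMES)
    if unknown:
        raise TypeError(f"unexpected filter names: {sorted(unknown)}")
    # Drop filters left at their None default.
    supplied = {k: v for k, v in filters.items() if v is not None}
    if backend == "dask":
        if supplied:
            raise ValueError(f"backend='dask' does not support {sorted(supplied)}")
        return {}
    if backend != "pandas":
        raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
    # Engine-specific options are merged last, so they win on a name clash.
    return {**supplied, **(extra or {})}


print(build_read_kwargs(bbox=(0, 0, 1, 1), where=None))  # {'bbox': (0, 0, 1, 1)}
```

Skipping the `None` defaults keeps the call clean for engines that do not accept every keyword.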

Returns:

Name Type Description
FeatureCollection FeatureCollection | LazyFeatureCollection

The (possibly filtered) features wrapped as a FeatureCollection. With backend="dask", a LazyFeatureCollection is returned instead.

Examples:

  • Load a GeoJSON file:
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection.read_file("tests/data/coello-gauges.geojson")
    >>> len(fc) > 0
    True
    
Source code in src/pyramids/feature/collection.py
@classmethod
def read_file(
    cls,
    path: str | Path,
    *,
    layer: str | int | None = None,
    bbox: tuple[float, float, float, float] | Any = None,
    mask: Any = None,
    rows: slice | int | None = None,
    columns: list[str] | None = None,
    where: str | None = None,
    backend: str = "pandas",
    npartitions: int | None = None,
    chunksize: int | None = None,
    **kwargs: Any,
) -> FeatureCollection | LazyFeatureCollection:
    """Read a vector file into a FeatureCollection.

    path is first routed through
    :func:`pyramids._io._parse_path`, which handles:

    * Cloud-URL rewriting (`s3://`, `gs://`, `az://`,
      `abfs://`, `http(s)://`, `file://` → GDAL `/vsi*/`
      form). The rewriting is verified end-to-end through an HTTP test.
      For AWS / GCS / Azure credentials either set the standard
      environment variables (`AWS_ACCESS_KEY_ID`,
      `AWS_SECRET_ACCESS_KEY`, `GOOGLE_APPLICATION_CREDENTIALS`,
      `AZURE_STORAGE_CONNECTION_STRING`, …) or scope them via
      :class:`pyramids.base.remote.CloudConfig` as a context
      manager around the `read_file` call.
    * Compressed-archive dispatch for `.zip`, `.tar`, `.tar.gz`,
      `.gz` on **local** paths — the returned path is a
      `/vsizip/`, `/vsitar/` or `/vsigzip/` string that
      :func:`geopandas.read_file` (via GDAL's virtual filesystem)
      can open directly. You can either pass just the archive
      path (first contained file wins) or
      `archive.zip/inner.geojson` to target a specific member.
      Cloud + archive chaining (`http://host/x.zip`) is not
      automatic today — if you need it, stage the archive
      locally first or use `CloudConfig` with an explicit
      `/vsizip//vsicurl/...` path.

    Filter kwargs are pushed down to fiona/pyogrio so the
    dataset never fully materializes when only a subset is needed.

    Args:
        path (str | Path):
            File path, URL, archive path, or
            `archive.ext/inner-file` form.
        layer (str | int | None):
            Layer name or index for multi-layer formats
            (GeoPackage, GDB, KML, …). `None` reads the first /
            default layer.
        bbox:
            `(minx, miny, maxx, maxy)` tuple, or a
            `GeoDataFrame` / `GeoSeries` / shapely geometry
            whose total bounds are used. Only features
            intersecting the bbox are loaded.
        mask:
            A shapely geometry (or mapping / GeoSeries /
            GeoDataFrame) whose geometries are used as a mask —
            only features intersecting the mask are loaded. Finer
            than `bbox` (actual geometry intersection, not just
            envelope). Mutually exclusive with `bbox`.
        rows (slice | int | None):
            `int` — read at most N rows. `slice` — read the
            given range of rows. Useful for sampling.
        columns (list[str] | None):
            Restrict loaded attribute columns. Geometry is
            always loaded. `None` loads every column.
        where (str | None):
            OGR SQL `WHERE`-clause predicate pushed down to the
            driver (e.g. `"population > 10000"`). Avoids loading
            non-matching features.
        **kwargs:
            Forwarded to :func:`geopandas.read_file` verbatim for
            engine-specific options (`engine="pyogrio"`,
            `use_arrow=True`, driver-specific creation options).

    Returns:
        FeatureCollection: The (possibly filtered) features
        wrapped as a FeatureCollection.

    Examples:
        - Load a GeoJSON file:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection.read_file("tests/data/coello-gauges.geojson")
            >>> len(fc) > 0
            True

            ```
    """
    resolved = _pyramids_io._parse_path(path)
    if backend == "dask":
        # dask_geopandas.read_file does NOT forward pyogrio
        # filter kwargs (bbox / mask / rows / columns / where) —
        # silently dropping them was the bug. Raise a clear
        # ValueError instead so users know to either pre-filter
        # or call .compute() and filter eagerly.
        unsupported = {
            "bbox": bbox,
            "mask": mask,
            "rows": rows,
            "columns": columns,
            "where": where,
            "layer": layer,
        }
        supplied = [k for k, v in unsupported.items() if v is not None]
        if supplied:
            raise ValueError(
                f"backend='dask' does not support filter kwargs "
                f"{supplied}. dask_geopandas.read_file has no "
                "pushdown story for these. Either omit them and "
                "filter post-load via .clip / .loc / .compute, or "
                "switch to read_parquet(backend='dask', filters=...)"
            )
        try:
            import dask_geopandas
        except ImportError as exc:
            raise ImportError(
                "backend='dask' requires the optional "
                "'dask-geopandas' dependency. Install with one of:\n"
                "  - PyPI:        pip install 'pyramids-gis[parquet-lazy]'\n"
                "  - conda-forge: conda install -c conda-forge pyramids-parquet-lazy"
            ) from exc
        # default npartitions from file size when neither
        # kwarg was supplied; one-partition fallback defeats the
        # point of going lazy.
        partition_kwargs = _resolve_lazy_partitioning(
            resolved,
            npartitions,
            chunksize,
        )
        # wrap the lazy return as a LazyFeatureCollection so the
        # dask branch stays inside the pyramids type system.
        from pyramids.feature._lazy_collection import LazyFeatureCollection

        dask_gdf = dask_geopandas.read_file(resolved, **partition_kwargs)
        return LazyFeatureCollection.from_dask_gdf(dask_gdf)
    if backend != "pandas":
        raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
    # Only pass kwargs that were actually supplied — passing the
    # defaults (None) is fine for some geopandas engines but
    # confuses others. Build a clean kwargs dict.
    passthrough: dict[str, Any] = {}
    if layer is not None:
        passthrough["layer"] = layer
    if bbox is not None:
        passthrough["bbox"] = bbox
    if mask is not None:
        passthrough["mask"] = mask
    if rows is not None:
        passthrough["rows"] = rows
    if columns is not None:
        passthrough["columns"] = columns
    if where is not None:
        passthrough["where"] = where
    passthrough.update(kwargs)
    gdf = gpd.read_file(resolved, **passthrough)
    return cls(gdf)
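
The filter handling in the source above follows two rules: drop every filter left at `None`, and refuse filters outright on the dask backend rather than ignore them. The sketch below reproduces that logic with the standard library only. `build_read_kwargs` and `DASK_UNSUPPORTED` are hypothetical names used for illustration and are not part of pyramids.

```python
from typing import Any, Dict

# Filters that, per the source above, the dask branch refuses to accept.
DASK_UNSUPPORTED = ("bbox", "mask", "rows", "columns", "where", "layer")


def build_read_kwargs(backend: str = "pandas", **filters: Any) -> Dict[str, Any]:
    """Hypothetical helper: return only the filters the caller supplied."""
    supplied = {name: value for name, value in filters.items() if value is not None}
    if backend == "dask":
        rejected = [name for name in DASK_UNSUPPORTED if name in supplied]
        if rejected:
            # Fail loudly instead of silently dropping the filter.
            raise ValueError(
                f"backend='dask' does not support filter kwargs {rejected}"
            )
        return {}
    if backend != "pandas":
        raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
    return supplied


# Only the bbox filter was supplied, so only bbox is forwarded.
eager = build_read_kwargs(bbox=(0.0, 0.0, 1.0, 1.0), where=None, layer=None)
print(eager)
```

Building the dictionary from supplied values only keeps engine defaults in charge of everything the caller did not set.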

__str__() #

Return a short, pyramids-branded summary of the collection.

Source code in src/pyramids/feature/collection.py
def __str__(self) -> str:
    """Return a short, pyramids-branded summary of the collection."""
    n = len(self)
    cols = self.columns.tolist()
    epsg = self.epsg
    return f"FeatureCollection({n} features, columns={cols}, epsg={epsg})"

__repr__() #

Return a pyramids-branded repr.

Source code in src/pyramids/feature/collection.py
def __repr__(self) -> str:
    """Return a pyramids-branded repr."""
    return (
        f"FeatureCollection(n_features={len(self)}, "
        f"columns={self.columns.tolist()}, epsg={self.epsg})"
    )

list_layers(path) classmethod #

List every vector-layer name in path.

Routes through :func:pyramids._io._parse_path so the same cloud-URL / archive rewriting that :meth:read_file uses applies here too. Uses :func:pyogrio.list_layers under the hood (geopandas' default engine).

Results are memoised behind a 128-entry LRU cache keyed on the resolved str path. Re-calling list_layers on the same cloud URL or local path in a loop costs one hash lookup instead of one datasource open. Call :meth:list_layers_cache_clear to invalidate after an out-of-band write.

Parameters:

Name Type Description Default
path str | Path

File path, URL, or archive path. Single-layer formats like GeoJSON return one name; multi-layer formats (GPKG, GDB, KML) return every layer.

required

Returns:

Type Description
list[str]

list[str]: Layer names in the order the driver reports them.

Raises:

Type Description
FileNotFoundError

If path is a local filesystem path that does not exist. Cloud URLs and /vsi* paths skip this check and defer to the underlying driver. Previously all failures surfaced as an opaque VectorDriverError("Failed to open datasource").

Examples:

  • A single-layer GeoJSON returns one name derived from the filename:
    >>> import tempfile
    >>> from pathlib import Path
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> d = Path(tempfile.mkdtemp())
    >>> path = d / "pts.geojson"
    >>> gdf = gpd.GeoDataFrame(
    ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ... )
    >>> gdf.to_file(path, driver="GeoJSON")
    >>> FeatureCollection.list_layers(path)
    ['pts']
    
  • A missing local path raises FileNotFoundError:
    >>> from pyramids.feature import FeatureCollection
    >>> FeatureCollection.list_layers("does/not/exist.geojson")
    Traceback (most recent call last):
        ...
    FileNotFoundError: list_layers: no file at 'does/not/exist.geojson'.
    
Source code in src/pyramids/feature/collection.py
@classmethod
def list_layers(cls, path: str | Path) -> list[str]:
    """List every vector-layer name in `path`.

    Routes through :func:`pyramids._io._parse_path` so the same
    cloud-URL / archive rewriting that :meth:`read_file` uses
    applies here too. Uses :func:`pyogrio.list_layers` under the
    hood (geopandas' default engine).

    Results are memoised behind a 128-entry LRU cache keyed on
    the resolved `str` path. Re-calling `list_layers` on the
    same cloud URL or local path in a loop costs one hash
    lookup instead of one datasource open. Call
    :meth:`list_layers_cache_clear` to invalidate after an
    out-of-band write.

    Args:
        path (str | Path):
            File path, URL, or archive path. Single-layer formats
            like GeoJSON return one name; multi-layer formats
            (GPKG, GDB, KML) return every layer.

    Returns:
        list[str]: Layer names in the order the driver reports them.

    Raises:
        FileNotFoundError: If `path` is a local filesystem path
            that does not exist. Cloud URLs and `/vsi*` paths
            skip this check and defer to the underlying driver.
            Previously all failures surfaced as an opaque
            `VectorDriverError("Failed to open datasource")`.

    Examples:
        - A single-layer GeoJSON returns one name derived from the filename:
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> path = d / "pts.geojson"
            >>> gdf = gpd.GeoDataFrame(
            ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ... )
            >>> gdf.to_file(path, driver="GeoJSON")
            >>> FeatureCollection.list_layers(path)
            ['pts']

            ```
        - A missing local path raises `FileNotFoundError`:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> FeatureCollection.list_layers("does/not/exist.geojson")
            Traceback (most recent call last):
                ...
            FileNotFoundError: list_layers: no file at 'does/not/exist.geojson'.

            ```
    """
    # pre-check local-path existence so the caller sees
    # a `FileNotFoundError` naming the path instead of a generic
    # driver-open failure. Defer to `base.remote.is_remote` as
    # the single source of truth for which schemes are remote —
    # the previous hardcoded prefix tuple would silently treat any
    # future scheme as local and raise a misleading error.
    path_str = str(path)
    if not is_remote(path_str):
        local = Path(path_str)
        if not local.exists():
            raise FileNotFoundError(f"list_layers: no file at {path_str!r}.")

    resolved = str(_pyramids_io._parse_path(path))
    return list(_list_layers_cached(resolved))
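
The existence pre-check can be sketched without pyramids installed. `looks_remote` below is a hypothetical, deliberately simplified stand-in for the library's `is_remote` helper, whose real rules are not shown on this page. Only the control flow is meant to match the source above.

```python
import tempfile
from pathlib import Path
from typing import Union


def looks_remote(path: str) -> bool:
    """Hypothetical stand-in: treat URL schemes and /vsi* paths as remote."""
    return "://" in path or path.startswith("/vsi")


def check_local_exists(path: Union[str, Path]) -> None:
    """Raise FileNotFoundError for a missing local path; skip remote paths."""
    path_str = str(path)
    if looks_remote(path_str):
        return  # defer to the underlying driver
    if not Path(path_str).exists():
        raise FileNotFoundError(f"list_layers: no file at {path_str!r}.")


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "pts.geojson"
    existing.write_text("{}")
    check_local_exists(existing)              # local and present: passes
    check_local_exists("s3://bucket/x.gpkg")  # remote: check skipped
    try:
        check_local_exists(Path(tmp) / "missing.geojson")
    except FileNotFoundError as exc:
        message = str(exc)
```

Running the check before the driver opens the file means the caller sees the offending path in the error message.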

list_layers_cache_clear() classmethod #

Clear the C15 LRU cache backing :meth:list_layers.

Call this after writing a new layer to an existing multi-layer file (e.g. a GPKG) if you then want :meth:list_layers to see the new layer. Otherwise the 128-entry LRU cache is self-managing and callers do not need to touch it.

Returns:

Name Type Description
None None

This method does not return a value.

Examples:

  • Clearing an empty cache is a safe no-op:
    >>> from pyramids.feature import FeatureCollection
    >>> FeatureCollection.list_layers_cache_clear()
    >>> FeatureCollection.list_layers_cache_clear()
    
  • After an out-of-band write, clear the cache so the next list_layers call re-reads the updated file:
    >>> import tempfile
    >>> from pathlib import Path
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> d = Path(tempfile.mkdtemp())
    >>> path = d / "pts.geojson"
    >>> gpd.GeoDataFrame(
    ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ... ).to_file(path, driver="GeoJSON")
    >>> _ = FeatureCollection.list_layers(path)
    >>> FeatureCollection.list_layers_cache_clear()
    >>> FeatureCollection.list_layers(path)
    ['pts']
    
Source code in src/pyramids/feature/collection.py
@classmethod
def list_layers_cache_clear(cls) -> None:
    """Clear the C15 LRU cache backing :meth:`list_layers`.

    Call this after writing a new layer to an existing multi-layer
    file (e.g. a GPKG) if you then want :meth:`list_layers` to see
    the new layer. Otherwise the 128-entry LRU cache is self-
    managing and callers do not need to touch it.

    Returns:
        None: This method does not return a value.

    Examples:
        - Clearing an empty cache is a safe no-op:
            ```python
            >>> from pyramids.feature import FeatureCollection
            >>> FeatureCollection.list_layers_cache_clear()
            >>> FeatureCollection.list_layers_cache_clear()

            ```
        - After an out-of-band write, clear the cache so the next
          `list_layers` call re-reads the updated file:
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> path = d / "pts.geojson"
            >>> gpd.GeoDataFrame(
            ...     {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ... ).to_file(path, driver="GeoJSON")
            >>> _ = FeatureCollection.list_layers(path)
            >>> FeatureCollection.list_layers_cache_clear()
            >>> FeatureCollection.list_layers(path)
            ['pts']

            ```
    """
    _list_layers_cached.cache_clear()
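
The caching behaviour described above can be reproduced with `functools.lru_cache` alone. In this sketch `list_layers_cached` is a hypothetical stand-in for the private `_list_layers_cached` helper, and the counter simulates datasource opens. The body of the real helper is not shown on this page.

```python
from functools import lru_cache
from typing import Tuple

open_count = 0  # counts simulated datasource opens


@lru_cache(maxsize=128)
def list_layers_cached(resolved: str) -> Tuple[str, ...]:
    """Hypothetical stand-in for the cached lookup; returns a tuple of names."""
    global open_count
    open_count += 1
    return ("rivers",)


for _ in range(3):
    list_layers_cached("data.gpkg")
opens_before_clear = open_count   # one open, then two cache hits

list_layers_cached.cache_clear()  # the call list_layers_cache_clear() makes
list_layers_cached("data.gpkg")
opens_after_clear = open_count    # the cleared entry is recomputed
```

The public method wraps the cached result in `list(...)`, so each caller gets a fresh list and cannot mutate the cached value.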

open_arrow(path, *, layer=None, columns=None, bbox=None, where=None, batch_size=None) classmethod #

Open a vector file as a streaming :class:pyarrow.RecordBatchReader.

Thin wrapper over :func:pyogrio.raw.open_arrow that surfaces the underlying Arrow RecordBatch iterator. Rows are yielded in batches, so callers can iterate through multi-GB datasets without materializing the whole table in memory — useful for building custom dask partitioners.

Parameters:

Name Type Description Default
path str | Path

Vector file path (Shapefile, GPKG, FlatGeobuf, GeoJSON, GeoParquet,...). Routed through :func:pyramids._io._parse_path so cloud URLs work.

required
layer str | int | None

Layer name or index for multi-layer formats.

None
columns list[str] | None

Attribute columns to load (geometry is always included).

None
bbox tuple[float, float, float, float] | None

(minx, miny, maxx, maxy) filter.

None
where str | None

OGR SQL WHERE predicate pushed down to the driver.

None
batch_size int | None

Requested RecordBatch size in rows. None uses the driver default.

None

Returns:

Type Description
Any

pyarrow.RecordBatchReader: A streaming reader. Call .read_all() to materialise, or iterate for row-batch consumption.

Raises:

Type Description
ImportError

If :mod:pyogrio is not installed.

Source code in src/pyramids/feature/collection.py
@classmethod
def open_arrow(
    cls,
    path: str | Path,
    *,
    layer: str | int | None = None,
    columns: list[str] | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    where: str | None = None,
    batch_size: int | None = None,
) -> Any:
    """Open a vector file as a streaming :class:`pyarrow.RecordBatchReader`.

    Thin wrapper over :func:`pyogrio.raw.open_arrow` that surfaces
    the underlying Arrow RecordBatch iterator. Rows are yielded in
    batches, so callers can iterate through multi-GB datasets
    without materializing the whole table in memory — useful for
    building custom dask partitioners.

    Args:
        path: Vector file path (Shapefile, GPKG, FlatGeobuf,
            GeoJSON, GeoParquet,...). Routed through
            :func:`pyramids._io._parse_path` so cloud URLs work.
        layer: Layer name or index for multi-layer formats.
        columns: Attribute columns to load (`geometry` is
            always included).
        bbox: `(minx, miny, maxx, maxy)` filter.
        where: OGR SQL `WHERE` predicate pushed down to the
            driver.
        batch_size: Requested RecordBatch size in rows. `None`
            uses the driver default.

    Returns:
        pyarrow.RecordBatchReader: A streaming reader. Call
        `.read_all()` to materialise, or iterate for row-batch
        consumption.

    Raises:
        ImportError: If :mod:`pyogrio` is not installed.
    """
    try:
        from pyogrio.raw import open_arrow
    except ImportError as exc:
        raise ImportError(
            "open_arrow requires the optional 'pyogrio' dependency. "
            "Install with one of:\n"
            "  - PyPI:        pip install pyogrio\n"
            "  - conda-forge: conda install -c conda-forge pyogrio"
        ) from exc
    resolved = _pyramids_io._parse_path(path)
    kwargs: dict[str, Any] = {}
    if layer is not None:
        kwargs["layer"] = layer
    if columns is not None:
        kwargs["columns"] = columns
    if bbox is not None:
        kwargs["bbox"] = bbox
    if where is not None:
        kwargs["where"] = where
    if batch_size is not None:
        kwargs["batch_size"] = batch_size
    return open_arrow(resolved, **kwargs)

read_parquet(path, *, columns=None, bbox=None, backend='pandas', split_row_groups=None, filters=None, blocksize=None, storage_options=None, **kwargs) classmethod #

Read a GeoParquet file into a FeatureCollection.

GeoParquet is a cloud-native columnar vector format (OGC-adopted December 2024): faster to scan than GeoJSON, smaller than Shapefile, and partitioned in a way that suits distributed compute. This method is a thin wrapper around :func:geopandas.read_parquet; the path is first routed through :func:pyramids._io._parse_path so cloud URLs (s3://, gs://, http(s)://, …) resolve the same way they do in :meth:read_file.

Requires the optional :mod:pyarrow dependency. Install with one of:

  • PyPI: pip install 'pyramids-gis[parquet]'
  • conda-forge: conda install -c conda-forge pyramids-parquet

Parameters:

Name Type Description Default
path str | Path

Local path, cloud URL, or any form :func:pyramids._io._parse_path accepts.

required
columns list[str] | None

Project a subset of columns — Parquet's columnar layout makes this a true I/O win, unlike row-oriented formats. geometry is always loaded. None loads every column.

None
bbox tuple[float, float, float, float] | None

(minx, miny, maxx, maxy) spatial filter. Forwarded to :func:geopandas.read_parquet which uses the file's GeoParquet spatial-index metadata when present to skip non-matching row groups — a true I/O win on large files. None (default) loads every feature.

None
backend str

Execution backend. 'pandas' (default) reads eagerly through :func:geopandas.read_parquet; 'dask' reads lazily through dask_geopandas.read_parquet and returns a LazyFeatureCollection. Any other value raises ValueError. In the source below, the dask branch does not forward bbox.

'pandas'
split_row_groups bool | None

Used only when backend='dask'; forwarded to dask_geopandas.read_parquet. None leaves the dask default in place.

None
filters list | None

Used only when backend='dask'; forwarded to dask_geopandas.read_parquet.

None
blocksize int | str | None

Used only when backend='dask'; forwarded to dask_geopandas.read_parquet.

None
storage_options dict | None

fsspec storage options, forwarded to the reader on both backends.

None
**kwargs Any

Forwarded to :func:geopandas.read_parquet (storage_options= for fsspec, etc.).

{}

Returns:

Name Type Description
FeatureCollection FeatureCollection | LazyFeatureCollection

The file's features wrapped as a FeatureCollection. With backend='dask', the result is a LazyFeatureCollection instead.

Raises:

Type Description
ImportError

If :mod:pyarrow is not installed, with a pyramids-branded message pointing at the [parquet] optional-dependency extra (D-M5).

Examples:

  • Round-trip a small FC through GeoParquet (requires pyarrow):
    >>> import tempfile  # doctest: +SKIP
    >>> from pathlib import Path  # doctest: +SKIP
    >>> import geopandas as gpd  # doctest: +SKIP
    >>> from shapely.geometry import Point  # doctest: +SKIP
    >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
    >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
    >>> path = d / "pts.parquet"  # doctest: +SKIP
    >>> gpd.GeoDataFrame(
    ...     {"id": [1, 2]},
    ...     geometry=[Point(0, 0), Point(1, 1)],
    ...     crs="EPSG:4326",
    ... ).to_parquet(path)  # doctest: +SKIP
    >>> fc = FeatureCollection.read_parquet(path)  # doctest: +SKIP
    >>> len(fc)  # doctest: +SKIP
    2
    >>> fc.epsg  # doctest: +SKIP
    4326
    
  • Project a subset of columns to speed up I/O on wide files:
    >>> fc = FeatureCollection.read_parquet(  # doctest: +SKIP
    ...     "s3://bucket/big.parquet",
    ...     columns=["id", "geometry"],
    ... )
    >>> fc.column  # doctest: +SKIP
    ['id', 'geometry']
    
  • A missing pyarrow dependency raises a branded ImportError:
    >>> FeatureCollection.read_parquet("x.parquet")  # doctest: +SKIP
    Traceback (most recent call last):
        ...
    ImportError: GeoParquet support requires the optional 'pyarrow'...
    
Source code in src/pyramids/feature/collection.py
@classmethod
def read_parquet(
    cls,
    path: str | Path,
    *,
    columns: list[str] | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    backend: str = "pandas",
    split_row_groups: bool | None = None,
    filters: list | None = None,
    blocksize: int | str | None = None,
    storage_options: dict | None = None,
    **kwargs: Any,
) -> FeatureCollection | LazyFeatureCollection:
    """Read a GeoParquet file into a FeatureCollection.

    GeoParquet is a cloud-native columnar vector format (OGC-
    adopted December 2024) — faster to scan than GeoJSON, smaller
    than Shapefile, and partitioned in a way that suits distributed
    compute. This method is a thin wrapper around
    :func:`geopandas.read_parquet`; the path is first routed
    through :func:`pyramids._io._parse_path` so cloud URLs
    (`s3://`, `gs://`, `http(s)://`, …) resolve the same way
    they do in :meth:`read_file`.

    Requires the optional :mod:`pyarrow` dependency. Install with one of:

    - PyPI: ``pip install 'pyramids-gis[parquet]'``
    - conda-forge: ``conda install -c conda-forge pyramids-parquet``

    Args:
        path (str | Path):
            Local path, cloud URL, or any form
            :func:`pyramids._io._parse_path` accepts.
        columns (list[str] | None):
            Project a subset of columns — Parquet's columnar
            layout makes this a true I/O win, unlike row-oriented
            formats. `geometry` is always loaded. `None`
            loads every column.
        bbox (tuple[float, float, float, float] | None):
            `(minx, miny, maxx, maxy)` spatial filter.
            Forwarded to :func:`geopandas.read_parquet` which uses
            the file's GeoParquet spatial-index metadata when
            present to skip non-matching row groups — a true I/O
            win on large files. `None` (default) loads every
            feature.
        **kwargs:
            Forwarded to :func:`geopandas.read_parquet`
            (`storage_options=` for fsspec, etc.).

    Returns:
        FeatureCollection: The file's features wrapped as a
        FeatureCollection.

    Raises:
        ImportError: If :mod:`pyarrow` is not installed, with a
            pyramids-branded message pointing at the
            `[parquet]` optional-dependency extra (D-M5).

    Examples:
        - Round-trip a small FC through GeoParquet (requires pyarrow):
            ```python
            >>> import tempfile  # doctest: +SKIP
            >>> from pathlib import Path  # doctest: +SKIP
            >>> import geopandas as gpd  # doctest: +SKIP
            >>> from shapely.geometry import Point  # doctest: +SKIP
            >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
            >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
            >>> path = d / "pts.parquet"  # doctest: +SKIP
            >>> gpd.GeoDataFrame(
            ...     {"id": [1, 2]},
            ...     geometry=[Point(0, 0), Point(1, 1)],
            ...     crs="EPSG:4326",
            ... ).to_parquet(path)  # doctest: +SKIP
            >>> fc = FeatureCollection.read_parquet(path)  # doctest: +SKIP
            >>> len(fc)  # doctest: +SKIP
            2
            >>> fc.epsg  # doctest: +SKIP
            4326

            ```
        - Project a subset of columns to speed up I/O on wide files:
            ```python
            >>> fc = FeatureCollection.read_parquet(  # doctest: +SKIP
            ...     "s3://bucket/big.parquet",
            ...     columns=["id", "geometry"],
            ... )
            >>> fc.column  # doctest: +SKIP
            ['id', 'geometry']

            ```
        - A missing pyarrow dependency raises a branded `ImportError`:
            ```python
            >>> FeatureCollection.read_parquet("x.parquet")  # doctest: +SKIP
            Traceback (most recent call last):
                ...
            ImportError: GeoParquet support requires the optional 'pyarrow'...

            ```
    """
    resolved = _pyramids_io._parse_path(path)
    if backend == "dask":
        # check deps in order of specificity — the backend
        # request is the more specific signal, so the
        # dask-geopandas hint beats the generic pyarrow one.
        # When both are missing, the dask-geopandas error names
        # the extra that installs both ([parquet-lazy]).
        try:
            import dask_geopandas
        except ImportError as exc:
            raise ImportError(
                "backend='dask' requires the optional "
                "'dask-geopandas' dependency. Install with one of:\n"
                "  - PyPI:        pip install 'pyramids-gis[parquet-lazy]'\n"
                "  - conda-forge: conda install -c conda-forge pyramids-parquet-lazy"
            ) from exc
        dask_kwargs: dict[str, Any] = {}
        if columns is not None:
            dask_kwargs["columns"] = columns
        if split_row_groups is not None:
            dask_kwargs["split_row_groups"] = split_row_groups
        if filters is not None:
            dask_kwargs["filters"] = filters
        if blocksize is not None:
            dask_kwargs["blocksize"] = blocksize
        if storage_options is not None:
            dask_kwargs["storage_options"] = storage_options
        dask_kwargs.update(kwargs)
        # dask_geopandas is installed → assert pyarrow too, so
        # the user gets the pyramids-branded hint (not the
        # upstream message dask_geopandas would emit when it tries
        # to read). `[parquet-lazy]` pulls both.
        _require_pyarrow()
        # wrap the lazy return as a LazyFeatureCollection so the
        # dask branch stays inside the pyramids type system.
        from pyramids.feature._lazy_collection import LazyFeatureCollection

        dask_gdf = dask_geopandas.read_parquet(resolved, **dask_kwargs)
        return LazyFeatureCollection.from_dask_gdf(dask_gdf)
    if backend != "pandas":
        raise ValueError(f"backend must be 'pandas' or 'dask', got {backend!r}")
    _require_pyarrow()
    # geopandas 1.x forwards **kwargs straight into
    # `pyarrow.parquet.read_table`, which has never accepted the
    # pandas-style `engine=` kwarg. `_require_pyarrow()` above
    # already hard-guarantees the pyarrow backend, so no injection
    # is needed here. If geopandas ever reintroduces a fastparquet
    # path it will be opt-in via a new kwarg, not a silent switch.
    passthrough: dict[str, Any] = {}
    passthrough.update(kwargs)
    if columns is not None:
        passthrough["columns"] = columns
    if bbox is not None:
        passthrough["bbox"] = bbox
    if storage_options is not None:
        passthrough["storage_options"] = storage_options
    gdf = gpd.read_parquet(resolved, **passthrough)
    return cls(gdf)
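
The optional-dependency guard used above (`_require_pyarrow`, whose body is not shown on this page) can be approximated with `importlib.util.find_spec`. The helper name `require_optional` and the exact message wording are assumptions made for illustration. Only the extras named in the docstring, `[parquet]` and `[parquet-lazy]`, come from the source.

```python
import importlib.util


def require_optional(module: str, extra: str) -> None:
    """Hypothetical helper: raise a hint-bearing ImportError if `module` is absent."""
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"This feature requires the optional {module!r} dependency. "
            f"Install with: pip install 'pyramids-gis[{extra}]'"
        )


require_optional("json", "parquet")  # stdlib module, always importable

try:
    # A module name that does not exist, to trigger the hint.
    require_optional("not_a_real_module_xyz", "parquet-lazy")
except ImportError as exc:
    hint = str(exc)
```

Checking the more specific dependency first, as the dask branch does, means the user is pointed at the one extra that installs everything the request needs.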

to_parquet(path, *, compression='snappy', index=None, **kwargs) #

Write this FeatureCollection to GeoParquet.

Thin wrapper around :meth:geopandas.GeoDataFrame.to_parquet that defaults :param:compression to "snappy" — the format-standard tradeoff between speed and size.

Requires the optional :mod:pyarrow dependency. Install with one of:

  • PyPI: pip install 'pyramids-gis[parquet]'
  • conda-forge: conda install -c conda-forge pyramids-parquet

Parameters:

Name Type Description Default
path str | Path

Destination file path.

required
compression str

Parquet compression codec: "snappy" (default), "gzip", "brotli", "lz4", "zstd", or "none". "snappy" is also the geopandas and pyarrow default.

'snappy'
index bool | None

Whether to include the pandas index as a column. None (default) uses geopandas' default behavior: preserve a non-default index, drop the default RangeIndex.

None
**kwargs Any

Forwarded to :meth:geopandas.GeoDataFrame.to_parquet.

{}

Raises:

Type Description
ImportError

If :mod:pyarrow is not installed, with a pyramids-branded message pointing at the [parquet] optional-dependency extra (D-M5).

Examples:

  • Write a FeatureCollection with the default snappy codec:
    >>> import tempfile  # doctest: +SKIP
    >>> from pathlib import Path  # doctest: +SKIP
    >>> import geopandas as gpd  # doctest: +SKIP
    >>> from shapely.geometry import Point  # doctest: +SKIP
    >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
    >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1, 2]},
    ...         geometry=[Point(0, 0), Point(1, 1)],
    ...         crs="EPSG:4326",
    ...     )
    ... )  # doctest: +SKIP
    >>> path = d / "out.parquet"  # doctest: +SKIP
    >>> fc.to_parquet(path)  # doctest: +SKIP
    >>> path.exists()  # doctest: +SKIP
    True
    
  • Pick a different codec (e.g. zstd for better compression):
    >>> import tempfile  # doctest: +SKIP
    >>> from pathlib import Path  # doctest: +SKIP
    >>> import geopandas as gpd  # doctest: +SKIP
    >>> from shapely.geometry import Point  # doctest: +SKIP
    >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
    >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ...     )
    ... )  # doctest: +SKIP
    >>> fc.to_parquet(d / "out.parquet", compression="zstd")  # doctest: +SKIP
    
Source code in src/pyramids/feature/collection.py
def to_parquet(
    self,
    path: str | Path,
    *,
    compression: str = "snappy",
    index: bool | None = None,
    **kwargs: Any,
) -> None:
    """Write this FeatureCollection to GeoParquet.

    Thin wrapper around :meth:`geopandas.GeoDataFrame.to_parquet`
    that defaults :param:`compression` to `"snappy"` — the
    format-standard tradeoff between speed and size.

    Requires the optional :mod:`pyarrow` dependency. Install with one of:

    - PyPI: ``pip install 'pyramids-gis[parquet]'``
    - conda-forge: ``conda install -c conda-forge pyramids-parquet``

    Args:
        path (str | Path):
            Destination file path.
        compression (str):
            Parquet compression codec: `"snappy"` (default),
            `"gzip"`, `"brotli"`, `"lz4"`, `"zstd"`, or
            `"none"`. `"snappy"` is also the geopandas and
            pyarrow default.
        index (bool | None):
            Whether to include the pandas index as a column.
            `None` (default) uses geopandas' default behavior:
            preserve a non-default index, drop the default
            `RangeIndex`.
        **kwargs:
            Forwarded to :meth:`geopandas.GeoDataFrame.to_parquet`.

    Raises:
        ImportError: If :mod:`pyarrow` is not installed, with a
            pyramids-branded message pointing at the
            `[parquet]` optional-dependency extra (D-M5).

    Examples:
        - Write a FeatureCollection with the default snappy codec:
            ```python
            >>> import tempfile  # doctest: +SKIP
            >>> from pathlib import Path  # doctest: +SKIP
            >>> import geopandas as gpd  # doctest: +SKIP
            >>> from shapely.geometry import Point  # doctest: +SKIP
            >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
            >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1, 2]},
            ...         geometry=[Point(0, 0), Point(1, 1)],
            ...         crs="EPSG:4326",
            ...     )
            ... )  # doctest: +SKIP
            >>> path = d / "out.parquet"  # doctest: +SKIP
            >>> fc.to_parquet(path)  # doctest: +SKIP
            >>> path.exists()  # doctest: +SKIP
            True

            ```
        - Pick a different codec (e.g. zstd for better compression):
            ```python
            >>> import tempfile  # doctest: +SKIP
            >>> from pathlib import Path  # doctest: +SKIP
            >>> import geopandas as gpd  # doctest: +SKIP
            >>> from shapely.geometry import Point  # doctest: +SKIP
            >>> from pyramids.feature import FeatureCollection  # doctest: +SKIP
            >>> d = Path(tempfile.mkdtemp())  # doctest: +SKIP
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )  # doctest: +SKIP
            >>> fc.to_parquet(d / "out.parquet", compression="zstd")  # doctest: +SKIP

            ```
    """
    _require_pyarrow()
    super().to_parquet(path, compression=compression, index=index, **kwargs)

to_file(path, driver='geojson', *, layer=None, mode='w', **creation_options) #

Write this FeatureCollection to a vector file.

layer, mode, and arbitrary driver creation options are first-class keyword arguments, so callers no longer need to rely on implicit **kwargs forwarding, which hurt discoverability.

Parameters:

Name Type Description Default
path str | Path

Destination file path.

required
driver str

Driver alias (e.g. "geojson", "gpkg") or literal GDAL driver name ("GeoJSON", "GPKG", "ESRI Shapefile"). Resolved via :class:Catalog.

'geojson'
layer str | None

Layer name for multi-layer drivers (GPKG, GDB, …). Writing two layers into the same GPKG is the canonical use case. None defers to the driver default.

None
mode str

"w" (default) overwrites; "a" appends to an existing layer. Append support depends on the driver — GPKG and Shapefile accept it, GeoJSON does not.

'w'
**creation_options Any

Driver-specific creation options, forwarded to the underlying engine (pyogrio / fiona). Examples:

  • GPKG: SPATIAL_INDEX="YES", FID="id".
  • Shapefile: ENCODING="UTF-8".
  • GeoJSON: COORDINATE_PRECISION=6, RFC7946="YES".

Keys are case-preserving and passed verbatim to the driver; consult the GDAL driver docs for the full list.

pyogrio (the default geopandas engine on 1.0+) raises :class:ValueError with the message "unrecognized option '<name>' for driver '<driver>'" when a supplied option is neither in the driver's dataset nor its layer creation-option list. This surfaces typos (SPATIAL_INDX vs SPATIAL_INDEX) at write-time rather than silently producing a different file. Some drivers may still accept options that pyogrio does not list — verify against the driver's docs when in doubt.

{}

Raises:

Type Description
ValueError

If mode isn't "w" or "a", or if a supplied creation option is not recognised by the driver (raised by pyogrio — see the **creation_options note above).

Examples:

  • Round-trip a small FC through GeoJSON (the default driver):
    >>> import tempfile
    >>> from pathlib import Path
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> d = Path(tempfile.mkdtemp())
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1, 2]},
    ...         geometry=[Point(0, 0), Point(1, 1)],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> path = d / "out.geojson"
    >>> fc.to_file(path)
    >>> path.exists()
    True
    >>> FeatureCollection.read_file(path).column
    ['id', 'geometry']
    
  • Write to GeoPackage with a named layer:
    >>> import tempfile
    >>> from pathlib import Path
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> d = Path(tempfile.mkdtemp())
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ...     )
    ... )
    >>> path = d / "out.gpkg"
    >>> fc.to_file(path, driver="gpkg", layer="rivers")
    >>> FeatureCollection.list_layers(path)
    ['rivers']
    
  • Invalid mode raises ValueError before touching the file:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
    ...     )
    ... )
    >>> fc.to_file("ignored.geojson", mode="x")
    Traceback (most recent call last):
        ...
    ValueError: mode must be 'w' (write) or 'a' (append); got 'x'.
    
Source code in src/pyramids/feature/collection.py
def to_file(
    self,
    path: str | Path,
    driver: str = "geojson",
    *,
    layer: str | None = None,
    mode: str = "w",
    **creation_options: Any,
) -> None:
    """Write this FeatureCollection to a vector file.

    `layer`, `mode`, and arbitrary driver creation
    options are now first-class kwargs. Previously callers had to
    rely on implicit `**kwargs` forwarding, which hurt
    discoverability.

    Args:
        path (str | Path):
            Destination file path.
        driver (str):
            Driver alias (e.g. `"geojson"`, `"gpkg"`) or
            literal GDAL driver name (`"GeoJSON"`, `"GPKG"`,
            `"ESRI Shapefile"`). Resolved via :class:`Catalog`.
        layer (str | None):
            Layer name for multi-layer drivers (GPKG, GDB, …).
            Writing two layers into the same GPKG is the canonical
            use case. `None` defers to the driver default.
        mode (str):
            `"w"` (default) overwrites; `"a"` appends to an
            existing layer. Append support depends on the driver
            — GPKG and Shapefile accept it, GeoJSON does not.
        **creation_options:
            Driver-specific creation options, forwarded to the
            underlying engine (pyogrio / fiona). Examples:

            * GPKG: `SPATIAL_INDEX="YES"`, `FID="id"`.
            * Shapefile: `ENCODING="UTF-8"`.
            * GeoJSON: `COORDINATE_PRECISION=6`, `RFC7946="YES"`.

            Keys are case-preserving and passed verbatim to the
            driver; consult the GDAL driver docs for the full
            list.

            pyogrio (the default geopandas engine on 1.0+)
            raises :class:`ValueError` with the message
            `"unrecognized option '<name>' for driver '<driver>'"`
            when a supplied option is neither in the driver's
            dataset nor its layer creation-option list. This
            surfaces typos (`SPATIAL_INDX` vs `SPATIAL_INDEX`)
            at write-time rather than silently producing a
            different file. Some drivers may still accept options
            that pyogrio does not list — verify against the
            driver's docs when in doubt.

    Raises:
        ValueError: If `mode` isn't `"w"` or `"a"`, or if a
            supplied creation option is not recognised by the
            driver (raised by pyogrio — see the `**creation_options`
            note above).

    Examples:
        - Round-trip a small FC through GeoJSON (the default driver):
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1, 2]},
            ...         geometry=[Point(0, 0), Point(1, 1)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> path = d / "out.geojson"
            >>> fc.to_file(path)
            >>> path.exists()
            True
            >>> FeatureCollection.read_file(path).column
            ['id', 'geometry']

            ```
        - Write to GeoPackage with a named layer:
            ```python
            >>> import tempfile
            >>> from pathlib import Path
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> d = Path(tempfile.mkdtemp())
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )
            >>> path = d / "out.gpkg"
            >>> fc.to_file(path, driver="gpkg", layer="rivers")
            >>> FeatureCollection.list_layers(path)
            ['rivers']

            ```
        - Invalid `mode` raises `ValueError` before touching the file:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)], crs="EPSG:4326",
            ...     )
            ... )
            >>> fc.to_file("ignored.geojson", mode="x")
            Traceback (most recent call last):
                ...
            ValueError: mode must be 'w' (write) or 'a' (append); got 'x'.

            ```
    """
    if mode not in ("w", "a"):
        raise ValueError(f"mode must be 'w' (write) or 'a' (append); got {mode!r}.")
    try:
        resolved = CATALOG.get_gdal_name(driver) or driver
    except AttributeError:
        resolved = driver

    # pin the engine to pyogrio to match :meth:`read_file` and
    # :meth:`iter_features`. Callers who want fiona for some reason
    # can override via `engine="fiona"` in creation_options, but
    # the default gets the fast path and the pyogrio-specific
    # unknown-option validation.
    passthrough: dict[str, Any] = {
        "driver": resolved,
        "mode": mode,
        "engine": "pyogrio",
    }
    if layer is not None:
        passthrough["layer"] = layer
    passthrough.update(creation_options)
    super().to_file(path, **passthrough)
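
The keyword assembly at the end of to_file can be read in isolation. The sketch below is a simplified stand-in rather than the library's code: build_write_kwargs is a hypothetical helper name, and driver-alias resolution through the catalog is left out. It shows the mode check running first and caller-supplied options being applied last, which is why engine="fiona" replaces the pinned default.

```python
from __future__ import annotations

from typing import Any


def build_write_kwargs(
    driver: str,
    *,
    layer: str | None = None,
    mode: str = "w",
    **creation_options: Any,
) -> dict[str, Any]:
    """Hypothetical helper mirroring the kwargs that to_file forwards."""
    # Reject unsupported modes before any file is touched.
    if mode not in ("w", "a"):
        raise ValueError(f"mode must be 'w' (write) or 'a' (append); got {mode!r}.")
    # Defaults first: pyogrio is pinned as the engine.
    passthrough: dict[str, Any] = {"driver": driver, "mode": mode, "engine": "pyogrio"}
    # The layer key is only added when the caller names one.
    if layer is not None:
        passthrough["layer"] = layer
    # Caller options are merged last, so they can override the defaults above.
    passthrough.update(creation_options)
    return passthrough


default_kwargs = build_write_kwargs("GPKG", layer="rivers", SPATIAL_INDEX="YES")
fiona_kwargs = build_write_kwargs("GPKG", engine="fiona")
```

Because the merge is a plain dict update, any key in creation_options that collides with a default wins; only driver, layer, and mode are protected, since they are named parameters.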

explode(geometry='multipolygon') #

Explode multi-geometry rows into per-row single geometries.

Returns a new FeatureCollection where every row whose geometry type matches geometry is split so each child geometry becomes its own row. The current frame is not mutated.
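
The row-splitting rule can be illustrated without geopandas. The sketch below is only a model of the behaviour described above: rows are plain dicts, a geometry is a hypothetical (type_name, payload) tuple, and explode_rows is an illustrative name, not part of pyramids. Copying the attributes onto each child row and keeping the original row order are assumptions of the sketch.

```python
from __future__ import annotations

# Hypothetical row layout: a multi-geometry's payload is a list of child geometries.
rows = [
    {"name": "a", "geometry": ("MultiPolygon", [("Polygon", "ring-1"), ("Polygon", "ring-2")])},
    {"name": "b", "geometry": ("Polygon", "ring-3")},
]


def explode_rows(rows: list[dict], geometry: str = "multipolygon") -> list[dict]:
    """Split rows whose geometry type matches `geometry` into one row per child."""
    target = geometry.lower()  # the match is case-insensitive
    out: list[dict] = []
    for row in rows:
        geom_type, payload = row["geometry"]
        if geom_type.lower() == target:
            # One output row per child geometry; other columns are copied.
            for child in payload:
                out.append({**row, "geometry": child})
        else:
            # Non-matching rows pass through as shallow copies.
            out.append(dict(row))
    return out


result = explode_rows(rows, "MultiPolygon")
```

The input list is left untouched, which mirrors the "current frame is not mutated" guarantee.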

Parameters:

Name Type Description Default
geometry str

The geometry type to explode (case-insensitive). Defaults to "multipolygon".

'multipolygon'

Returns:

Name Type Description
FeatureCollection FeatureCollection

A new collection with the same CRS as self and exploded geometries.

Examples:

  • Explode a frame mixing one MultiPolygon with a Polygon:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Polygon, MultiPolygon
    >>> from pyramids.feature import FeatureCollection
    >>> gdf = gpd.GeoDataFrame(
    ...     {
    ...         "name": ["a", "b"],
    ...         "geometry": [
    ...             MultiPolygon([
    ...                 Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
    ...                 Polygon([(5, 5), (7, 5), (7, 7), (5, 7)]),
    ...             ]),
    ...             Polygon([(10, 10), (11, 10), (11, 11), (10, 11)]),
    ...         ],
    ...     },
    ...     crs="EPSG:4326",
    ... )
    >>> fc = FeatureCollection(gdf)
    >>> result = fc.explode("multipolygon")
    >>> len(result)
    3
    >>> [g.geom_type for g in result.geometry]
    ['Polygon', 'Polygon', 'Polygon']
    
Source code in src/pyramids/feature/collection.py
def explode(self, geometry: str = "multipolygon") -> FeatureCollection:
    """Explode multi-geometry rows into per-row single geometries.

    Returns a new ``FeatureCollection`` where every row whose geometry
    type matches ``geometry`` is split so each child geometry becomes
    its own row. The current frame is not mutated.

    Args:
        geometry (str): The geometry type to explode (case-insensitive).
            Defaults to ``"multipolygon"``.

    Returns:
        FeatureCollection: A new collection with the same CRS as
        ``self`` and exploded geometries.

    Examples:
        - Explode a frame mixing one MultiPolygon with a Polygon:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Polygon, MultiPolygon
            >>> from pyramids.feature import FeatureCollection
            >>> gdf = gpd.GeoDataFrame(
            ...     {
            ...         "name": ["a", "b"],
            ...         "geometry": [
            ...             MultiPolygon([
            ...                 Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
            ...                 Polygon([(5, 5), (7, 5), (7, 7), (5, 7)]),
            ...             ]),
            ...             Polygon([(10, 10), (11, 10), (11, 11), (10, 11)]),
            ...         ],
            ...     },
            ...     crs="EPSG:4326",
            ... )
            >>> fc = FeatureCollection(gdf)
            >>> result = fc.explode("multipolygon")
            >>> len(result)
            3
            >>> [g.geom_type for g in result.geometry]
            ['Polygon', 'Polygon', 'Polygon']

            ```
    """
    return FeatureCollection(_geom.explode_gdf(self, geometry=geometry))

with_coordinates() #

Return a new FeatureCollection with per-vertex x and y columns.

This is the non-mutating replacement for the old xy() method, which has been removed. It matches the pandas / geopandas convention that data-transformation methods return a new object. The with_ prefix follows the stdlib/pandas pattern for "return a copy with this change applied" (e.g. :meth:pathlib.Path.with_suffix).

Explodes MultiPolygon and GeometryCollection geometries into their parts first, then attaches x and y columns containing the coordinate sequences of each row.
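
To make the shape of the x and y columns concrete, here is a small stand-alone sketch. split_xy is a hypothetical helper, not the library's get_coords; it assumes a single-vertex geometry yields scalars (as in the Point example on this page) and that a polygon row lists its exterior ring with the closing vertex repeated, which is what the with_centroid example output suggests.

```python
from __future__ import annotations


def split_xy(vertices: list[tuple[float, float]]):
    """Return (x, y): scalars for a single vertex, coordinate lists otherwise."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    if len(vertices) == 1:
        # A Point row carries one scalar per axis.
        return xs[0], ys[0]
    return xs, ys


# A Point and the closed exterior ring of a 2 x 2 square (assumed layout).
point_x, point_y = split_xy([(1.0, 2.0)])
ring_x, ring_y = split_xy(
    [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]
)
```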

Returns:

Name Type Description
FeatureCollection FeatureCollection

A new FeatureCollection (self is not modified) with the original columns plus x and y per-vertex coordinate lists.

Examples:

  • A Point FC gets scalar x / y per row:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1, 2]},
    ...         geometry=[Point(1.0, 2.0), Point(3.0, 4.0)],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> out = fc.with_coordinates()
    >>> list(out["x"])
    [1.0, 3.0]
    >>> list(out["y"])
    [2.0, 4.0]
    
  • The input FC is not mutated:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0.0, 0.0)],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> _ = fc.with_coordinates()
    >>> "x" in fc.columns
    False
    
Source code in src/pyramids/feature/collection.py
def with_coordinates(self) -> FeatureCollection:
    """Return a new FeatureCollection with per-vertex `x` and `y` columns.

    This is the non-mutating replacement for the old `xy()` method,
    which has been removed. It matches the pandas / geopandas
    convention that data-transformation methods return a new object.
    The `with_` prefix follows the stdlib/pandas pattern for
    "return a copy with this change applied" (e.g.
    :meth:`pathlib.Path.with_suffix`).

    Explodes MultiPolygon and GeometryCollection geometries into
    their parts first, then attaches `x` and `y` columns
    containing the coordinate sequences of each row.

    Returns:
        FeatureCollection: A new FeatureCollection (`self` is
        not modified) with the original columns plus `x` and
        `y` per-vertex coordinate lists.

    Examples:
        - A Point FC gets scalar `x` / `y` per row:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1, 2]},
            ...         geometry=[Point(1.0, 2.0), Point(3.0, 4.0)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> out = fc.with_coordinates()
            >>> list(out["x"])
            [1.0, 3.0]
            >>> list(out["y"])
            [2.0, 4.0]

            ```
        - The input FC is not mutated:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0.0, 0.0)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> _ = fc.with_coordinates()
            >>> "x" in fc.columns
            False

            ```
    """
    gdf = _geom.explode_gdf(
        gpd.GeoDataFrame(self, copy=True), geometry="multipolygon"
    )
    gdf = _geom.explode_gdf(gdf, geometry="geometrycollection")

    fc = FeatureCollection(gdf)
    fc["x"] = fc.apply(
        _geom.get_coords, geom_col="geometry", coord_type="x", axis=1
    )
    fc["y"] = fc.apply(
        _geom.get_coords, geom_col="geometry", coord_type="y", axis=1
    )
    fc.reset_index(drop=True, inplace=True)
    return fc

plot(column=None, basemap=None, **kwargs) #

Plot features, optionally on a web-tile basemap.

Delegates to :meth:geopandas.GeoDataFrame.plot and, when basemap is truthy, adds an OSM (or named provider) tile layer underneath.

Raises:

Type Description
CRSError

If basemap is requested but the FC has no CRS.

Source code in src/pyramids/feature/collection.py
def plot(
    self,
    column: str | None = None,
    basemap: bool | str | None = None,
    **kwargs: Any,
) -> Any:
    """Plot features, optionally on a web-tile basemap.

    Delegates to :meth:`geopandas.GeoDataFrame.plot` and, when
    `basemap` is truthy, adds an OSM (or named provider) tile
    layer underneath.

    Raises:
        CRSError: If `basemap` is requested but the FC has no CRS.
    """
    ax = super().plot(column=column, **kwargs)

    if basemap:
        if self.epsg is None:
            raise CRSError(
                "FeatureCollection must have a CRS (epsg) to use basemap."
            )
        source = basemap if isinstance(basemap, str) else None
        add_basemap(ax, crs=self.epsg, source=source)

    return ax

concat(other) #

Concatenate another GeoDataFrame onto this FeatureCollection.

Mirrors :func:pandas.concat: returns a new FeatureCollection and never mutates self. There is no inplace kwarg, following pd.concat, which has never had one.

Equivalent to pd.concat([fc, other]) which also works directly and returns a FeatureCollection via the _constructor hook.

A CRS mismatch between self and other raises :class:pyramids.base._errors.CRSError. The old behaviour silently adopted self's CRS, which corrupted the other rows' coordinates if the two frames were in different CRSes. Callers that want to concatenate across CRSes must call other.to_crs(self.crs) first. The case where one side has no CRS (its CRS is None) is permitted, so you can seed a CRS by concatenating a CRS-carrying frame onto a freshly constructed empty FC.
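
The CRS rule can be summarised as a small decision function. This is a sketch under stated assumptions, not the library's code: plain strings stand in for CRS objects, CRSMismatch is a hypothetical stand-in for pyramids' CRSError, and resolve_concat_crs is an illustrative name.

```python
from __future__ import annotations


class CRSMismatch(ValueError):
    """Hypothetical stand-in for pyramids.base._errors.CRSError."""


def resolve_concat_crs(self_crs: str | None, other_crs: str | None) -> str | None:
    """Return the CRS the concatenated frame would carry."""
    # Both sides set and different: refuse rather than silently relabel rows.
    if self_crs is not None and other_crs is not None and self_crs != other_crs:
        raise CRSMismatch(
            f"concat: CRS mismatch: self.crs = {self_crs!r}, other.crs = {other_crs!r}"
        )
    # Otherwise prefer self's CRS and fall back to other's when self has none.
    return self_crs if self_crs is not None else other_crs


same = resolve_concat_crs("EPSG:4326", "EPSG:4326")
seeded = resolve_concat_crs(None, "EPSG:3857")
unset = resolve_concat_crs(None, None)
```

Reprojecting first, for example with other.to_crs(self.crs), turns the mismatch case into the matching case.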

Parameters:

Name Type Description Default
other GeoDataFrame

The rows to append.

required

Returns:

Name Type Description
FeatureCollection FeatureCollection

A new FC containing self's rows followed by other's rows, with self's CRS (or other's when self has none) and a freshly-reset index.

Raises:

Type Description
CRSError

If both frames carry a CRS and the two CRSes do not match.

Examples:

  • Concatenate two single-row FCs on matching CRS:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> a = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0, 0)],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> b = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [2]}, geometry=[Point(1, 1)],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> out = a.concat(b)
    >>> len(out)
    2
    >>> list(out["id"])
    [1, 2]
    >>> out.crs.to_epsg()
    4326
    
  • CRS mismatch raises CRSError:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> a = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1]}, geometry=[Point(0, 0)],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> b = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [2]}, geometry=[Point(1, 1)],
    ...         crs="EPSG:3857",
    ...     )
    ... )
    >>> a.concat(b)
    Traceback (most recent call last):
        ...
    pyramids.base._errors.CRSError: concat: CRS mismatch...
    
Source code in src/pyramids/feature/collection.py
def concat(self, other: GeoDataFrame) -> FeatureCollection:
    """Concatenate another GeoDataFrame onto this FeatureCollection.

    Mirrors :func:`pandas.concat`: returns a new
    `FeatureCollection` and never mutates `self`. There is no
    `inplace` kwarg, following `pd.concat`, which has never had
    one.

    Equivalent to `pd.concat([fc, other])` which also works
    directly and returns a `FeatureCollection` via the
    `_constructor` hook.

    A CRS mismatch between `self` and `other` raises
    :class:`pyramids.base._errors.CRSError`. The old behaviour
    silently adopted `self`'s CRS, which corrupted the
    `other` rows' coordinates if the two frames were in
    different CRSes. Callers that want to concatenate across
    CRSes must call `other.to_crs(self.crs)` first. The case
    where one side has no CRS (its CRS is `None`) is permitted,
    so you can seed a CRS by concatenating a CRS-carrying frame
    onto a freshly constructed empty FC.

    Args:
        other (GeoDataFrame): The rows to append.

    Returns:
        FeatureCollection: A new FC containing `self`'s rows
        followed by `other`'s rows, with `self`'s CRS (or
        `other`'s when `self` has none) and a freshly-reset index.

    Raises:
        CRSError: If both frames carry a CRS and the two CRSes
            do not match.

    Examples:
        - Concatenate two single-row FCs on matching CRS:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> a = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> b = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [2]}, geometry=[Point(1, 1)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> out = a.concat(b)
            >>> len(out)
            2
            >>> list(out["id"])
            [1, 2]
            >>> out.crs.to_epsg()
            4326

            ```
        - CRS mismatch raises `CRSError`:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> a = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1]}, geometry=[Point(0, 0)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> b = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [2]}, geometry=[Point(1, 1)],
            ...         crs="EPSG:3857",
            ...     )
            ... )
            >>> a.concat(b)
            Traceback (most recent call last):
                ...
            pyramids.base._errors.CRSError: concat: CRS mismatch...

            ```
    """
    # validate CRS agreement up front.
    if self.crs is not None and other.crs is not None:
        if self.crs != other.crs:
            raise CRSError(
                f"concat: CRS mismatch — self.crs = {self.crs!r}, "
                f"other.crs = {other.crs!r}. Reproject one side "
                f"— `other.to_crs(self.crs)` OR "
                f"`self.to_crs(other.crs)` — before "
                f"concatenating, or strip one CRS with "
                f".set_crs(None, allow_override=True)."
            )
    combined = gpd.GeoDataFrame(pd.concat([self, other]))
    combined.index = list(range(len(combined)))
    combined.crs = self.crs if self.crs is not None else other.crs
    return FeatureCollection(combined)

with_centroid() #

Return a new FC with per-feature center-point columns attached.

This is the non-mutating replacement for the old center_point() method, which has been removed. The with_ prefix mirrors stdlib / pandas conventions for "return a copy with this change applied".

Computes average x/y per feature (after :meth:with_coordinates) and attaches three columns: avg_x, avg_y and center_point (shapely Point).

Feeding a degenerate or empty geometry (for example an empty Point, or a Polygon whose ring has zero area) produces (NaN, NaN) averages. The method emits a single GeometryWarning listing the row indices whose avg_x / avg_y could not be computed, so downstream code can guard against the NaN centroids instead of silently consuming them. The center_point value at those rows is an empty shapely.Point (Point.is_empty is True) rather than a (NaN, NaN) point.
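
The averaging and the NaN guard can be sketched with the standard library alone. The code below is an illustration, not the library's implementation: vertex_average is a hypothetical helper, None stands in for the empty shapely Point, and the coordinate lists assume a closed ring with the first vertex repeated at the end.

```python
import math
from statistics import fmean


def vertex_average(coords):
    """Mean of a coordinate list; NaN when there is nothing to average."""
    return fmean(coords) if coords else math.nan


# Row 0: closed ring of a 2 x 2 square. Row 1: an empty geometry.
rows_x = [[0.0, 2.0, 2.0, 0.0, 0.0], []]
rows_y = [[0.0, 0.0, 2.0, 2.0, 0.0], []]

avg_x = [vertex_average(xs) for xs in rows_x]
avg_y = [vertex_average(ys) for ys in rows_y]

# Collect the rows whose average could not be computed on either axis.
bad_rows = [
    i for i, (ax, ay) in enumerate(zip(avg_x, avg_y))
    if math.isnan(ax) or math.isnan(ay)
]

# Bad rows get a placeholder (None here) instead of a (NaN, NaN) point.
centers = [
    None if i in bad_rows else (ax, ay)
    for i, (ax, ay) in enumerate(zip(avg_x, avg_y))
]
```

Because the closing vertex is counted, the square averages to 0.8 on each axis rather than its geometric centroid of 1.0, which matches the polygon example on this page.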

Returns:

Name Type Description
FeatureCollection FeatureCollection

A new FeatureCollection (self is not modified) with x, y, avg_x, avg_y, and center_point columns added.

Examples:

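  • Compute center points for a 2-polygon FC. Each value is the mean of the ring vertices with the closing vertex included, so the 2 x 2 square at the origin yields 0.8, not 1.0: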
    >>> import geopandas as gpd
    >>> from shapely.geometry import Polygon
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1, 2]},
    ...         geometry=[
    ...             Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
    ...             Polygon([(4, 4), (6, 4), (6, 6), (4, 6)]),
    ...         ],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> out = fc.with_centroid()
    >>> [(p.x, p.y) for p in out["center_point"]]
    [(0.8, 0.8), (4.8, 4.8)]
    
  • A Point FC is a no-op for the coordinate lists (each row is already a single vertex); the centroid equals the point:
    >>> import geopandas as gpd
    >>> from shapely.geometry import Point
    >>> from pyramids.feature import FeatureCollection
    >>> fc = FeatureCollection(
    ...     gpd.GeoDataFrame(
    ...         {"id": [1, 2]},
    ...         geometry=[Point(3.0, 4.0), Point(7.0, 8.0)],
    ...         crs="EPSG:4326",
    ...     )
    ... )
    >>> out = fc.with_centroid()
    >>> [(p.x, p.y) for p in out["center_point"]]
    [(3.0, 4.0), (7.0, 8.0)]
    
Source code in src/pyramids/feature/collection.py
def with_centroid(self) -> FeatureCollection:
    """Return a new FC with per-feature center-point columns attached.

    This is the non-mutating replacement for the old
    `center_point()` method, which has been removed. The `with_`
    prefix mirrors stdlib / pandas conventions for "return a copy
    with this change applied".

    Computes average x/y per feature (after
    :meth:`with_coordinates`) and attaches three columns:
    `avg_x`, `avg_y` and `center_point` (shapely `Point`).

    Feeding a degenerate or empty geometry (for example an
    empty `Point`, or a `Polygon` whose ring has zero area)
    produces `(NaN, NaN)` averages. The method emits a single
    `GeometryWarning` listing the row indices whose `avg_x` /
    `avg_y` could not be computed, so downstream code can guard
    against the NaN centroids instead of silently consuming them.
    The `center_point` value at those rows is an empty
    `shapely.Point` (`Point.is_empty is True`) rather than a
    `(NaN, NaN)` point.

    Returns:
        FeatureCollection: A new FeatureCollection (`self` is
        not modified) with `x`, `y`, `avg_x`, `avg_y`,
        `center_point` columns added.

    Examples:
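        - Compute center points for a 2-polygon FC. Each value is the
          mean of the ring vertices with the closing vertex included,
          so the 2 x 2 square at the origin yields 0.8, not 1.0: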
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Polygon
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1, 2]},
            ...         geometry=[
            ...             Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
            ...             Polygon([(4, 4), (6, 4), (6, 6), (4, 6)]),
            ...         ],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> out = fc.with_centroid()
            >>> [(p.x, p.y) for p in out["center_point"]]
            [(0.8, 0.8), (4.8, 4.8)]

            ```
        - A Point FC is a no-op for the coordinate lists (each row
          is already a single vertex); the centroid equals the point:
            ```python
            >>> import geopandas as gpd
            >>> from shapely.geometry import Point
            >>> from pyramids.feature import FeatureCollection
            >>> fc = FeatureCollection(
            ...     gpd.GeoDataFrame(
            ...         {"id": [1, 2]},
            ...         geometry=[Point(3.0, 4.0), Point(7.0, 8.0)],
            ...         crs="EPSG:4326",
            ...     )
            ... )
            >>> out = fc.with_centroid()
            >>> [(p.x, p.y) for p in out["center_point"]]
            [(3.0, 4.0), (7.0, 8.0)]

            ```
    """
    fc = self.with_coordinates()
    for i, row_i in fc.iterrows():
        fc.loc[i, "avg_x"] = np.mean(row_i["x"])
        fc.loc[i, "avg_y"] = np.mean(row_i["y"])

    # detect rows whose averaged coordinate could not be
    # computed (empty geometry, all-NaN rings, etc.). Emit a single
    # summary warning and substitute an empty Point so the column
    # does not expose a `(NaN, NaN)` Point that would then crash
    # downstream reprojections.
    avg_x = fc["avg_x"].to_numpy()
    avg_y = fc["avg_y"].to_numpy()
    bad_mask = np.isnan(avg_x) | np.isnan(avg_y)
    if bad_mask.any():
        bad_idx = [int(i) for i, is_bad in enumerate(bad_mask) if is_bad]
        warnings.warn(
            f"with_centroid: {len(bad_idx)} row(s) yielded NaN centroids "
            f"(rows {bad_idx}). Their `center_point` is an empty "
            f"shapely.Point. Drop or repair those rows before running "
            f"a method that requires a valid centroid (e.g. reproject, "
            f"distance).",
            GeometryWarning,
            stacklevel=2,
        )

    # single-pass build. The previous implementation built a
    # throwaway `coords_list` (with NaN placeholders for the bad
    # rows), called `create_points` on it, then iterated the
    # result a second time to substitute empty Points for the bad
    # rows. Skip both intermediates — write the final column value
    # directly.
    cleaned: list[Any] = [
        Point() if bad else Point(ax, ay)
        for ax, ay, bad in zip(avg_x.tolist(), avg_y.tolist(), bad_mask.tolist())
    ]
    fc["center_point"] = cleaned
    return fc