-
Notifications
You must be signed in to change notification settings - Fork 471
Replace Fiona with Geopandas Everywhere #2962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR replaces the Fiona library with Geopandas everywhere in the codebase to consolidate geospatial file I/O operations. The changes maintain the same functionality while switching to a more modern and unified geospatial data handling approach.
- Removes Fiona dependency from the project and updates all imports to use Geopandas
- Refactors file reading operations to use
gpd.read_file()
instead offiona.open()
- Updates coordinate transformation logic to use Geopandas and PyProj instead of Fiona's transform functions
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
Show a summary per file
File | Description |
---|---|
torchgeo/datasets/spacenet/base.py | Replaces Fiona file reading with Geopandas for mask loading |
torchgeo/datasets/pastis.py | Updates metadata loading to use Geopandas instead of Fiona |
torchgeo/datasets/openbuildings.py | Converts building polygon processing to use Geopandas and PyProj |
torchgeo/datasets/idtrees.py | Replaces Fiona geometry loading with Geopandas iterrows |
torchgeo/datasets/geo.py | Updates vector dataset handling to use Geopandas throughout |
torchgeo/datasets/enviroatlas.py | Converts spatial index reading to use Geopandas |
torchgeo/datasets/chesapeake.py | Updates spatial index processing with Geopandas |
tests/data/pastis/data.py | Updates test data generation to use Geopandas |
tests/data/eurocrops/data.py | Converts test data creation to use Geopandas |
tests/data/enviroatlas/data.py | Updates spatial index generation with Geopandas |
tests/data/README.md | Updates documentation examples to use Geopandas |
requirements/required.txt | Removes Fiona dependency |
requirements/min-reqs.old | Removes Fiona from minimum requirements |
pyproject.toml | Removes Fiona from project dependencies |
docs/conf.py | Removes Fiona class from documentation ignore list |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
torchgeo/datasets/enviroatlas.py
Outdated
) | ||
for _, row in gdf.iterrows(): | ||
if row['split'] in splits: | ||
geometries.append(shapely.geometry.shape(row['geometry'])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When iterating over a GeoDataFrame with iterrows()
, the geometry column is already a Shapely object, so shapely.geometry.shape()
is not needed and may cause errors. Should be geometries.append(row.geometry)
.
geometries.append(shapely.geometry.shape(row['geometry'])) | |
geometries.append(row['geometry']) |
Copilot uses AI. Check for mistakes.
geometries.append(geometry) | ||
except fiona.errors.FionaValueError: | ||
# Skip files that fiona is unable to read | ||
except pyogrio.errors.DataSourceError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamjstewart unfortunately geopandas doesn't have it's own error that it throws when it can't read a file, it defaults to letting pyogrio
throw an error. Basically, would have to add pyogrio
as a dependency just for this error...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataSourceError
is a subclass of RuntimeError
, just catch that instead.
@@ -896,21 +903,6 @@ def __getitem__(self, query: GeoSlice) -> dict[str, Any]: | |||
|
|||
return sample | |||
|
|||
def get_label(self, feature: 'fiona.model.Feature') -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamjstewart Not sure if we need to deprecate this or not. This is not as performant as using geopandas directly as this function requires a for loop and my change using geopandas doesn't. The only gotcha here is that the Eurocrops dataset overrides this method with some custom logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is currently documented, so it's technically a public API function. If we can find a way to deprecate it, that would be ideal. If we can't, well that's what the major version bump is for. I'll mark this as currently backwards-incompatible for now.
|
||
ROOT = "data/cbf" | ||
FILENAME = "Ontario.geojson" | ||
|
||
rec = {"type": "Feature", "id": "0", "properties": OrderedDict(), "geometry": {"type": "Polygon", "coordinates": [[(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]]}} | ||
with fiona.open(os.path.join(ROOT, FILENAME), "r") as src: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason we're using fiona in these docs is so that we can copy the src.meta
profile from the real data and use it when generating our fake data. If we can ensure the same metadata is transferred using geopandas, then that's fine. If not, let's keep using fiona in the docs.
@@ -7,8 +7,7 @@ | |||
import shutil | |||
from typing import Any | |||
|
|||
import fiona | |||
import fiona.transform | |||
import geopandas as gpd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here. We technically don't have to replace fiona in our data.py
files if we don't want to, as this code isn't used at run-time or test-time. The only incentive to replace it with geopandas is that we may someday generate test data on-the-fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I might leave the data.py files untouched for now since they are already working.
No description provided.