[GIS] Only read specific attribute columns of a shapefile with Geopandas / Fiona

fiona geopandas python

With geopandas installed in my Python environment, I can read a shapefile into a GeoDataFrame with:

In:
import geopandas as gpd
myShapefile = gpd.read_file(path_to_my_shapefile)
print(myShapefile)

Out:
myShapefile as a geodataframe

Unfortunately, some of my shapefiles contain many attribute columns that I don't need, which slows down reading considerably. Is there a way to limit reading to specific attribute columns?

In regular pandas, I could use the usecols argument to the read_csv and read_table functions to limit the reading to the specified columns, e.g.

import pandas as pd
pd.read_csv(path_to_my_csv_file, usecols=['onlyThisColumn', 'andThatColumnAsWell', 'butNoOther'])

However, passing usecols to geopandas' read_file function raises an error, probably because geopandas forwards its keyword arguments to Fiona's open(), which does not accept this argument:

  File "C:\Python34-64bit\lib\site-packages\geopandas\io\file.py", line 13, in read_file
    with fiona.open(filename, **kwargs) as f:
TypeError: open() got an unexpected keyword argument 'usecols'

Is there any other argument or way to achieve this with geopandas/Fiona?

Best Answer

Building on gene's answer, you can use GeoDataFrame.from_features.

The following should do the trick:

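import fiona
import geopandas as gpd

def records(filename, usecols, **kwargs):
    # Yield one GeoJSON-like feature at a time, keeping only the wanted properties.
    with fiona.open(filename, **kwargs) as source:
        for feature in source:
            # Keep the feature id and geometry unchanged.
            f = {k: feature[k] for k in ['id', 'geometry']}
            # Copy only the attribute columns listed in usecols.
            f['properties'] = {k: feature['properties'][k] for k in usecols}
            yield f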

And then pass the column names to records, not to from_features:

gpd.GeoDataFrame.from_features(records(filename, ['prop1', 'prop2']))
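To see what the generator yields without a shapefile at hand, the same filtering can be sketched on plain GeoJSON-like dicts. This is only an illustration: select_properties and the sample features are hypothetical names and data, not part of Fiona or geopandas, and the dict layout is assumed to match what iterating a Fiona collection gives you.

```python
def select_properties(features, usecols):
    """Yield GeoJSON-like features keeping only the properties in usecols."""
    for feature in features:
        yield {
            "id": feature.get("id"),
            "geometry": feature["geometry"],
            # Only the requested attribute columns are copied over.
            "properties": {k: feature["properties"][k] for k in usecols},
        }

# Hypothetical sample data standing in for features read from a shapefile.
sample = [
    {"id": "0", "geometry": {"type": "Point", "coordinates": (0.0, 0.0)},
     "properties": {"prop1": 1, "prop2": "a", "unused": 3.5}},
    {"id": "1", "geometry": {"type": "Point", "coordinates": (1.0, 2.0)},
     "properties": {"prop1": 2, "prop2": "b", "unused": 7.0}},
]

trimmed = list(select_properties(sample, ["prop1", "prop2"]))
print(trimmed[0]["properties"])  # {'prop1': 1, 'prop2': 'a'}
```

A list of such dicts is the kind of input GeoDataFrame.from_features expects, so the unwanted columns never reach the GeoDataFrame. Note that every feature is still read from disk in full before its properties are trimmed.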

I'd be curious to know whether this speeds up your code, or whether the properties need to be dropped by Fiona itself rather than during GeoDataFrame construction.
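One way to check would be a small timing helper like the sketch below. time_call and dummy_reader are hypothetical names, and the dummy workload only stands in for the real readers so the snippet runs on its own.

```python
import time

def time_call(func, *args, repeat=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeat` calls of func."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload; replace with the real readers when measuring.
def dummy_reader(n):
    return [i * i for i in range(n)]

elapsed = time_call(dummy_reader, 10_000)
print(f"best of 3: {elapsed:.6f} s")
```

With a real file you would time lambda: gpd.read_file(path) against lambda: gpd.GeoDataFrame.from_features(records(path, cols)) and compare the two numbers. Taking the best of several runs reduces the influence of disk caching on the first read.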
