Python – Speeding Up Reading GPKG as GeoPandas DataFrame

geopackage · geopandas · python

I have multiple point GPKG files that I would like to use with GeoPandas; however, reading them into the script always takes very long once the files get bigger than about 150 MB.

Generally I just read the data with:

import geopandas as gpd

gdf = gpd.read_file(r'path/to/file.gpkg')

However, this is always pretty slow. Is there some other way to read the data while still being able to work with dataframes?

Best Answer

Install and use the optional pyogrio I/O engine to read the data; that will be a lot faster. Adding the use_arrow=True parameter gives another big performance improvement, but then you'll also have to install the pyarrow library:

import geopandas as gpd

gdf = gpd.read_file(r'path/to/file.gpkg', engine='pyogrio', use_arrow=True)

You can also change the defaults globally:

  • geopandas.options.io_engine = "pyogrio"
  • os.environ["PYOGRIO_USE_ARROW"] = "1"

Note that environment variable values must be strings, so assign "1" rather than the integer 1.

Some timings using a 360 MB .gpkg with polygon data (on Windows):

read_file took 0:07:15.306107 with fiona engine (= the current default)
read_file took 0:00:24.831681 with pyogrio engine
read_file took 0:00:02.830280 with pyogrio engine and use_arrow=True
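
If you want to reproduce timings like the ones above on your own files, a small helper along these lines works (the path below is a placeholder, and the helper name is mine, not from the original answer):

```python
from datetime import datetime

import geopandas as gpd


def timed_read(path, **kwargs):
    """Read a file with gpd.read_file and print the elapsed wall time,
    mirroring the "read_file took ..." log lines above."""
    start = datetime.now()
    gdf = gpd.read_file(path, **kwargs)
    print(f"read_file took {datetime.now() - start} with {kwargs}")
    return gdf


# Usage (replace the path with a real file):
# gdf = timed_read(r"path/to/file.gpkg", engine="pyogrio", use_arrow=True)
```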