Specifying dtype of columns when reading in data with GeoPandas

geopandas python

I have an ESRI geodatabase with the following attribute table structure (just a small toy example; my geodatabase consists of several million features):

esri_gdb
       UID  Value                 geometry
0  P1_2021   1.01  POINT (1.00000 2.00000)
1  P2_2024   2.52  POINT (2.00000 1.00000)
2  P3_2035   3.24  POINT (3.00000 5.00000)

The first column of the attribute table (UID) contains strings (dtype object) and the second column (Value) is of dtype float64.

esri_gdb.info(verbose=True, memory_usage='deep')

<class 'geopandas.geodataframe.GeoDataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 3 columns):
     #   Column    Non-Null Count  Dtype   
    ---  ------    --------------  -----   
     0   UID       3 non-null      object  
     1   Value     3 non-null      float64 
     2   geometry  3 non-null      geometry
    dtypes: float64(1), geometry(1), object(1)
    memory usage: 368.0 bytes

I would like to convert the column UID to categorical and the column Value to float32 in order to use less memory (please remember: I have several million features in my data!). I can convert the dtype of a column after reading like this:

import geopandas as gpd

# read in file
esri_gdb_path = r'/MyProject/Data.gdb/somedata'
esri_gdf = gpd.read_file(esri_gdb_path)

# change dtype of column UID from string (object) to category
esri_gdf['UID'] = esri_gdf['UID'].astype('category')

# change dtype of column Value from float64 to float32
esri_gdf['Value'] = esri_gdf['Value'].astype('float32')
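To check how much these two conversions actually save, here is a minimal pandas-only sketch on synthetic data (the row count and the repeating UIDs are assumptions, not the real geodatabase). The float64 to float32 step halves that column exactly. The category saving depends entirely on how often values repeat: if UID is unique per feature, as the name suggests it might be, a categorical may use *more* memory than the plain strings.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the attribute table: many rows, few distinct UIDs.
n = 100_000
uids = np.array(["P1_2021", "P2_2024", "P3_2035"], dtype=object)
df = pd.DataFrame({
    "UID": uids[np.arange(n) % len(uids)],
    "Value": np.linspace(0.0, 10.0, n),  # float64 by default
})

small = df.astype({"UID": "category", "Value": "float32"})

# Deep per-column memory in bytes (index excluded).
before = df.memory_usage(deep=True, index=False)
after = small.memory_usage(deep=True, index=False)
print(pd.DataFrame({"before": before, "after": after}))

# Counter-example: with all-unique strings, a categorical still stores every
# string (in its categories) plus one integer code per row, so it grows.
unique = pd.Series([f"P{i}_2021" for i in range(n)])
print(unique.memory_usage(deep=True), unique.astype("category").memory_usage(deep=True))
```

Running the same `memory_usage(deep=True)` comparison on a sample of the real data is a cheap way to decide whether the category conversion is worth it.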

Is there a way to directly change the dtype of the columns when reading in the data in GeoPandas?

Specifying the dtype option as one can in pandas and passing a dictionary with the dtypes seems to have no effect; the dtypes stay the same:

import geopandas as gpd

# read in file, trying to pass the dtypes as in pandas
esri_gdb_path = r'/MyProject/Data.gdb/somedata'
dtype_dict = {'UID': 'category', 'Value': 'float32', 'geometry': 'geometry'}
esri_gdf = gpd.read_file(esri_gdb_path, dtype=dtype_dict)  # dtypes are unchanged

Best Answer

I believe the answer to this is "No". There is currently no way to specify the dtype of the columns when reading in the data in GeoPandas. From looking at the source code it seems clear there is no hook for it.
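Since `read_file` has no such hook, the closest practical equivalent is to cast immediately after reading. Below is a minimal sketch of a hypothetical wrapper, `read_with_dtypes` (not part of GeoPandas), that takes any reader function and reuses a dtype dictionary like the one in the question, skipping the `'geometry'` entry. It is demonstrated with `pd.read_csv` on an in-memory CSV so it runs without a geodatabase; with GeoPandas you would pass `gpd.read_file` instead, and `GeoDataFrame.astype` should keep the result a GeoDataFrame as long as the geometry column is left alone.

```python
import io
from typing import Callable, Mapping

import pandas as pd


def read_with_dtypes(reader: Callable[..., pd.DataFrame], path,
                     dtype: Mapping[str, str], **kwargs) -> pd.DataFrame:
    """Hypothetical helper: call reader(path, **kwargs), then cast columns.

    Entries for the geometry column (dtype 'geometry') and for columns that
    are not present in the result are skipped.
    """
    df = reader(path, **kwargs)
    casts = {col: t for col, t in dtype.items()
             if col in df.columns and t != "geometry"}
    return df.astype(casts)


# Demo with an in-memory CSV holding the question's toy attributes.
csv = io.StringIO("UID,Value\nP1_2021,1.01\nP2_2024,2.52\nP3_2035,3.24\n")
dtype_dict = {"UID": "category", "Value": "float32", "geometry": "geometry"}
df = read_with_dtypes(pd.read_csv, csv, dtype_dict)
print(df.dtypes)

# With GeoPandas (not run here):
# esri_gdf = read_with_dtypes(gpd.read_file, esri_gdb_path, dtype_dict)
```

Note that this only shrinks the frame you keep: the full float64/object frame is still built first, so peak memory during the read is not reduced.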

It's worth understanding why this is not possible. GeoPandas tries to align with pandas behaviour in many ways, so why not here? Because the file-reading operations are not implemented in GeoPandas itself but in underlying Python libraries such as Fiona. Those libraries are responsible for creating and iterating over the data structures, but they don't offer dtype options because they're not specialised for pandas or even NumPy.
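To make that concrete: a record-oriented reader such as Fiona hands over each feature's attributes as ordinary Python values, and the DataFrame is then built from those. The pandas-only sketch below uses mock records with the question's values (an illustration, not Fiona's actual output objects) to show that dtypes are inferred from the Python types: floats become float64 and strings become object (or the string dtype in newer pandas versions), so narrower dtypes can only be requested afterwards.

```python
import pandas as pd

# Mock feature attributes as plain Python dicts, one per feature.
records = [
    {"UID": "P1_2021", "Value": 1.01},
    {"UID": "P2_2024", "Value": 2.52},
    {"UID": "P3_2035", "Value": 3.24},
]

# pandas can only infer dtypes from the Python values it receives.
inferred = pd.DataFrame.from_records(records)
print(inferred.dtypes)  # Value is float64, UID is a string/object column

# The narrower dtypes have to be applied as a separate step.
df = inferred.astype({"UID": "category", "Value": "float32"})
print(df.dtypes)
```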

If you have difficulty loading the full data file in GeoPandas and your data are in tabular format, one work-around is to (a) load the data with plain pandas, whose readers such as `read_csv` do accept a dtype mapping, and then (b) join the geometries back on with GeoPandas once you've done the initial preprocessing.
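A minimal sketch of that work-around, assuming the attributes are available as a CSV export and the geometries as a separate table of WKT strings keyed by UID. Both tables are hypothetical and built inline here from the question's toy values; the filter threshold is likewise only an example of preprocessing.

```python
import io

import pandas as pd

# Hypothetical CSV export of the attribute table; read_csv honours dtype.
attributes_csv = io.StringIO("UID,Value\nP1_2021,1.01\nP2_2024,2.52\nP3_2035,3.24\n")
attrs = pd.read_csv(attributes_csv, dtype={"UID": "category", "Value": "float32"})

# Preprocessing on the lean frame (example filter).
attrs = attrs[attrs["Value"] > 2.0]

# Hypothetical geometry table keyed by the same UID, as WKT strings.
geoms = pd.DataFrame({
    "UID": ["P1_2021", "P2_2024", "P3_2035"],
    "wkt": ["POINT (1 2)", "POINT (2 1)", "POINT (3 5)"],
})

# Join the geometries back on after preprocessing (left join keeps attrs order).
merged = attrs.merge(geoms, on="UID", how="left")
print(merged)
```

The merged frame can then be turned into a GeoDataFrame, e.g. with `gpd.GeoDataFrame(merged.drop(columns="wkt"), geometry=gpd.GeoSeries.from_wkt(merged["wkt"]), crs=...)`, where the CRS should be taken from the source data.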