Store a GeoJSON file in a parquet format

Tags: fastparquet, geojson, geoparquet, jupyter-notebook, python

I have a large GeoJSON file which I store and read as follows:

geomap.to_file(path_to_output + 'geomap_cleaned.geojson', driver="GeoJSON")

geomap = import_geojson(path_to_data, path_to_cadgis, 'geomap_cleaned.geojson')

My issue is that geomap is a large file (1 GB), and the kernel in my Jupyter Notebook crashes most of the time when I try to read it.

I tried to save the data in Parquet format instead, hoping to make reading faster and more efficient:

geomap.to_parquet(path_to_data + 'geomap_cleaned.gzip', compression='GZIP', engine='pyarrow')

but I get an error:

ArrowInvalid: Cannot parse URI: './Source data/geomap.gzip'

How can I solve this problem? And how can I make sure the geometries, when stored in parquet format, are not corrupted?

I also successfully installed geoparquet with pip install geoparquet, but when I save the file as:

geomap.to_geoparquet('geomap_cleaned.geoparquet')

I get an error

TypeError: Object of type CRS is not JSON serializable

What is wrong here?

Best Answer

You can use GeoPandas to convert the GeoJSON to GeoParquet. Sometimes you need to set the CRS explicitly; for this example I am assuming it's EPSG:4326.

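import geopandas as gpd

geojson_fn = 'test'
geoparquet_fn = 'output_geoparquet'
gdf = gpd.read_file(f'{geojson_fn}.geojson')
# set_crs returns a new GeoDataFrame, so assign the result back
gdf = gdf.set_crs(epsg=4326)
# to_parquet stores the geometry column together with its CRS metadata
gdf.to_parquet(f'{geoparquet_fn}.parquet')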