GeoPandas – Why GeoPandas Behaves Differently on Windows and Linux and Possible Workarounds

geopandas pyproj python

I am using GeoPandas 0.12.2 (Python) to read and reproject a shapefile. My code works on Windows 10 but does not produce the correct results on Linux. Specifically, on Linux the reprojected geometry consists entirely of 'inf' values (see output below). I used the same environment.yml file to build the same conda/mamba environment on both systems. The source shapefile contains a single polygon with CRS EPSG:4269.

To get the correct behavior on Linux I've tried (without success) a handful of interventions aimed at lower-level control using Shapely and PyProj rather than Geopandas; these interventions were informed by the discussions involving PyProj axis order and the 'always_xy' parameter (e.g., https://stackoverflow.com/questions/60480010/python-pyproj-transform-yielding-different-results-for-the-same-input-parameters).
But my prior question is: Should this reprojection command require different implementation or arguments on Linux?

Here is the code:

import os
import geopandas as gpd
import pyproj

PROJ4 = "+datum=WGS84 +lat_0=23 +lat_1=29.5 +lat_2=45.5 +lon_0=-96 +no_defs +proj=aea +units=m +x_0=0 +y_0=0"
crs_to = pyproj.CRS.from_proj4(PROJ4)

# For reproducibility: Generate a sample GeoDataframe from a subset of the 
# coordinates of my actual polygon shapefile
coords = [[-74.051, 42.818], [-74.0496, 42.819], [-74.0495, 42.817], [-74.0495, 42.817], [-74.051, 42.818]]
geojson={"type":"FeatureCollection", "features":[{"type":"Feature", "properties":{"id":1},
                                                  "geometry":{"type":"MultiPolygon", "coordinates":[[coords]]}}]}

shp_from = gpd.GeoDataFrame.from_features(geojson, crs='EPSG:4269')


shp_to = shp_from.to_crs(crs_to)

print('\nos name: ', os.name)
print('geopandas version: ', gpd.__version__)
print('pyproj version: ', pyproj.__version__)

print('crs_from: ', shp_from.crs)
print('crs_to: ', crs_to)

print('\n\nSource geometry')
print(shp_from.geometry)

print('\n\nTarget geometry')
print(shp_to.geometry)

[UPDATE]: On Linux, os.environ['PROJ_DATA'] = /path/to/env/share/proj. On Windows, there is no PROJ_DATA in os.environ.


Here is the output on Windows:


os name:  nt  
geopandas version:  0.12.2  
pyproj version:  3.4.1  
crs_from:  epsg:4269  
crs_to:  +proj=aea +datum=WGS84 +lat_0=23 +lat_1=29.5 +lat_2=45.5 +lon_0=-96 +no_defs +units=m +x_0=0 +y_0=0 +type=crs  

Environment variables include: ['PROJ_CURL_CA_BUNDLE']  
No PROJ_DATA in environment


Source geometry
0    MULTIPOLYGON (((-74.05100 42.81800, -74.04960 ...

Name: geometry, dtype: geometry


Target geometry
0    MULTIPOLYGON (((1768715.829 2407533.606, 17688...

Name: geometry, dtype: geometry

Here is the output on Linux:


os name:  posix  
geopandas version:  0.12.2  
pyproj version:  3.4.1  
crs_from:  epsg:4269  
crs_to:  +proj=aea +datum=WGS84 +lat_0=23 +lat_1=29.5 +lat_2=45.5 +lon_0=-96 +no_defs +units=m +x_0=0 +y_0=0 +type=crs  

Environment variables include: ['PROJ_DATA', 'PROJ_NETWORK', 'PROJ_CURL_CA_BUNDLE']  
PROJ_DATA = /home/wzell/mambaforge/envs/hyriver/share/proj


Source geometry
0    MULTIPOLYGON (((-74.05100 42.81800, -74.04960 ...
Name: geometry, dtype: geometry


Target geometry
0    MULTIPOLYGON ((inf inf, inf inf, inf inf, inf inf, ...
Name: geometry, dtype: geometry

Best Answer

Windows and Linux are not behaving differently; something is amiss in your Linux environment. Your script runs without issues for me on Linux and produces the correct output. Note that PROJ_NETWORK appears among your Linux environment variables but not among your Windows ones, so it may be that PROJ is trying to download transformation grids and can't reach the internet from your Linux machine.
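To compare the two machines, it can help to print the PROJ-related settings on each. The following is a minimal sketch, not part of the original script: the helper names (proj_env, network_requested, pyproj_runtime_state) are hypothetical, and network_requested only checks for the documented value ON, which is a simplifying assumption. The pyproj calls used are pyproj.network.is_network_enabled() and pyproj.datadir.get_data_dir().

```python
import os


def proj_env(environ=os.environ):
    """Return the PROJ_* environment variables as a dict, sorted by name."""
    return {k: environ[k] for k in sorted(environ) if k.startswith("PROJ_")}


def network_requested(environ=os.environ):
    """True if PROJ_NETWORK is set to ON (case-insensitive).

    Assumption: only the documented value 'ON' is checked here.
    """
    return environ.get("PROJ_NETWORK", "").strip().upper() == "ON"


def pyproj_runtime_state():
    """Return (network_enabled, data_dir) as reported by pyproj.

    Returns None if pyproj is not installed.
    """
    try:
        from pyproj.datadir import get_data_dir
        from pyproj.network import is_network_enabled
    except ImportError:
        return None
    return is_network_enabled(), get_data_dir()


if __name__ == "__main__":
    print("PROJ environment variables:", proj_env())
    print("PROJ_NETWORK requests network access:", network_requested())
    print("pyproj (network enabled, data dir):", pyproj_runtime_state())
```

Running this in both environments should show whether network access is enabled on Linux only, which would be consistent with the difference in the outputs above.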

Try setting os.environ['PROJ_NETWORK'] = 'OFF' before importing geopandas or pyproj, à la the related question "First call to transform() fails with inf, all subsequent calls are OK - what could be the reason?"

E.g.

import os
os.environ['PROJ_NETWORK'] = 'OFF'

import geopandas as gpd
import pyproj

# rest of script...
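After applying the workaround, a quick check that the reprojected coordinates are finite makes a silent failure easy to spot. This is only a sketch: all_finite is a hypothetical helper, and the sample values are the first vertex from the Windows and Linux outputs in the question.

```python
import math


def all_finite(values):
    """Return True if every number in a flat iterable is finite (no inf or nan)."""
    return all(math.isfinite(v) for v in values)


# First vertex of the target geometry as printed in the question.
windows_vertex = (1768715.829, 2407533.606)
linux_vertex = (float("inf"), float("inf"))

print(all_finite(windows_vertex))  # True
print(all_finite(linux_vertex))    # False
```

In the script above, the same check could be applied to the reprojected GeoDataFrame's bounding box, e.g. all_finite(shp_to.total_bounds).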
