[GIS] Problem getting correct area for polygon and choosing a CRS

areacoordinate systemgeopandaspython

Using GeoPandas and I'd like to calculate the areas for the geometries in my dataframe. I'm also doing other things like finding the area of overlapping regions of geometries from different dataframes, but I think an answer to the basic question will solve all my problems in area calculations.

The polygon coordinates are all saved as "naive geometries" and I've set the CRS to 4326.
To calculate areas, I only need a CRS that works reasonably well across the Tokyo area.
I have data from the government that includes the land area of administrative units, and using CRS=3095 gives very close results (so does {'proj':'cea'}).

One of my datasets is a hexagonal grid, where the side radius (apothem) is 125 meters, so the area of each hexagon is 54,127 m2. However, when I calculate the area of the hexagon geometries using CRS=3095 I get 3,764,267,156 (which should be meters sq because the CRS documentation says it is in meters).

Using CRS=3857 I get 5,764,828,070. That's very different, but still in the same ballpark as the other CRS compared to the known area of 54,127 m2. Those are insanely large (and wrong) numbers for my hexes.

So my question is not so much choosing a CRS for my region, but rather how do I interpret the values coming from these calculations? Knowing the area is 54,127 m2, how can set a CRS and perform the calculation of area on my geometries such that it will produce that value (to a reasonable approximation)?

Best Answer

The problem is caused by setting to_crs() based on another non-lat/lon CRS instead of based on the naive geometries.

Here's something that works. First import the packages and your geopandas dataframe. In my case, I'm loading just one hexagon and making a dataframe out of it to test the code.

import geopandas as gp
from shapely import wkt
data = {'geometry':'POLYGON ((139.7671 35.68250088833567, 139.7684808422803 35.68185043597622, 139.7684808198672 35.68054954811675, 139.7671 35.67989911138261, 139.7657191801327 35.68054954811675, 139.7657191577197 35.68185043597622, 139.7671 35.68250088833567))'}    
df = gp.GeoDataFrame([data], columns=data.keys())
df['geometry'] = df['geometry'].apply(wkt.loads)

In your case, you can just read in the geoPandasDataframe in whatever way is appropriate. Now set the initial CRS to naive geometries; i.e., the standard projection in decimal degrees of longitude and latitude.

df.crs = 'epsg:4326'

Now, since we're trying to calculate surface areas, it's a good idea to use cea (cylindrical equal area) projections. And to minimize distortion you can specify a lat/lon to use as a reference point (I used the first point of the polygon above, which is in the center of my region of interest).

df = df.to_crs("+proj=cea +lat_0=35.68250088833567 +lon_0=139.7671 +units=m")

Now you can confirm the transformed geometry in meters, and the area in square meters of the hexagon in this projection:

print(df.at[0,'geometry'])
print("Area of 54,127m2 hexagon is:",df.at[0,'geometry'].area)

It yields 54,126.5868 m2, so that's extremely accurate to the known value of 54,127.

The question about choosing a CRS for calculating areas and getting it to work with GeoPandas came up in a lot on forums and SE, but none of the answers were clear or straightforward or actually working. I hope my struggle with this, and my simple example here, can accelerate others' complicated and torturous journey through GIS computing to just get the area of their polygons.