[GIS] Converting large data with lat and long into X and Y

Tags: convert, latitude longitude, pyproj, python, utm

I have 9,888,562 records in a DataFrame and would like to convert the lat/long values to UTM x, y. My code below uses the pyproj package, but with this much data it runs for a very long time and in the end it does not work.
Is there another approach or package I could use for data of this size?

import pandas as pd
from pyproj import Proj

def rule(row):
    p = Proj(proj='utm', zone=10, ellps='WGS84', preserve_units=False)
    x, y = p(row["LON"], row["LAT"])
    return pd.Series({"X": x, "Y": y})

My_data = My_data.merge(My_data.apply(rule, axis=1), left_index=True, right_index=True)

My data has "LON" and "LAT" columns (the original screenshot is not preserved).

Best Answer

UPDATE:

After thinking about it, the most efficient way to transform the coordinates is probably to skip apply entirely and pass the whole column arrays to the projection in a single call.

from pyproj import Proj
pp = Proj(proj='utm',zone=10,ellps='WGS84', preserve_units=False)

xx, yy = pp(My_data["LON"].values, My_data["LAT"].values)
My_data["X"] = xx
My_data["Y"] = yy 

Using Transformer

from pyproj import Transformer

trans = Transformer.from_crs(
    "epsg:4326",
    "+proj=utm +zone=10 +ellps=WGS84",
    always_xy=True,
)
xx, yy = trans.transform(My_data["LON"].values, My_data["LAT"].values)
My_data["X"] = xx
My_data["Y"] = yy
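As a quick sanity check of the column-array approach, here is a minimal sketch on a tiny DataFrame. The sample points are made up, and it assumes EPSG:32610 (WGS 84 / UTM zone 10N) is the intended target, which should match the "+proj=utm +zone=10 +ellps=WGS84" string above for northern-hemisphere data. It compares the single vectorized call against a row-by-row loop to confirm that both give the same coordinates.

```python
import numpy as np
import pandas as pd
from pyproj import Transformer

# Made-up sample points inside UTM zone 10 (central meridian: 123 W).
df = pd.DataFrame({
    "LON": [-123.0, -122.4, -121.5],
    "LAT": [49.0, 37.8, 45.5],
})

# EPSG:4326 = WGS 84 lon/lat; EPSG:32610 = WGS 84 / UTM zone 10N (assumed target).
trans = Transformer.from_crs("EPSG:4326", "EPSG:32610", always_xy=True)

# Vectorized: one call transforms the whole column.
xx, yy = trans.transform(df["LON"].to_numpy(), df["LAT"].to_numpy())
df["X"] = xx
df["Y"] = yy

# Row-by-row reference, used only to confirm that both paths agree.
row_xy = np.array(
    [trans.transform(lon, lat) for lon, lat in zip(df["LON"], df["LAT"])]
)
print(np.allclose(df[["X", "Y"]].to_numpy(), row_xy))
```

The first point sits on the zone's central meridian, so its easting should come out at the UTM false easting of 500000 m, which is a handy check that the zone and axis order are what you expect.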

ORIGINAL ANSWER:

This answer here is great: https://gis.stackexchange.com/a/334276/144357

The solution below is for the purposes of understanding the root of the problem a bit better.

Your code in its current form re-constructs the Proj object with each iteration. This is a costly operation and is why the pyproj.Transformer object was created. It assists with repeated transformations because you don't have to re-create it each time (see: https://pyproj4.github.io/pyproj/stable/advanced_examples.html#repeated-transformations).

So, to avoid re-creating the Proj object, you can modify your code like so:

import pandas as pd
from pyproj import Proj
from functools import partial

# Build the Proj object once, outside the per-row function
p = Proj(proj='utm', zone=10, ellps='WGS84', preserve_units=False)

def impartial_rule(row, proj):
    x, y = proj(row["LON"], row["LAT"])
    return pd.Series({"X": x, "Y": y})

# Bind the shared Proj object so apply() can call rule(row)
rule = partial(impartial_rule, proj=p)
My_data = My_data.merge(My_data.apply(rule, axis=1), left_index=True, right_index=True)

This should improve your performance.

Here is the equivalent using the pyproj.Transformer:

import pandas as pd
from pyproj import Transformer
from functools import partial

# Build the Transformer once; always_xy=True means input is (lon, lat)
trans = Transformer.from_crs(
    "epsg:4326",
    "+proj=utm +zone=10 +ellps=WGS84",
    always_xy=True,
)

def impartial_rule(row, transform):
    x, y = transform(row["LON"], row["LAT"])
    return pd.Series({"X": x, "Y": y})

# Bind the transformer's transform method so apply() can call rule(row)
rule = partial(impartial_rule, transform=trans.transform)
My_data = My_data.merge(My_data.apply(rule, axis=1), left_index=True, right_index=True)

Hopefully this is helpful. Good luck!

Also, I would recommend reading this about Proj: https://pyproj4.github.io/pyproj/stable/gotchas.html#proj-not-a-generic-latitude-longitude-to-projection-converter
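
The point of that page, as I understand it, is that Proj only converts between geographic and projected coordinates within a single datum, whereas Transformer lets you name both the source and the target CRS. Here is a minimal sketch of stating both explicitly and checking the result with a round trip. The point is made up, and EPSG:4326 / EPSG:32610 are assumptions about your data (WGS 84 lon/lat in, UTM zone 10N out).

```python
from pyproj import Transformer
from pyproj.enums import TransformDirection

# Source and target CRS are both stated explicitly (assumed codes).
trans = Transformer.from_crs("EPSG:4326", "EPSG:32610", always_xy=True)

lon, lat = -122.4, 37.8  # made-up point inside UTM zone 10

# Forward: lon/lat -> UTM easting/northing in metres.
x, y = trans.transform(lon, lat)

# Inverse: back to lon/lat with the same Transformer object.
lon_back, lat_back = trans.transform(x, y, direction=TransformDirection.INVERSE)

print(x, y)
print(lon_back, lat_back)
```

If the round trip does not return the original longitude and latitude to within a tiny tolerance, the axis order or the CRS codes are the first things to check.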