Solved – The right way to use Machine Learning to predict latitude and longitude

geostatistics, gis, machine learning

There are some simple ML techniques for predicting latitude/longitude co-ordinates, such as predicting the latitude and the longitude separately with two different models. However, I get the sense that this is a simple hack that doesn't give the best results. To quote one paper:

Most regression methods assume either that only one real number is to be predicted, or if multiple real numbers are to be predicted that they are independent. The problem of predicting a point on the surface of a sphere is more complicated as the latitudes and longitudes involved are not independent.

Unfortunately, the authors of the linked paper just side-step the issue by using kNN. I'd like to use supervised learning with some non-geographical inputs (strings, numbers, etc…) to predict a latitude/longitude co-ordinate, and I'd like to approach it using "best practices" rather than a simple hack. How should I go about it? Any links to any papers or blog posts would be much appreciated. Thanks!

Best Answer

The problem isn't just potential interdependence of latitudes and longitudes; it's that the scales wrap around. On a circle, 359 degrees and 1 degree are quite close (only 2 degrees apart), even though their numeric difference is 358. A general term for this type of problem is directional statistics.
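To make the wrap-around issue concrete, here is a minimal Python sketch. It is not taken from the answer or from any particular package: the function names are made up for illustration, and the idea of averaging points as 3D unit vectors and then renormalizing is just one common workaround, offered here as an assumption about what a reasonable baseline might look like.

```python
import math


def angular_difference_deg(a, b):
    """Smallest absolute difference between two angles in degrees,
    taking wrap-around at 360 into account."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def circular_mean_deg(angles):
    """Mean direction of angles in degrees, computed from summed
    sine and cosine components rather than from the raw numbers."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


def latlon_to_unit_vector(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to a point (x, y, z) on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def unit_vector_to_latlon(x, y, z):
    """Map a nonzero 3D vector back to latitude/longitude in degrees.
    The vector is renormalized first, so averaged or predicted vectors
    that are slightly off the sphere are still handled."""
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


def mean_position(points):
    """Average several (lat, lon) points by averaging their unit vectors."""
    vectors = [latlon_to_unit_vector(lat, lon) for lat, lon in points]
    n = len(vectors)
    mean_vec = tuple(sum(v[i] for v in vectors) / n for i in range(3))
    return unit_vector_to_latlon(*mean_vec)


# The raw numeric difference between 359 and 1 is 358, but on the circle:
print(angular_difference_deg(359.0, 1.0))  # 2.0

# Two points either side of the antimeridian: a naive mean of the
# longitudes 179 and -179 gives 0, which is on the opposite side of the globe.
print(mean_position([(10.0, 179.0), (10.0, -179.0)]))
```

In a supervised setting, the same encoding could in principle be used for the targets: fit a model to the three Cartesian components, renormalize each prediction, and convert back to latitude/longitude. Whether that beats two separate models depends on the data, so it is best treated as a baseline to compare against rather than a recommended method.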

One way to start with analysis of spatial data would be to go over the CRAN Task View on that topic. That page details the many R packages available for handling spatial data, analyzing point patterns, doing spatial regression, etc. Documentation for R packages that seem related to your specific interests will typically include helpful references to related literature.
