Solved – How to normalize GPS coordinates for deep learning

deep learning, machine learning, normalization

I am working on a project where I have to build a deep-learning model that classifies the kind of stop a vehicle makes. My dataset consists of vehicle-related data such as vehicle ID, type of vehicle, and so on, and GPS-related data like the longitude and latitude of the vehicle along the route, the standard deviation, and the duration.
As an initial step, I'm trying to build a model that takes all the input I have for the moment, without any feature engineering. I have read that the model will converge faster if I feed it normalized data. However, I've seen some models that do not use normalized data (such as the famous house price prediction model). So my question is: how should I normalize coordinates?

Best Answer

Not all models are sensitive to data normalization. For example, models with a batch-norm layer have a built-in mechanism that fixes the distribution of activations. Others are more sensitive and may even diverge simply for lack of normalization (e.g., try to train a CNN on the CIFAR-10 dataset with training images whose pixels are in the range $[0, 255]$).
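
As a sketch of the kind of scaling meant here (the random batch and its shape are just illustrative), bringing raw CIFAR-10-style pixels from $[0, 255]$ into $[0, 1]$ is a single division in NumPy:

```python
import numpy as np

# Illustrative batch of images with raw uint8 pixels in [0, 255]:
# shape (batch, height, width, channels), matching CIFAR-10's 32x32 RGB images.
images = np.random.randint(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

# Scale to [0.0, 1.0] so the network sees inputs in a well-behaved range.
images_normalized = images.astype(np.float32) / 255.0
```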

But I'm not aware of any model that would suffer from data normalization. So even though the house price prediction model (by the way, which one exactly?) may skip it, your model is likely to improve if the data is normalized, and you should normalize it too.

GPS coordinates have known bounds: latitude lies in $[-90, 90]$ and longitude in $[-180, 180]$. The coordinates of populated areas span much narrower ranges, but it's not a big deal to work with these wide bounds. This means that the transformation...

$$ x \mapsto \frac{x}{100}$$

... will ensure that the latitude is in $[-0.9, 0.9]$ and the longitude is in $[-1.8, 1.8]$, which are fairly robust ranges for deep learning. The transformation is simple (in NumPy it takes just one line of code) and doesn't require computing any statistics from the training data.
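
For instance, a minimal sketch in NumPy (the sample coordinates below are just illustrative):

```python
import numpy as np

# Illustrative array of GPS points: column 0 = latitude, column 1 = longitude.
coords = np.array([
    [40.7128,  -74.0060],   # New York
    [51.5074,   -0.1278],   # London
    [-33.8688, 151.2093],   # Sydney
])

# Divide by 100: latitude lands in [-0.9, 0.9], longitude in [-1.8, 1.8].
# No training-set statistics (mean/std or min/max) are needed.
normalized = coords / 100.0
```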