Because latitude and longitude are circular coordinates, some care is needed.
A simple solution is to convert them to geocentric Cartesian coordinates. For most purposes the usual conversion from spherical to Cartesian coordinates works just fine. A highly accurate calculation is included in my post at https://gis.stackexchange.com/a/34534/664; the key code is this:
ellipsoidToCartesian[{lon_, lat_}, {a_,b_}] :=
{a Cos[lat] Cos[lon], a Cos[lat] Sin[lon], b Sin[lat]};
cartesianToEllipsoid[{x_, y_, z_}, {a_,b_}] :=
{ArcTan[x, y], ArcTan[Norm[{x, y}]/a, z/b]};
(This is written in Mathematica. It serves as pseudocode for implementation in other environments, but pay attention to the order of arguments to ArcTan.)
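For readers working outside Mathematica, here is one possible Python translation of the same two conversions (names are my own; angles are assumed to be in radians). Note that Python's `math.atan2(y, x)` takes its arguments in the opposite order of Mathematica's `ArcTan[x, y]`:

```python
import math

def ellipsoid_to_cartesian(lon, lat, a, b):
    """Map (lon, lat) in radians onto geocentric Cartesian (x, y, z)
    for an ellipsoid with semi-axes a (equatorial) and b (polar)."""
    return (a * math.cos(lat) * math.cos(lon),
            a * math.cos(lat) * math.sin(lon),
            b * math.sin(lat))

def cartesian_to_ellipsoid(x, y, z, a, b):
    """Invert the map above. atan2(y, x) here corresponds to
    Mathematica's ArcTan[x, y], with the arguments swapped."""
    lon = math.atan2(y, x)
    lat = math.atan2(z / b, math.hypot(x, y) / a)
    return lon, lat
```

A quick sanity check is that the two functions round-trip: converting a (lon, lat) pair to Cartesian coordinates and back recovers the original angles.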
The values of $a$ and $b$ are the planet's semi-axes. For modern Earth coordinate systems, such as WGS84, $a = 6\,378\,137.0$ and $b \approx 6\,356\,752.314\,245$ meters. When adopting a spherical approximation, use the authalic radius of $6\,371\,007.2$ meters, but feel free to rescale this radius if you wish to adjust the relative weight of your coordinates within the overall analysis.
If you also have height or depth data coordinates relative to the planet's surface, refer to that post for details.
If you normalize the features, the dot product is the same as cosine similarity. As for the general question, there is rarely a single metric that you should always use. The choice is usually empirical: you try several metrics and compare the results. Sometimes, in practice, the choice is effectively arbitrary: when the candidate metrics behave similarly and give very similar results, you just pick one.
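The equivalence in the first sentence is easy to verify numerically. A minimal sketch (helper names are my own):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot product over norms."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

u, v = [3.0, 4.0], [4.0, 3.0]
# The plain dot product of the normalized vectors equals the
# cosine similarity of the originals (here 24/25 = 0.96).
dot_normalized = sum(a * b for a, b in zip(normalize(u), normalize(v)))
```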
Best Answer
I think it's still very much an open question which distance metric to use for word2vec when defining "similar" words. Cosine similarity is quite nice because it implicitly assumes the word vectors are normalized so that they all sit on the unit sphere, in which case the angle between any two is a natural distance. Moreover, similar words tend to have vectors that lie close to each other, with comparable magnitudes, so again cosine distance becomes a natural choice.
In reality this is more complex, because word2vec does not explicitly require that the embedding vectors all have length 1. Indeed, there is work showing that important information is carried in the lengths of the vectors, so that L2 distance can also be used. See here for example:
https://arxiv.org/pdf/1508.02297v1.pdf
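To see concretely what cosine distance throws away, consider two vectors that point in the same direction but differ in length: their cosine distance is zero, yet their L2 distance is not. A small sketch (function names are my own):

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

u = [1.0, 2.0]
v = [2.0, 4.0]  # same direction as u, twice the length
# cosine_distance(u, v) is 0, but l2_distance(u, v) is not:
# any information encoded in vector length is invisible to cosine distance.
```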