Mahalanobis Distance – Explaining Counterintuitive Results with the Mahalanobis Distance

distance, mahalanobis, matching

I encountered a strange issue when performing Mahalanobis distance matching. Let's say I have one treated unit with the following values on two variables: $T:(17, 4)$. I have two control units with values $A:(16, 3)$ and $B:(15, 3)$. Intuitively, it seems like $T$ should be closer to $A$ than to $B$, and that would be true regardless of any affine transformation of the variables. The covariance matrix for my variables (which is computed in the full sample that includes many more observations) is
$$
\Sigma = \begin{bmatrix}25 & 9\\9 & 5\end{bmatrix}
$$

When I compute the Mahalanobis distance $d(\cdot,\cdot)$, I find that $d(T,A) = 0.522$ and $d(T,B)=0.452$; that is, $B$ is closer to $T$ than $A$ is on the Mahalanobis distance. This doesn't make much sense to me; intuitively, what is going on here?

Some R code to play around with:

T <- c(17, 4)
A <- c(16, 3)
B <- c(15, 3)

S <- matrix(c(25, 9,
              9, 5), nrow = 2)

mahalanobis(T, A, S) |> sqrt()
## [1] 0.522233
mahalanobis(T, B, S) |> sqrt()
## [1] 0.452267

Best Answer

"Why not draw a picture?" asks @mhdadk. Why not indeed?

Here are contours of the Mahalanobis distance (equivalently, of the Gaussian likelihood) centred at T: (17, 4) (open circle), together with the two points A: (16, 3) and B: (15, 3). You can see that the point at (15, 3) lies on a smaller contour than the point at (16, 3), so it is closer to T in this metric.

[Figure: nested contour ellipses of the Mahalanobis distance centred at T (open circle), with A and B shown as filled circles]
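The contours agree with the arithmetic. As a worked check, using only $\Sigma$ and the three points from the question, the inverse covariance matrix is

$$
\Sigma^{-1} = \frac{1}{25\cdot 5 - 9^2}\begin{bmatrix}5 & -9\\-9 & 25\end{bmatrix} = \frac{1}{44}\begin{bmatrix}5 & -9\\-9 & 25\end{bmatrix}
$$

With $T - A = (1, 1)$ and $T - B = (2, 1)$, the squared distances are

$$
d^2(T,A) = \frac{5(1)^2 - 2\cdot 9\,(1)(1) + 25(1)^2}{44} = \frac{12}{44} \approx 0.273, \qquad d(T,A) \approx 0.522
$$

$$
d^2(T,B) = \frac{5(2)^2 - 2\cdot 9\,(2)(1) + 25(1)^2}{44} = \frac{9}{44} \approx 0.205, \qquad d(T,B) \approx 0.452
$$

The middle term is what reverses the ordering. Because the two variables are positively correlated, a difference that moves both variables in the same direction is discounted, and for $B$ that discount ($36/44$) outweighs the extra cost of being two units away on the first variable ($20/44$ instead of $5/44$).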

library(ellipse)
# S (covariance matrix) and T (treated unit) are as defined in the question
# Set up the axes from the largest ellipse without drawing it
plot(ellipse(S, centre = T, level = 0.6), type = "n")
# A and B as filled circles, T as an open circle
points(c(16, 15, 17), c(3, 3, 4), pch = c(19, 19, 1))
# Overlay translucent ellipses so that the inner contours appear darker
for (lev in seq(0.6, 0.1, by = -0.1)) {
  polygon(ellipse(S, centre = T, level = lev), col = "#AA00AA20", border = NA)
}
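To see numerically why the contours are tilted in B's favour, one option is to split each squared distance into its two single-variable terms and the cross term. The following is a minimal sketch in base R, not part of the original answer: the helper `split_d2` and the name `trt` (used in place of the question's `T`, which masks R's shorthand for `TRUE`) are illustrative assumptions.

```r
# Values copied from the question; `trt` stands in for the question's `T`.
trt <- c(17, 4)
A <- c(16, 3)
B <- c(15, 3)
S <- matrix(c(25, 9,
              9, 5), nrow = 2)

# Inverse covariance matrix; equals matrix(c(5, -9, -9, 25), 2) / 44
S_inv <- solve(S)

# Hypothetical helper: break the squared Mahalanobis distance into the
# contribution of each variable on its own plus the cross term.
split_d2 <- function(x, centre, S_inv) {
  d <- x - centre
  c(var1  = S_inv[1, 1] * d[1]^2,
    var2  = S_inv[2, 2] * d[2]^2,
    cross = 2 * S_inv[1, 2] * d[1] * d[2])
}

parts_A <- split_d2(A, trt, S_inv)
parts_B <- split_d2(B, trt, S_inv)

# One row per control unit; the row sums are the squared distances
print(rbind(A = parts_A, B = parts_B))

# Square roots of the sums reproduce the distances reported in the question
d_A <- sqrt(sum(parts_A))
d_B <- sqrt(sum(parts_B))
print(c(A = d_A, B = d_B))
```

The `cross` column is negative for both units because the variables are positively correlated and both differences from `trt` point the same way. It is twice as large in magnitude for B, which more than offsets B's larger `var1` term.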