Solved – Calculate the distribution distance between two datasets

distance, distributions

I have two datasets, $X_{S}$ and $X_{T}$, which have different numbers of samples but the same feature-space dimensionality. I am wondering what the most efficient way is to measure how far apart these datasets are. How can I calculate the distance between their distributions?

Best Answer

If you know their distributions, you can use the KL divergence to measure how far apart they are. Note that the KL divergence is not symmetric, so it is not a distance in the strict sense. The problem arises when you have no knowledge of their PDFs. In that case, you can perhaps have a look at this paper. Good luck.
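As a rough illustration of the known-distribution case, here is a minimal sketch that assumes each dataset is well described by a Gaussian with independent features (a diagonal covariance). That assumption is mine, not something stated in the question, and the function names `gaussian_kl_1d` and `diagonal_gaussian_kl` are made up for this example. Under that assumption the KL divergence has a closed form, and the two datasets can have different numbers of rows because only per-feature means and variances are used.

```python
import math
from statistics import fmean, pvariance


def gaussian_kl_1d(mean_p, var_p, mean_q, var_q):
    """KL(P || Q) in nats for two univariate Gaussians with nonzero variance."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mean_p - mean_q) ** 2) / var_q
        - 1.0
    )


def diagonal_gaussian_kl(X_s, X_t):
    """Estimate KL(P_S || P_T), ASSUMING both datasets are diagonal Gaussians.

    X_s and X_t are lists of rows. They may have different numbers of rows
    but must have the same number of columns (features).
    """
    n_features = len(X_s[0])
    if len(X_t[0]) != n_features:
        raise ValueError("both datasets must have the same number of features")

    total = 0.0
    for j in range(n_features):
        col_s = [row[j] for row in X_s]
        col_t = [row[j] for row in X_t]
        # Independent features: the joint KL is the sum of per-feature KLs.
        total += gaussian_kl_1d(
            fmean(col_s), pvariance(col_s),
            fmean(col_t), pvariance(col_t),
        )
    return total


# Toy data: 3 samples versus 5 samples, 2 features each.
X_S = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]]
X_T = [[0.5, 1.5], [1.5, 2.5], [2.5, 3.5], [3.0, 1.0], [1.0, 2.0]]

print(diagonal_gaussian_kl(X_S, X_T))  # KL(P_S || P_T)
print(diagonal_gaussian_kl(X_T, X_S))  # generally a different value
```

Swapping the arguments generally gives a different number, which is the asymmetry of the KL divergence showing up. If the Gaussian assumption is not plausible for your data, this estimate can be misleading, and a method that works directly from samples is the better route.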