Solved – How to compare if two multinomial distributions are significantly different

anova, chi-squared-test, hypothesis testing, kullback-leibler, statistical significance

We can use a t-test to check whether two proportions are significantly different.
Similarly, is there a way to test whether two multinomial distributions (that is, two samples with more than two unique values) are significantly different from each other?

For example, sample 1 has 100 red, 300 green, 400 yellow, and 200 orange balls, and sample 2 has 101 red, 302 green, 399 yellow, and 202 orange balls.

  1. Is there a way to check whether the two samples above are significantly different?
  2. Is this the same as checking whether two multinomial distributions are significantly different? If so, can you explain how?

I was told in an interview that (if I remember correctly) KL divergence can be used to check this.

  3. Can I use KL divergence for this (or to check whether a sample multinomial distribution is significantly different from an expected one)?
  4. If so, how do I check for significance with KL divergence, or what cutoff value of KLD indicates that the difference is significant (like the p-values in statistical tests)?
  5. Can I use ANOVA or chi-square for these? If so, can you please explain?

Best Answer

You can use a chi-squared test. Given two vectors of counts, the chi-squared test of homogeneity checks whether they are significantly different; given one vector of counts, the chi-squared goodness-of-fit test checks whether its frequencies differ significantly from a given vector of probabilities.
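
For reference, the two statistics can be written out as below. The notation is introduced here for illustration and is not from the original answer: O denotes observed counts, E expected counts, R_i and C_j the row and column totals of the table of counts, and N the grand total. For the ball example, r = 2 and c = 4 (or k = 4), so both tests have 3 degrees of freedom.

```latex
% Two samples (test of homogeneity) on an r x c table of counts O_{ij}
X^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i C_j}{N},
\qquad \mathrm{df} = (r-1)(c-1)

% One sample of size n against fixed probabilities p_1, ..., p_k (goodness of fit)
X^2 = \sum_{j=1}^{k} \frac{(O_j - n p_j)^2}{n p_j},
\qquad \mathrm{df} = k - 1
```

In both cases a large X^2 relative to the chi-squared distribution with the stated degrees of freedom gives a small p-value.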

Data comparison:

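x1 = c(100, 300, 400, 200)
x2 = c(101, 302, 399, 202)
# Pass the counts as a 2 x 4 table; chisq.test(x = x1, y = x2) would treat
# the two vectors as paired factors and cross-tabulate them instead.
chisq.test(rbind(x1, x2))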
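
As a rough sketch of what the two-sample comparison does when it is treated as a test of homogeneity on the 2 x 4 table of counts, the statistic can be computed by hand and checked against `chisq.test`. The variable names `counts`, `expected`, `stat`, `p_value`, and `builtin` are chosen here for illustration only.

```r
# Counts from the question: red, green, yellow, orange
x1 <- c(100, 300, 400, 200)
x2 <- c(101, 302, 399, 202)

# 2 x 4 contingency table: one row per sample, one column per colour
counts <- rbind(sample1 = x1, sample2 = x2)

# Expected counts if both samples share the same category probabilities
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic and its degrees of freedom, (rows - 1) * (cols - 1)
stat <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

# The built-in test on the same table should agree with the manual numbers
builtin <- chisq.test(counts)
print(c(statistic = stat, df = df, p.value = p_value))
```

For these counts the statistic is tiny compared with 3 degrees of freedom, so the p-value is close to 1 and there is no evidence that the two samples differ.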

Frequency comparison:

x1 = c(100, 300, 400, 200)
p = x2/(sum(x2))
chisq.test(x=x1,p=p)
OR
x2 = c(101, 302, 399, 202)
p = x1/(sum(x1))
chisq.test(x=x2,p=p)
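
On the KL divergence part of the question: KL divergence has no fixed cutoff by itself, but for a goodness-of-fit comparison it is directly tied to the likelihood-ratio (G) statistic, G = 2 * n * KL(p_hat || p), which is approximately chi-squared with k - 1 degrees of freedom under the null hypothesis. The sketch below assumes the reference probabilities are known exactly (here they are simply taken from sample 2, which ignores that sample's own sampling noise) and that no category has a zero count. The names `p_ref`, `p_hat`, `kl`, and `g_stat` are illustrative.

```r
x1 <- c(100, 300, 400, 200)   # observed counts (sample 1)
x2 <- c(101, 302, 399, 202)
p_ref <- x2 / sum(x2)         # reference probabilities, treated as known (assumption)

n <- sum(x1)
p_hat <- x1 / n               # empirical distribution of sample 1

# KL divergence D(p_hat || p_ref) with natural logs; assumes no zero counts
kl <- sum(p_hat * log(p_hat / p_ref))

# G statistic (likelihood-ratio test): G = 2 * n * KL
g_stat <- 2 * n * kl
df <- length(x1) - 1
p_value <- pchisq(g_stat, df = df, lower.tail = FALSE)
print(c(KL = kl, G = g_stat, df = df, p.value = p_value))
```

The significance threshold therefore applies to 2 * n * KL rather than to KL alone, so the same KL value can be significant in a large sample and not in a small one.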