Solved – How to compare multimodal distributions

distributions, mode

I am logging the number of visitors per hour on two websites. The resulting data look something like 57 visits between 0:00 and 1:00, 32 visits between 1:00 and 2:00, etc. The distributions are multimodal, with spikes (modes) around 10:00, 15:00 and 21:00; that is, most visitors come before lunch, before quitting time, and before bed. Suppose I want to test whether these distributions differ significantly, for example whether one site gets significantly more hits before lunch while the other gets more hits in the afternoon. How would I go about that?

Yes, I know I can just plot the two histograms and see the difference, but I'd like to know how to test the hypothesis that one site gets more visitors than the other at certain times, and fewer at others.

Mean differences are not helpful, since there is too much information lost when I learn that the mean of one distribution is at 12:00 while the other is at 13:00.

(Why is there no multimodality tag?)

Best Answer

In general, when comparing times you may want to consider using circular or directional statistics (https://en.wikipedia.org/wiki/Directional_statistics). These account for the fact that 1 minute before midnight and 1 minute after midnight are 2 minutes apart rather than 24*60-2 minutes apart.

For this case you do not strictly need circular statistics, but circular summaries such as the mean and a dispersion parameter may be more meaningful than their linear counterparts.
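To illustrate the circular summaries mentioned above, here is a minimal Python sketch that computes a circular mean hour and the mean resultant length (a common dispersion measure) from 24 hourly counts. The function name `circular_summary`, the choice to place each bin at its midpoint, and the example counts are assumptions made for illustration, not part of the question.

```python
import math

def circular_summary(hourly_counts):
    """Return (mean_hour, resultant_length) for 24 hourly visit counts.

    Assumption: every visit in bin h is placed at the bin midpoint h + 0.5.
    """
    total = sum(hourly_counts)
    # Map each bin midpoint to an angle on a 24-hour circle and sum the
    # count-weighted unit vectors.
    c = sum(n * math.cos(2 * math.pi * (h + 0.5) / 24)
            for h, n in enumerate(hourly_counts))
    s = sum(n * math.sin(2 * math.pi * (h + 0.5) / 24)
            for h, n in enumerate(hourly_counts))
    mean_angle = math.atan2(s, c) % (2 * math.pi)
    mean_hour = mean_angle * 24 / (2 * math.pi)
    # Resultant length: 1 means all visits fall in one bin,
    # values near 0 mean the visits are spread around the clock.
    r = math.hypot(c, s) / total
    return mean_hour, r

# Hypothetical data: visits only in the hour before and the hour after midnight.
late_night = [5] + [0] * 22 + [5]
mean_hour, r = circular_summary(late_night)
# The circular mean sits at midnight (0 or, equivalently, 24),
# whereas the ordinary linear mean of these times would be 12:00.
print(mean_hour, r)
```

Note that for strongly multimodal data a single mean direction hides the separate peaks, so these summaries complement a comparison of the whole distributions rather than replace it.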

A common test for comparing two distributions (the null is that they are identical, the alternative is that they differ in any way) is the Kolmogorov–Smirnov test (https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).
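As a rough sketch of how such a test could be run on hourly counts, the Python below expands the counts into one time per visit, computes the two-sample KS statistic D as the largest gap between the two empirical CDFs, and approximates the p-value with the asymptotic Kolmogorov series. The helper names `expand_counts` and `ks_two_sample` and the example counts are hypothetical. With hourly bins the data are heavily tied, so the p-value should be read as approximate; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
import math

def expand_counts(hourly_counts):
    """Turn hourly counts into one time (bin midpoint, in hours) per visit."""
    return [h + 0.5 for h, n in enumerate(hourly_counts) for _ in range(n)]

def ks_two_sample(times_a, times_b):
    """Return (D, p) for the two-sample Kolmogorov-Smirnov test.

    p uses the asymptotic Kolmogorov distribution, so it is only an
    approximation, especially for small or heavily tied samples.
    """
    a, b = sorted(times_a), sorted(times_b)
    n, m = len(a), len(b)
    d = 0.0
    i = j = 0
    # Evaluate both empirical CDFs at every distinct pooled value.
    for x in sorted(set(a) | set(b)):
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.1:
        # The series below converges slowly here; the p-value is effectively 1.
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical hourly counts for two sites (index 0 = 0:00 to 1:00).
site_a = [57, 32, 20, 15, 12, 18, 30, 60, 90, 130, 150, 110,
          80, 95, 120, 140, 100, 85, 90, 110, 135, 150, 100, 70]
site_b = [40, 25, 18, 12, 10, 15, 25, 45, 60, 80, 90, 85,
          100, 130, 160, 170, 120, 90, 95, 120, 140, 145, 95, 60]

d, p = ks_two_sample(expand_counts(site_a), expand_counts(site_b))
print(f"D = {d:.3f}, approximate p = {p:.4g}")
```

This version treats the day as a line from 0:00 to 24:00, so it does not account for the wrap-around at midnight discussed above.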