How to calculate burstiness

text mining

I would like some advice on how to calculate burstiness. I am working with a set of text data in which each term's frequency in newspapers is counted per day over 2 weeks. For example, "apple" around the iPhone 4S release would have day1 = 10, day2 = 300, day3 = 25, and so on until day14. A consistently popular term such as "Obama" would instead have day1 = 100, day2 = 100, day3 = 105, …: it is frequent but steady, whereas bursty, spiky terms look like the "apple" example.

Is there any way to measure such burstiness? I am aware of the standard deviation; are there other ways? The ultimate goal is to make burstiness(Obama) as small as possible and burstiness(apple) as large as possible. Thanks.

Best Answer

Try the index of dispersion, i.e. variance divided by mean ($D = \sigma^2 / \mu$). It is also known as the Fano factor (in your case, with the time window $W=1\text{ day}$).

For a Poisson distribution, $D=1$ (e.g. if people write posts all the time, independently of each other).

For bursty events, $D>1$ (e.g. there is hype about an Apple product).

For 'self-avoiding' events, $D<1$ (e.g. daily reports; if there is the same number of posts each day, then $D=0$).
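
As a rough illustration, here is a minimal Python sketch of $D$ computed from daily counts. It uses only the three days quoted in the question for each term (the remaining days are not given), and the function name `index_of_dispersion` and the choice of population variance over sample variance are assumptions made for this example.

```python
from statistics import mean, pvariance


def index_of_dispersion(counts):
    """Return variance / mean for a sequence of per-day counts."""
    mu = mean(counts)
    if mu == 0:
        # D is undefined for a term that never appears.
        raise ValueError("index of dispersion is undefined when the mean is zero")
    # Assumption: population variance. statistics.variance (sample variance)
    # would be the alternative and gives somewhat larger values.
    return pvariance(counts) / mu


# Only the three days quoted in the question; days 4 to 14 are not given.
apple = [10, 300, 25]
obama = [100, 100, 105]

d_apple = index_of_dispersion(apple)
d_obama = index_of_dispersion(obama)

print(f"D(apple) = {d_apple:.2f}")
print(f"D(Obama) = {d_obama:.2f}")
```

On these three days, $D$ for "apple" comes out far above 1 and $D$ for "Obama" below 1, which is the ordering the question asks for. Note that $D$ has the units of the counts, so it grows with overall volume; comparing terms with very different frequencies may need some care.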
