Solved – Link Anomaly Detection in Temporal Network

change pointmachine learningoutlierspythontime series

I came across this paper that uses link anomaly detection to predict trending topics, and I found it incredibly intriguing: The paper is "Discovering Emerging Topics in Social Streams via Link Anomaly Detection".

I would love to replicate it on a different data set, but I'm not familiar enough with the methods to know how to use them. Let's say I have a series of snapshots of network of nodes across a period of six months. The nodes have a long-tailed degree distribution, with most having only a few connections, but some having a great many. New nodes appear within this time period.

How could I implement sequentially discounted normalized maximum likelihood calculations used in the paper to detect anomalous links that I think might be precursors to a burst? Are there other methods that would be more appropriate?

I ask both theoretically and practically. If someone could point me to a way to implement this in python or R, that would be very helpful.

Anyone? I know you smart folks out there have some starting thoughts for an answer,

Best Answer

You should first come up with your definition of anomaly-score for a new node(see section 3.1, 3.2). Fortunately, the correspondence between a new post(in their case) and a new node(in your case) is almost one-to-one, since we are only interested in the set of nodes(users) that the node(post) is related to.

Thus, we can characterise a new node by the number of edges/connections k it has, and the set V of the other nodes it is connected to. Therefore, equations (1)-(4) could be written in a similar fashion. Then, you could use the Chinese Restaurant process, as described at the end of subsection 3.1., after introducing a new parameter $\gamma$. Now, given that you have obtained the probabilities (3), you can obtain the link-anomaly score (7).

Ask further, if you have difficulties to follow the steps described in subsection 3.4., where SDNML is applied.

Related Question