The difference between side information and prior information

bayesian, machine learning

Both of these terms pop up a lot in papers involving Bayesian statistics – are they the same?

E.g.

One approach to style and content separation is to guide a factor model (e.g. PCA, Factor Analysis, ICA) by giving it “side-information” related to the structure of the data. Tenenbaum and Freeman (2000) considered the problem of extracting exactly two types of factors, namely style and content, using a bilinear model.

From here.

Best Answer

I've heard the term "side information" in some collaborative filtering settings, and I think it is used the same way here, though I'm not entirely sure what the author means. In those settings it denotes something a little different from a prior. It's not necessarily a precisely defined term, but here's an example.

In the Netflix Prize, researchers were challenged to improve Netflix's rating prediction algorithm. One very common way to view the problem was to consider a giant matrix in which rows represent users, columns represent movies, and entries are the users' opinions of those movies. This theoretical matrix contains what every user would think of any given movie; we only have noisy estimates of some of its entries, in the form of ratings that users gave on the site. A lot of work focused on estimating the missing ratings using only this information, via low-rank matrix completion.
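To make the "ratings only" setup concrete, here is a minimal sketch of low-rank matrix completion using alternating least squares. It is only an illustration, not what any particular Netflix Prize entry did; the function name `complete_low_rank` and the parameters `rank`, `reg`, and `n_iters` are my own assumptions.

```python
import numpy as np

def complete_low_rank(R, mask, rank=2, reg=0.1, n_iters=50, seed=0):
    """Fit R ~ U @ V.T using only the entries where mask is True.

    R    : (n_users, n_movies) ratings; unobserved entries are ignored
    mask : boolean array of the same shape, True where a rating was observed
    """
    rng = np.random.default_rng(seed)
    n_users, n_movies = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_movies, rank))
    ridge = reg * np.eye(rank)  # keeps each small linear system well-posed

    for _ in range(n_iters):
        # Update each user's factor from the movies that user rated.
        for i in range(n_users):
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[i, mask[i]])
        # Update each movie's factor from the users who rated it.
        for j in range(n_movies):
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[mask[:, j], j])
    return U, V
```

Calling `U, V = complete_low_rank(R, mask)` and then forming `U @ V.T` gives a predicted rating for every user/movie pair, including the unobserved ones. Note that nothing about the movies or users themselves enters the model, only the observed ratings.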

But then some people thought about how to add more information to the problem: the genres of movies, the actors in them, and so on can be side information for the movies; demographic information about users is side information for users. So this is information that doesn't quite fit into the general observation framework (here, the ratings), but is additional information that you have "on the side" and that can be useful in your modeling. Prior information, by contrast, expresses general beliefs about the model's parameters before seeing any data, and can't refer to specific observations ("Joe is 35 years old").
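One hypothetical way to see the difference in code: suppose we already have movie factors `V` from some factorization, plus a feature matrix `F` of genre indicators. A zero-mean prior gives every unrated movie the same starting guess, while side information gives each movie its own guess. The helper names below (`fit_feature_map`, `prior_only_factor`, `side_info_factor`) and the ridge-regression link between features and factors are assumptions made for illustration, not the method of any specific paper.

```python
import numpy as np

def fit_feature_map(V, F, reg=1e-3):
    """Ridge-regress movie factors V (n_movies x rank) on features F (n_movies x n_feats)."""
    n_feats = F.shape[1]
    return np.linalg.solve(F.T @ F + reg * np.eye(n_feats), F.T @ V)

def prior_only_factor(rank):
    """Prior only: the mean of a zero-mean prior, identical for every new movie."""
    return np.zeros(rank)

def side_info_factor(f_new, W):
    """Side information: the guess depends on this particular movie's features."""
    return f_new @ W
```

With only the prior, two brand-new movies are indistinguishable until someone rates them. With the feature map `W`, a new comedy and a new horror film get different factors right away, because the model was told a specific fact about each of them.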

From a quick Google search, this seems to be the most common usage of the term (e.g. "Bayesian Matrix Factorization with Side Information and Dirichlet Process Mixtures" [AAAI 2010], "Distance Metric Learning, with Application to Clustering with Side-Information" [NIPS 2002], etc.).