Solved – Does sklearn's RBM scale well with sparse high-dimensional features?

machine-learning, restricted-boltzmann-machine, scikit-learn

I am using scikit-learn's RBM implementation. There are two problems:

  1. The running time is $O(d^2)$, where $d$ is the number of features. This becomes a problem when using high-dimensional sparse features, such as those produced by feature hashing.

  2. It only allows binary visible units. Do I have to change the sklearn code to get non-binary units, or is there some trick I am unaware of?

I would expect an RBM with 4 features to fit better than a mixture of 2 Gaussians (which has a similar number of parameters). Has anyone seen any experiments with RBMs used for unsupervised modeling other than pretraining?

Best Answer

Training time

RBM training time depends on several parameters, most importantly the number of iterations, the number of Gibbs sampling steps, and the size of the weight matrix. The first two are easy to manipulate, while the last one is always $m \times n$, where $m$ is the number of visible units and $n$ the number of hidden units. So the actual time complexity of training an RBM with respect to the number of features is $O(m \times n)$.
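To see this scaling in practice, here is a minimal timing sketch; the data sizes and hyperparameters are arbitrary, and absolute times will vary by machine, but the per-iteration cost grows roughly with the weight matrix size $m \times n$:

```python
import time

import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.RandomState(0)
X = rng.randint(2, size=(1000, 2000)).astype(np.float64)  # m = 2000 visible units

for n_hidden in (64, 128, 256):
    rbm = BernoulliRBM(n_components=n_hidden, n_iter=5,
                       batch_size=100, random_state=0)
    t0 = time.time()
    rbm.fit(X)
    # Per-iteration cost grows roughly with m * n (the weight matrix size).
    print(f"n_hidden={n_hidden}: {time.time() - t0:.1f}s")
```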

Unfortunately, there's not much you can do about this. Unlike the input data, the weight matrix has to be dense in order to sample hidden units from visible ones and vice versa. You can play around with the number of hidden units, but don't expect an RBM to train at the same speed as, say, an SVM or logistic regression.

Note that if your data has a notion of locality (e.g. nearby pixels in an image or words in a text), you can also try convolutional networks, which have much smaller weight matrices and often produce better results.

Binary features

Yes, scikit-learn includes only BernoulliRBM, which is an RBM with binary units. However, it doesn't enforce any constraints on the input data. In practice, BernoulliRBM handles real-valued data in $[0, 1]$ (e.g. image intensities) pretty successfully, even though the theoretical implications of such a hack are unclear.
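For example, here is a minimal sketch on sklearn's built-in digits dataset (hyperparameters are illustrative) that feeds real-valued pixel intensities, scaled to $[0, 1]$, straight into BernoulliRBM:

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM
from sklearn.preprocessing import minmax_scale

# Grayscale pixel intensities scaled to [0, 1], so the model can read them
# as Bernoulli probabilities instead of hard 0/1 values.
X = minmax_scale(load_digits().data)

rbm = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)
H = rbm.fit_transform(X)  # hidden-unit activations, usable as learned features
print(H.shape)            # (1797, 64)
```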

If you are looking for a strictly real-valued RBM and using something other than Python/scikit-learn is an option for you, take a look at Boltzmann.jl, my implementation of RBMs in Julia, heavily based on sklearn's version but also including a Gaussian RBM. A more time-consuming but helpful alternative is to port the Gaussian RBM back to scikit-learn, which basically requires rewriting only a couple of methods for sampling (a rough sketch follows).
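As an illustration of that second option, Gaussian visible units can be sketched by overriding the visible sampling step. Note that `_sample_visibles` is a private sklearn method whose signature may change between versions, and a complete port would also need `_free_energy` rewritten so that `score_samples()` is correct:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

class GaussianBernoulliRBM(BernoulliRBM):
    """Sketch of an RBM with Gaussian visible units (unit variance).

    Inputs should be standardized to zero mean and unit variance.
    This only changes sampling; score_samples() still assumes the
    Bernoulli free energy and would need to be rewritten as well.
    """

    def _sample_visibles(self, h, rng):
        # Visible means are the raw linear activations; draw from
        # N(mean, 1) instead of Bernoulli(sigmoid(mean)).
        mean = np.dot(h, self.components_) + self.intercept_visible_
        return mean + rng.normal(size=mean.shape)
```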

Uses of RBMs other than pretraining

RBMs are useful not only for pretraining deep networks, but also for representation learning (feature extraction), dimensionality reduction, probability distribution estimation (and thus sampling), and many other tasks. I've successfully used them for retrieving facial-expression "modes" from images of emotional faces and for analyzing last.fm data. RBMs are quite popular in recommendation systems (e.g. see the paper by Salakhutdinov or another by the Netflix Prize winners), music retrieval, and various applications of natural language processing.
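As a quick feature-extraction sketch in scikit-learn, in the spirit of its RBM-plus-logistic-regression example (hyperparameters here are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

digits = load_digits()
X, y = minmax_scale(digits.data), digits.target

# Unsupervised RBM features feeding a simple linear classifier.
clf = Pipeline([
    ("rbm", BernoulliRBM(n_components=100, learning_rate=0.06,
                         n_iter=15, random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy of the classifier on RBM features
```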
