Solved – Modern Use Cases of Restricted Boltzmann Machines (RBMs)

deep-learning, generative-models, references, restricted-boltzmann-machine

Background: Much of the research of the past ~4 years (post-AlexNet) seems to have moved away from generative pretraining of neural networks as a way to achieve state-of-the-art classification results.

For example, among the top 50 results for MNIST listed here, only 2 papers appear to use generative models, both of which are RBMs. The other 48 winning papers are about different discriminative feed-forward architectures, with much effort put toward finding better/novel weight initializations and activation functions different from the sigmoid used in the RBM and in many older neural networks.

Question: Is there any modern reason to use Restricted Boltzmann Machines anymore?

If not, is there a de facto modification one can apply to these feed forward architectures to make any of their layers generative?

Motivation: I ask because some of the models I'm seeing available, usually variants on the RBM, don't necessarily have obvious discriminative counterparts to their generative layers/models, and vice versa. For example:

  • mcRBM

  • ssRBM

  • CRBM (although one could argue that the CNN used in feed-forward architectures is the analogous discriminative architecture)

Also, these are all clearly pre-AlexNet, from 2010, 2011, and 2009 respectively.
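
For reference, below is a minimal NumPy sketch of the plain binary RBM that these variants extend, trained with one step of contrastive divergence (CD-1). It is an illustrative implementation under standard assumptions (binary visible and hidden units); the class and names are mine, not from any of the papers above.

```python
import numpy as np

rng = np.random.default_rng(0)

class RBM:
    """Binary-binary RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample_h(self, v):
        """P(h=1|v) and a binary sample from it."""
        p = self._sigmoid(v @ self.W + self.b_h)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):
        """P(v=1|h) and a binary sample from it."""
        p = self._sigmoid(h @ self.W.T + self.b_v)
        return p, (rng.random(p.shape) < p).astype(float)

    def cd1_update(self, v0):
        """One CD-1 gradient step on a batch of binary data v0."""
        ph0, h0 = self.sample_h(v0)   # positive phase, driven by the data
        pv1, _ = self.sample_v(h0)    # one Gibbs step back to the visibles
        ph1, _ = self.sample_h(pv1)   # hidden probabilities on the reconstruction
        batch = v0.shape[0]
        # approximate gradient: <v h>_data - <v h>_model
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
        self.b_v += self.lr * (v0 - pv1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

# usage on a dummy binary batch, MNIST-sized visibles
rbm = RBM(n_visible=784, n_hidden=128)
data = (rng.random((64, 784)) < 0.5).astype(float)
rbm.cd1_update(data)
```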

Best Answer

This is kind of an old question, but since it essentially asks for 'best practices' rather than for what is technically possible (i.e., it doesn't need too much research focus), current best practice is something like the following (a minimal code sketch follows the list):

  • RBMs are not normally used currently
  • linear models (linear regression, logistic regression) are used where possible
  • otherwise, deep feed-forward networks are used, with layers such as fully-connected layers and convolutional layers, plus some kind of regularization layer, such as dropout, and lately batch normalization
  • of course with activation layers in between, typically ReLU, but tanh and sigmoid are used too
  • and probably some max-pooling (not always: average pooling and others are used too)
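
As a concrete illustration of the list above, here is a minimal sketch of such a network in PyTorch (the framework is my choice; the answer names none). It stacks convolutional, batch-normalization, ReLU, max-pooling, dropout, and fully-connected layers into a small image classifier, with no generative pretraining anywhere:

```python
import torch
import torch.nn as nn

# A small discriminative classifier for 28x28 grayscale images (e.g. MNIST),
# built purely from the ingredients in the list above.
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),               # batch normalization
    nn.ReLU(),                        # activation in between
    nn.MaxPool2d(2),                  # 28x28 -> 14x14
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2),                  # 14x14 -> 7x7
    nn.Flatten(),
    nn.Dropout(0.5),                  # dropout for regularization
    nn.Linear(64 * 7 * 7, 128),       # fully-connected layers
    nn.ReLU(),
    nn.Linear(128, 10),               # class logits
)

x = torch.randn(8, 1, 28, 28)         # dummy input batch
logits = model(x)                     # shape: (8, 10)
```

The weights here rely on the framework's default (Kaiming-style) initialization rather than any RBM-based pretraining, which is exactly the shift described above.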

For generative use cases, common techniques include: