Dropout vs DropConnect – What Is the Difference?

dropout, neural-networks

What is the difference between dropout and drop connect?

AFAIK, dropout randomly drops hidden nodes during training but keeps them in testing, and drop connect drops connections.

But isn't dropping connections equivalent to dropping the hidden nodes? Aren't the nodes (or connections) just a set of weights?

Best Answer

DropOut and DropConnect are both methods intended to prevent "co-adaptation" of units in a neural network. In other words, we want units to independently extract features from their inputs instead of relying on other neurons to do so.

Suppose we have a multilayered feedforward network like this one (the topology doesn't really matter). We're worried about the yellow hidden units in the middle layer co-adapting.

sample 5-4-3 network

DropOut

To apply DropOut, we randomly select a subset of the units and clamp their output to zero, regardless of the input; this effectively removes those units from the model. A different subset of units is randomly selected every time we present a training example.

Below are two possible network configurations. On the first presentation (left), the 1st and 3rd units are disabled; on a subsequent presentation (right), the 2nd and 3rd units have been randomly selected instead. At test time, we use the complete network but rescale the weights to compensate for the fact that all of the units can now be active (e.g., if you drop half of the nodes, the weights should also be halved).

DropOut examples
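
To make the mechanics concrete, here is a minimal NumPy sketch of one fully connected layer with DropOut. The function name, the p_drop parameter, and the ReLU activation are illustrative assumptions, not part of the original answer:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_layer(x, W, b, p_drop=0.5, train=True):
    """Sketch of a fully connected layer with DropOut applied to its units.

    x: (n_in,) input, W: (n_in, n_out) weights, b: (n_out,) bias.
    """
    h = np.maximum(0.0, x @ W + b)             # ReLU activations of the layer
    if train:
        keep = rng.random(h.shape) >= p_drop   # fresh random mask per training example
        return h * keep                        # clamp dropped units' outputs to zero
    # Test time: use every unit, but rescale so the expected input to the next
    # layer matches training (e.g., halve the outputs when p_drop = 0.5).
    return h * (1.0 - p_drop)
```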

DropConnect

DropConnect works similarly, except that we disable individual weights (i.e., set them to zero), instead of nodes, so a node can remain partially active. Schematically, it looks like this:

DropConnect
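
Continuing the same illustrative sketch, a DropConnect version masks individual entries of the weight matrix rather than whole units; the simple test-time scaling shown here is an assumption for brevity (the DropConnect paper instead uses a moment-matching approximation at inference):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropconnect_layer(x, W, b, p_drop=0.5, train=True):
    """Sketch of the same layer with DropConnect: zero individual weights,
    so a unit can stay partially active if some incoming connections survive."""
    if train:
        mask = rng.random(W.shape) >= p_drop        # one mask entry per connection
        return np.maximum(0.0, x @ (W * mask) + b)
    # Simplified test-time approximation: scale the weights by the keep probability.
    return np.maximum(0.0, x @ (W * (1.0 - p_drop)) + b)
```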

Comparison

These methods both work because they effectively let you train several models at the same time, then average across them for testing. For example, the yellow layer has four nodes, and thus 16 possible DropOut states (all enabled, #1 disabled, #1 and #2 disabled, etc).

DropConnect is a generalization of DropOut because it produces even more possible models, since there are almost always more connections than units. However, you can get similar outcomes on an individual trial. For example, the DropConnect network on the right has effectively dropped Unit #2 since all of the incoming connections have been removed.
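
To make the counting concrete with the 5-4-3 network above: DropOut on the yellow layer's 4 units gives 2^4 = 16 possible configurations, while DropConnect on its 5 × 4 = 20 incoming weights gives 2^20 (roughly a million) configurations for that same layer.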

Further Reading

The original papers (Srivastava et al., 2014, JMLR, for dropout; Wan et al., 2013, ICML, for DropConnect) are pretty accessible and contain more details and empirical results.
