Solved – Dropout: scaling the activation versus inverting the dropout

Tags: deep learning, dropout, neural networks

When applying dropout in artificial neural networks, one needs to compensate for the fact that a portion of the neurons was deactivated at training time. There are two common strategies for doing so:

  • scaling the activation at test time
  • inverting the dropout during the training phase

The two strategies are summarized in the slides below, taken from Stanford CS231n: Convolutional Neural Networks for Visual Recognition.

Which strategy is preferable, and why?


Scaling the activation at test time:

[slide image]

Inverting the dropout during the training phase:

[slide image]
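For concreteness, here is a minimal NumPy sketch of the two strategies for a single ReLU layer. The layer sizes, the keep_prob value, and the function names are invented for illustration; they are not taken from the CS231n slides.

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8
W, b = rng.standard_normal((100, 50)), np.zeros(100)

def forward(x):
    return np.maximum(0.0, W @ x + b)        # plain ReLU layer

# Strategy 1: plain dropout at train time, scale the activation at test time.
def train_plain(x):
    a = forward(x)
    mask = rng.random(a.shape) < keep_prob   # drop ~20% of the units
    return a * mask                          # no rescaling here

def test_plain(x):
    return forward(x) * keep_prob            # compensate at test time

# Strategy 2: inverted dropout, rescale at train time, test code untouched.
def train_inverted(x):
    a = forward(x)
    mask = rng.random(a.shape) < keep_prob
    return (a * mask) / keep_prob            # compensate immediately

def test_inverted(x):
    return forward(x)                        # identical to a network without dropout
```

With inverted dropout the test-time code is exactly the forward pass of a dropout-free network, which is why it is the variant usually implemented in practice.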

Best Answer

Andrew Ng gives a very good explanation of this in the Dropout Regularization lesson of his Deep Learning course:

  • Inverted dropout is more common because it makes testing much easier: the test-time forward pass needs no extra scaling.
  • The purpose of the inversion is to ensure that the expected value of Z is not affected by the reduction of the activations.

Say we set a3 = a3 / keep_prob as the last step of the dropout implementation:

With Z[4] = W[4] * a[3] + b[4], the expected magnitude of a[3] has been reduced by a factor of keep_prob, because the mask D3 zeroes out a fraction (1 - keep_prob) of its elements. The value of Z[4] would therefore shrink as well, so to compensate (roughly) we invert the change by dividing by keep_prob, which keeps the expected value of Z[4] unaffected.
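A quick numerical sanity check of this argument, as a rough NumPy sketch: the array shapes, the values, and keep_prob = 0.8 are made up for illustration, and only the names a3, D3, W4, b4 mirror the course notation, not its code.

```python
import numpy as np

rng = np.random.default_rng(1)
keep_prob = 0.8

a3 = rng.random((5, 1000))               # toy layer-3 activations
W4 = rng.random((3, 5))                  # kept positive so the mean ratios below are stable
b4 = np.zeros((3, 1))

D3 = rng.random(a3.shape) < keep_prob    # dropout mask: keeps ~80% of the elements
a3_dropped  = a3 * D3                    # dropout without inversion
a3_inverted = a3_dropped / keep_prob     # dropout with inversion

Z4_full     = W4 @ a3 + b4               # no dropout at all
Z4_dropped  = W4 @ a3_dropped + b4       # shrunk by roughly keep_prob
Z4_inverted = W4 @ a3_inverted + b4      # roughly matches Z4_full on average

print(Z4_dropped.mean()  / Z4_full.mean())   # ~0.8
print(Z4_inverted.mean() / Z4_full.mean())   # ~1.0
```

The ratios show the effect described above: without inversion the pre-activation Z[4] shrinks by roughly keep_prob, while dividing by keep_prob restores its expected value.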
