To what exactly does the term "activations" refer in neural networks?

deep-learning · neural-networks · terminology

Does it refer to the input or the output of the activation function?

The literature seems to be inconsistent. A few examples:

- Activations = input of the activation function
- Activations = output of the activation function

Best Answer

The simplest representation of a neural network is the Multi-Layer Perceptron (MLP). In its simplest form, an MLP is just three layers:

An input layer, represented by a matrix $X \in \mathbb{R}^{N\times d}$, where $N$ is the number of training examples and $d$ is the number of features.

A hidden layer, which usually applies a ReLU or a logistic sigmoid function. Hidden layer $i$ could be a ReLU, represented by $$h_i(x) = \text{ReLU}(x) = \max(x, 0).$$ In other words, if the input to the ReLU function is negative, the function outputs $0$; if the input $x$ is positive, it outputs $x$.
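As a quick sanity check of that definition, here is a minimal NumPy sketch (my own illustration, not part of the original answer):

```python
import numpy as np

def relu(x):
    # Elementwise max(x, 0): negative entries become 0, positive entries pass through.
    return np.maximum(x, 0)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))  # [0.  0.  0.  1.5 3. ]
```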

The hidden layer feeds into the output layer, which is just another function. This function could be the identity paired with a squared-error loss (in the context of regression) or a softmax (in the case of multiclass classification). The MLP is complete once you add the weight and bias matrices, but we don't need them for now.
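Putting the three layers together, a minimal forward-pass sketch might look like this (the shapes, the random placeholder weights, and the softmax output are my own assumptions for a multiclass setting, not from the original answer):

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability, then normalize.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, d, h, k = 4, 5, 3, 2                 # examples, features, hidden units, classes
X = rng.normal(size=(N, d))             # input layer: X in R^{N x d}
W1, b1 = rng.normal(size=(d, h)), np.zeros(h)
W2, b2 = rng.normal(size=(h, k)), np.zeros(k)

H = np.maximum(X @ W1 + b1, 0)          # hidden layer: ReLU
Y = softmax(H @ W2 + b2)                # output layer: softmax
print(Y.sum(axis=1))                    # each row of Y sums to 1
```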

The activation function is just what the name suggests: a function. In the example above, the activation function for the hidden layer is the ReLU. The activation function for the output layer is the softmax (squared error is, strictly speaking, the loss applied after it).

When someone in machine learning uses the word *activations*, they are almost always referring to the output of the activation function. The possible activations in the hidden layer in the example above can only be $0$ or a positive value $x$.

Note that the hidden activations (the outputs of the hidden layer) can themselves become inputs to other activation functions (in this case, the output-layer activation function). The *pre-activation* is the input to an activation function.
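In code, the distinction is a single line (the names `z` for the pre-activation and `h` for the activation are conventional choices of mine, not from the original answer):

```python
import numpy as np

x = np.array([1.0, -2.0])                # input to the layer
W = np.array([[0.5, -1.0], [0.5, 2.0]])  # placeholder weights
b = np.array([0.1, -0.2])                # placeholder biases

z = W @ x + b             # pre-activation: the input to the activation function
h = np.maximum(z, 0)      # activation: the output of the activation function
print(z, h)               # [ 2.6 -3.7] [2.6 0. ]  (the negative entry is clamped)
```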

On a final note, I come from a statistics background, which is a much older and more developed field, and its notation is pretty much standard. In machine learning, however, the notation and the nomenclature are still evolving, so I would not be surprised to see some authors use terms differently. Context is your best friend when reading machine learning texts.
