Solved – “Vanilla” neural network model (Hastie et al.)

Tags: neural-networks, self-study

Hastie et al. (The Elements of Statistical Learning) discuss the "vanilla" neural network model on page 392. They describe the model as:

Derived features $Z_m$ are created from linear combinations of the inputs, and then the target $Y_k$ is modeled as a function of linear combinations of the $Z_m$:
\begin{align}
Z_m &= \sigma (\alpha_{0m} + \alpha_m^T X), \quad m = 1, \dotsc, M \\
T_k &= \beta_{0k} + \beta_k^T Z, \quad k = 1, \dotsc, K \\
f_k (X) &= g_k (T), \quad k = 1, \dotsc, K
\end{align}
where $Z = (Z_1, \dotsc, Z_M)$ and $T = (T_1, \dotsc, T_K)$

My questions

  1. What is $T_k$?
  2. What is the proper name for the functions $f_k$ and $g_k$?

Best Answer

First, you should be careful with the terminology. Hastie et al. call this model a "single hidden-layer neural network". With the words you were using for it -- omitting "hidden-layer" and saying "perceptron" (without the qualifier "multilayer") -- the model could be confused with a standard perceptron, especially if you don't specify it further.

The single hidden-layer neural network model you specified has $M$ hidden nodes. $X$ denotes the input features, which inside each hidden node are weighted, summed, and passed through a sigmoid function to produce the hidden-layer outputs $Z_m$. These serve as input to the output layer, which consists of $K$ nodes (one per class). Following the usual neural network approach, the $k$-th output node sums its weighted inputs to give $T_k$ and then applies the function $g_k$ to produce the network output. $g_k$ can therefore be called the output-layer activation function.

Finally, $f_k$ stands for the whole neural network model leading to the output of the $k$-th node. The vector of all $K$ functions $f_k$ then makes up the whole neural network.
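A minimal numpy sketch may make the three equations concrete. All dimensions and weights below are arbitrary illustrative values, and softmax is assumed as the choice of $g_k$ (the usual choice for $K$-class classification in Hastie et al.); the variable names mirror the book's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: p inputs, M hidden nodes, K output classes.
p, M, K = 4, 3, 2

# Hidden-layer parameters: biases alpha_0m and weight vectors alpha_m.
alpha0 = rng.normal(size=M)
alpha = rng.normal(size=(M, p))

# Output-layer parameters: biases beta_0k and weight vectors beta_k.
beta0 = rng.normal(size=K)
beta = rng.normal(size=(K, M))

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax(t):
    e = np.exp(t - t.max())  # shift for numerical stability
    return e / e.sum()

def f(X):
    """The whole network: returns f_k(X) for all k at once."""
    Z = sigmoid(alpha0 + alpha @ X)  # Z_m = sigma(alpha_0m + alpha_m^T X)
    T = beta0 + beta @ Z             # T_k = beta_0k + beta_k^T Z
    return softmax(T)                # f_k(X) = g_k(T), here g = softmax

X = rng.normal(size=p)
probs = f(X)  # K class probabilities summing to 1
```

Here $T$ is just the vector of pre-activation sums in the output layer; only after applying $g$ (softmax) do the $K$ values become class probabilities.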

To summarize, here are my suggestions for the names:
  - $T_k$: the output-layer aggregation result (the weighted sum entering the $k$-th output node)
  - $g_k$: the output-layer activation function
  - $f_k$: the single hidden-layer neural network output corresponding to the $k$-th node