What are the predictor variables in a neural network?

definition, machine learning, neural networks, random variable, terminology

In a linear regression model, the predictors, also called independent (random) variables or regressors, are often denoted by $X$. The related Wikipedia article does, IMHO, a good job of introducing linear regression.
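For concreteness, the standard form of the model is

$$y = X\beta + \varepsilon,$$

where the columns of $X$ hold the predictor values, $\beta$ the coefficients, and $\varepsilon$ the noise.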

In machine learning, people often talk about neural networks (and other models), but they rarely talk about terms such as random variable, (in)dependent variable, etc. In the context of NNs, what are the independent and dependent variables? Are they the "features" of the input dataset? Can the hyper-parameters (e.g. learning rate of the optimiser) also be considered independent variables? What are the dependent variables? The output of the network?

I assume that "independent variable" is a synonym for "predictor" or "regressor" (in the case of linear regression).

Best Answer

> In the context of NNs, what are the independent and dependent variables?

I don't think anything about a neural network becomes easier to understand through the "independent variable" and "dependent variable" terminology. Recurrent networks produce outputs for time $t$ which are then taken as inputs for time $t+1$. Auto-encoder neural networks take an input and produce (1) an abstraction of that input and (2) a reconstruction of the input based on the abstraction. In either case, I don't think "independent" and "dependent" are really descriptive labels.
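As a toy illustration of that feedback loop (a minimal sketch with assumed sizes and random weights, not any particular published model):

```python
import numpy as np

# Toy recurrent step: the output at time t is fed back as the input at
# time t+1, so the same quantity plays both the "input" and "output" role.
rng = np.random.default_rng(0)
n = 4
W_in  = 0.1 * rng.normal(size=(n, n))   # input-to-hidden weights
W_rec = 0.1 * rng.normal(size=(n, n))   # hidden-to-hidden (recurrent) weights
W_out = 0.1 * rng.normal(size=(n, n))   # hidden-to-output weights

h = np.zeros(n)              # hidden state
x = rng.normal(size=n)       # initial input
for t in range(5):
    h = np.tanh(W_in @ x + W_rec @ h)
    y = W_out @ h            # output at time t ...
    x = y                    # ... becomes the input at time t+1
```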

It's easier to think about neural networks in terms of inputs and outputs. A neural network takes something (a sequence of integer indices, a vector, an image, several vectors concatenated into a matrix, a graph) and returns something else (a probability vector, an arbitrary vector, another image).
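A minimal sketch of that input-to-output view (the shapes and the random, untrained weights here are assumptions for illustration):

```python
import numpy as np

# One affine layer plus softmax: a vector goes in, a probability vector
# comes out. The weights are random here; training would fit them to data.
rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

x = rng.normal(size=4)       # input: a feature vector
W = rng.normal(size=(3, 4))  # weights mapping 4 inputs to 3 classes
b = np.zeros(3)

p = softmax(W @ x + b)       # output: a probability vector over 3 classes
print(p, p.sum())            # non-negative entries summing to 1
```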

And of course there are some neural networks that can take multiple, heterogeneous inputs and return one or more outputs (which may, likewise, be heterogeneous).

This description is incredibly abstract and general. That's kind of the point: neural network researchers have generalized beyond what is possible with linear regression.

> Are they the "features" of the input dataset?

What a "feature" is depends on context. Some recent successes in neural networks are entirely featureless, in the sense that they take some raw input, such as an entire image, as the input. This is in contrast to other computer vision or imaging tasks which take an image, extract features, and then pass the features to some downstream task. Indeed, until CNNs started putting up major successes, the image to feature extraction to conventional machine learning (SVM, RF, etc.). pipeline was the standard practice, and much attention was devoted to developing better feature extraction methods.

Other applications of neural networks are exactly like linear regression with extra bits added on: matrix input, scalar output. The only subtlety is the hidden-layer nonlinearity.
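A minimal sketch of that case (sizes and random weights assumed for illustration):

```python
import numpy as np

# Linear regression plus one hidden nonlinearity: a matrix goes in, one
# scalar per row comes out. Remove the tanh and the whole thing collapses
# back into a purely linear map.
rng = np.random.default_rng(0)

X = rng.normal(size=(100, 3))             # design matrix, as in regression
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
w2, b2 = rng.normal(size=8), 0.0

H = np.tanh(X @ W1 + b1)                  # the one new ingredient
y_hat = H @ w2 + b2                       # one scalar prediction per row
```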

> Can the hyper-parameters (e.g. learning rate of the optimiser) also be considered independent variables?

Hyperparameters are not independent variables. Let's return to the regression context. One hyperparameter of a ridge regression is the penalty on the $L^2$ norm of the coefficients. This is not an independent variable because it is not an attribute of any of the samples in your data; instead, it's a researcher-chosen value that controls the size (the norm) of the coefficient vector.
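Written out, the ridge objective makes the separation explicit:

$$\hat{\beta} = \arg\min_{\beta} \, \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2$$

The data $(X, y)$ are where the independent and dependent variables live; the penalty $\lambda$ is set by the researcher and appears nowhere in the data.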

Likewise, neural network hyperparameters are not independent variables. Hyperparameters, including the $L^2$ penalty and the learning rate, don't describe anything about your dataset. The learning rate is a direct consequence of using an iterative optimization procedure. The magnitude of the $L^2$ penalty reflects a particular choice about how to constrain the model.
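For instance, a plain gradient-descent update is

$$\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t)$$

where the learning rate $\eta$ is a knob on the optimization procedure, not a property of any sample.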

This thread may also be useful: What *is* an Artificial Neural Network?