Solved – What’s the difference between a neural network architecture and a neural network model

Tags: definition, machine learning, neural networks

I want to know what the difference is between the terms architecture and model in the context of neural networks.

I think the architecture is the whole definition of the neural net, including the hyper-parameters ($H$) (e.g. learning rate, dropout, batch size, etc.), the parameters ($\theta$) (e.g. weights and biases), the feature extraction process, and the shape (meaning the depth, how many units there are, and how they are connected), but without any specific values for the parameters ($\theta$) and hyper-parameters ($H$). So it is something like $F(f(x;\theta); H)$, where $F$ is the learning algorithm that helps find good values for $\theta$, which define $f$.

On the other hand, I think a neural network model is the same but with specific values for the hyper-parameters (like learning rate = 0.8, etc.), BUT with the weights and biases not yet defined, i.e. having the family of functions $f(x;\theta)$.

Or is the model just the function $f$ in $y = f(x)$, with all the parameters $\theta$ already defined?

How wrong am I?

Best Answer

This terminology is often abused, but the pedantic view is:

  • A model would be a network architecture with all its weights viewed as free parameters.
  • A fit model is a network with fixed weights determined by running a fitting algorithm with some training data.
  • Parameters map out the various specific shapes that the model can take; fitting chooses specific values of the weights that best reflect the training data.
  • Hyperparameters control the behaviour of the fitting algorithm; they are often set to find the parameters that offer the best performance according to some estimate of hold-out error. (A short code sketch of these distinctions follows this list.)

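To make the distinctions concrete, here is a minimal sketch (my own illustration, not part of the original answer) of how the terminology above might map onto code. All names (`architecture`, `init_params`, `fit`, etc.) are hypothetical, and random search stands in for a real fitting algorithm purely to keep the sketch short:

```python
import numpy as np

# "Architecture": structural choices only (layer sizes, activation); no weight values.
architecture = {"layer_sizes": [2, 4, 1], "activation": np.tanh}

def init_params(arch, rng):
    """A 'model': the architecture with its weights treated as free parameters.
    The values drawn here are just a starting point; fitting will choose them."""
    sizes = arch["layer_sizes"]
    return [(rng.normal(size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def predict(arch, params, X):
    """f(X; theta): a forward pass through the network."""
    h = X
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:          # activation on hidden layers only
            h = arch["activation"](h)
    return h

def fit(arch, X, y, n_candidates=500, seed=0):
    """The fitting algorithm F: its hyperparameters (here, n_candidates) control
    the search, not the network itself. Random search is used only to keep the
    sketch short; gradient descent would be the realistic choice."""
    rng = np.random.default_rng(seed)
    best_params, best_loss = None, np.inf
    for _ in range(n_candidates):
        params = init_params(arch, rng)
        loss = np.mean((predict(arch, params, X) - y) ** 2)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params                   # a 'fit model': fixed weights chosen from data

# Toy usage: learn y = x1 + x2 from 50 samples.
X = np.random.default_rng(1).normal(size=(50, 2))
y = X.sum(axis=1, keepdims=True)
fitted = fit(architecture, X, y)
print(np.mean((predict(architecture, fitted, X) - y) ** 2))
```

Nothing here is tied to a particular library; the point is only that the architecture, the model with free weights, the fit model with fixed weights, and the hyperparameters of the fitting procedure are four separable things.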
I settled on this terminology after reading Wasserman's All of Statistics.

It's very common to call the fit model just a model. I try to use my words precisely and consistently, especially when talking to students, but it is hard to avoid sometimes!
