Solved – What’s the difference between a neural network architecture and a neural network model

Tags: definition, machine learning, neural networks

I want to know what the difference is between the terms architecture and model in the context of neural networks.

I think the architecture is the whole definition of the neural net, including the hyper-parameters ($H$) (e.g. learning rate, dropout, batch size, etc.), the parameters ($\theta$) (e.g. weights and biases), the feature extraction process, and the shape (meaning the depth, how many units there are, and how they are connected), but without any specific values for the parameters ($\theta$) and hyper-parameters ($H$). So it is something like $F(f(x;\theta); H)$, where $F$ is the learning algorithm that helps find good values for $\theta$, which define $f$.

On the other hand, I think a neural network model is the same but with specific values for the hyper-parameters (like learning rate = 0.8, etc.), BUT with the weights and biases not yet defined, i.e. having the family of functions $f(x;\theta)$.

Or is the model just the function $f$ in $y = f(x)$, with all the parameters $\theta$ already defined?

How wrong am I?

Best Answer

This terminology is often abused, but the pedantic view is:

  • A model would be a network architecture with all its weights viewed as free parameters.
  • A fit model is a network with fixed weights determined by running a fitting algorithm with some training data.
  • Parameters map out the various specific shapes that the model can take; fitting chooses specific values of the weights that best reflect the training data.
  • Hyperparameters control the behaviour of the fitting algorithm; they are often set to find the parameters that offer the best performance according to some estimate of hold-out error. (A short code sketch of these distinctions follows this list.)

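To make the distinctions concrete, here is a minimal sketch (my own illustration, not part of the original answer) of how the terminology above might map onto code. All names (`architecture`, `init_params`, `fit`, etc.) are hypothetical, and random search stands in for a real fitting algorithm purely to keep the sketch short:

```python
import numpy as np

# "Architecture": structural choices only (layer sizes, activation); no weight values.
architecture = {"layer_sizes": [2, 4, 1], "activation": np.tanh}

def init_params(arch, rng):
    """A 'model': the architecture with its weights treated as free parameters.
    The values drawn here are just a starting point; fitting will choose them."""
    sizes = arch["layer_sizes"]
    return [(rng.normal(size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def predict(arch, params, X):
    """f(X; theta): a forward pass through the network."""
    h = X
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:          # activation on hidden layers only
            h = arch["activation"](h)
    return h

def fit(arch, X, y, n_candidates=500, seed=0):
    """The fitting algorithm F: its hyperparameters (here, n_candidates) control
    the search, not the network itself. Random search is used only to keep the
    sketch short; gradient descent would be the realistic choice."""
    rng = np.random.default_rng(seed)
    best_params, best_loss = None, np.inf
    for _ in range(n_candidates):
        params = init_params(arch, rng)
        loss = np.mean((predict(arch, params, X) - y) ** 2)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params                   # a 'fit model': fixed weights chosen from data

# Toy usage: learn y = x1 + x2 from 50 samples.
X = np.random.default_rng(1).normal(size=(50, 2))
y = X.sum(axis=1, keepdims=True)
fitted = fit(architecture, X, y)
print(np.mean((predict(architecture, fitted, X) - y) ** 2))
```

Nothing here is tied to a particular library; the point is only that the architecture, the model with free weights, the fit model with fixed weights, and the hyperparameters of the fitting procedure are four separable things.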
I settled on this terminology after reading Wasserman's All of Statistics.

It's very common to call the fit model just a model. I try to use my words precisely and consistently, especially when talking to students, but it is hard to avoid sometimes!
