Solved – Neural Network Model Complexity Metric

deep learning, machine learning, neural networks

Generally speaking, the pervasive idea (AFAIK) in predictive modeling is to use the simplest model that performs the best. This is relatively easy with some algorithms such as random forest (fewer variables at each split = less complex).

This is also easy at first with simple neural networks. A neural network with 5 nodes is clearly more complex than a 3 node network. But what about when we get into multiple layers? For example, is a 100 node, single layer network considered less complex than a two layer network with 10 nodes in each layer? Or are the layers multiplicative (10*10 = 100), so we would say they have the same complexity? Naturally there are many different combinations, and the number of them grows as more layers are added.

I am curious about this as I have seen discussions regarding very wide versus very deep networks and want to know if there is any consensus on what defines a 'simpler' network.

Best Answer

For assessing the complexity of a model, the number of free parameters is a good start; with it you can calculate AIC or BIC. How to get the number of free parameters in a multilayer perceptron (MLP) neural network is described here: Number of parameters in an artificial neural network for AIC
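To make the counting concrete, here is a small sketch (the layer sizes, the plain-AIC helper, and the log-likelihood value are just illustrative, not taken from the linked answer):

```python
# Count the free parameters of a fully connected MLP and plug the count
# into the usual AIC formula, AIC = 2k - 2*ln(L_hat).

def mlp_param_count(layer_sizes):
    """layer_sizes = [n_inputs, n_hidden1, ..., n_outputs].
    Each layer contributes (fan_in + 1) * fan_out parameters
    (weights plus one bias per output neuron)."""
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]))

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2*ln(L_hat)."""
    return 2 * k - 2 * log_likelihood

# The two architectures from the question: one wide layer vs. two narrow ones,
# assuming 20 inputs and 1 output.
wide = mlp_param_count([20, 100, 1])
deep = mlp_param_count([20, 10, 10, 1])
print(wide, deep)                     # 2201 vs. 331 free parameters -- far from "10*10 = 100"
print(aic(log_likelihood=-1234.5, k=wide))  # with a hypothetical fitted log-likelihood
```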

In addition, there are cases where you have a lot of parameters, but they are not "totally free" because of regularization. For example, in linear regression, if you have $1000$ features but only $500$ data points, it is perfectly fine to fit a model with $1000$ coefficients, as long as you regularize the coefficients with a large regularization parameter. You can search for ridge regression or lasso regression for details.
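A minimal sketch of that $p > n$ setup with scikit-learn's Ridge (the toy data is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))           # 500 data points, 1000 features
beta_true = np.zeros(1000)
beta_true[:10] = 1.0                       # only a few features actually matter
y = X @ beta_true + rng.normal(scale=0.5, size=500)

model = Ridge(alpha=100.0)                 # heavy shrinkage keeps the 1000
model.fit(X, y)                            # coefficients from being "totally free"
print(model.coef_.shape)                   # (1000,) coefficients, but most shrunk toward 0
```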

In the neural network case, it is also possible to have a very complicated network structure (many layers, many neurons) but with some regularization applied. In that case, the method mentioned above will not work.
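For illustration, here is a toy PyTorch sketch (my own setup, not from the question): two identical architectures report the same raw parameter count, but weight decay (L2 regularization) makes one effectively less complex, which the count alone cannot capture.

```python
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(20, 100), nn.ReLU(), nn.Linear(100, 1))

plain, regularized = make_net(), make_net()
n_params = sum(p.numel() for p in plain.parameters())
print(n_params)  # 2201 for both networks

opt_plain = torch.optim.Adam(plain.parameters())
opt_reg = torch.optim.Adam(regularized.parameters(), weight_decay=1e-2)
# Both optimizers see the same 2201 parameters; only weight_decay shrinks
# them toward zero during training, so the raw count overstates complexity.
```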


Finally, I would not agree with your statement about random forest. As discussed in Breiman's original paper, increasing the number of trees will not lead to a more complex model or cause overfitting. Instead, the out-of-bag (OOB) error will converge as the number of trees grows. In practice, if computational power is not a concern, building a random forest with a large number of trees is actually recommended.
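A quick sketch of that OOB convergence with scikit-learn (toy data, my own setup): the out-of-bag error flattens out as trees are added rather than getting worse.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for n_trees in (25, 100, 400, 800):
    rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                random_state=0, n_jobs=-1)
    rf.fit(X, y)
    print(n_trees, round(1 - rf.oob_score_, 3))  # OOB error stabilizes
```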


To your comment:

Model complexity is an abstract concept and can be defined in different ways. AIC and BIC are two such definitions, and other ways of defining it exist. See Definition of model complexity in XGBoost for an example.

In addition, two neural networks can have different structures but still have the same complexity. Here is an example: say we are doing polynomial regression. You have two options: one is a higher-order model with more regularization, the other is a lower-order model without regularization. They can have the same "complexity" even though their structures are different.
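As a rough sketch of that polynomial example (toy data of my own making), compare a low-order unregularized fit with a high-order ridge fit:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, size=80))[:, None]
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=80)

low_order = make_pipeline(PolynomialFeatures(3), LinearRegression())
high_order_reg = make_pipeline(PolynomialFeatures(10), Ridge(alpha=10.0))

for name, model in [("degree 3, no regularization", low_order),
                    ("degree 10, ridge", high_order_reg)]:
    model.fit(x, y)
    print(name, round(model.score(x, y), 3))  # similarly smooth fits despite
                                              # very different structures
```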