Solved – the difference between logistic regression and neural networks

intuition, logistic, neural networks

How do we explain the difference between logistic regression and neural networks to an audience that has no background in statistics?

Best Answer

I assume you're thinking of what used to be, and perhaps still is, referred to as 'multilayer perceptrons' in your question about neural networks. If so, then I'd explain the whole thing in terms of flexibility in the form of the decision boundary as a function of the explanatory variables. In particular, for this audience, I wouldn't mention link functions, log odds, etc. Just keep to the idea that the probability of an event is being predicted on the basis of some observations.

Here's a possible sequence:

  • Make sure they know what a predicted probability is, conceptually speaking. Show it as a function of one variable in the context of some familiar data. Explain the decision context that will be shared by logistic regression and neural networks.
  • Start with logistic regression. State that it is the linear case, but show the linearity of the resulting decision boundary using a heat or contour plot of the output probabilities with two explanatory variables (the first sketch after this list shows such a plot).
  • Note that two classes may not be well separated by the boundary they see, and motivate a more flexible model that can make a curvier boundary. If necessary, show some data that would be well distinguished this way. (This is why you start with two variables.)
  • Note that you could start complicating the original linear model with extra terms, e.g. squares or other transformations, and maybe show the boundaries these generate (see the second sketch after this list).
  • But then discard these, observing that you don't know in advance what the functional form ought to be and that you'd prefer to learn it from the data. Just as they get enthusiastic about this, note the impossibility of doing this in complete generality, and suggest that you are happy to assume the function should at least be 'smooth' rather than 'choppy', but otherwise determined by the data. (Assert that they were probably already thinking only of smooth boundaries, in the same way as they'd been speaking prose all their lives.)
  • Show the output of a generalized additive model where the output probability is a joint function of the pair of original variables rather than a truly additive combination - this is just for demonstration purposes. Importantly, call it a smoother, because that's nice and general and describes things intuitively. Demonstrate the non-linear decision boundary in the picture as before (the third sketch after this list does this).
  • Note that this (currently anonymous) smoother has a smoothness parameter that controls how smooth it actually is; refer to this in passing as being like a prior belief about the smoothness of the function turning the explanatory variables into the predicted probability. Maybe show the consequences of different smoothness settings on the decision boundary (the third sketch varies this setting).
  • Now introduce the neural net as a diagram. Point out that the second layer is just a logistic regression model, but also point out the non-linear transformation that happens in the hidden units. Remind the audience that this is just another function from input to output that will be non-linear in its decision boundary.
  • Note that it has a lot of parameters, and that some of them need to be constrained to make a smooth decision boundary - reintroduce the idea of a number that controls smoothness as the same (conceptually speaking) number that keeps the parameters tied together and away from extreme values. Also note that the more hidden units it has, the more different kinds of functional form it can realise. To maintain intuition, talk about hidden units in terms of flexibility and parameter constraint in terms of smoothness, despite the mathematical sloppiness of this characterization (the fourth sketch after this list shows both knobs).
  • Then surprise them by claiming that, since you still don't know the functional form, you want to be infinitely flexible by adding an infinite number of hidden units. Let the practical impossibility of this sink in a bit. Then observe that this limit can be taken in the mathematics, and ask (rhetorically) what such a thing would look like.
  • Answer that it would again be a smoother (a Gaussian process, as it happens; Neal, 1996, though this detail is not important), like the one they saw before. Observe that there is again a quantity that controls smoothness but no other particular parameters (they are integrated out, for those who care about this sort of thing). The final sketch after this list shows such a model.
  • Conclude that neural networks are particular, implicitly limited implementations of ordinary smoothers, which are the non-linear, not-necessarily-additive extensions of the logistic regression model. Then go the other way, concluding that logistic regression is equivalent to a neural network model or a smoother with the smoothing parameter set to 'extra extra smooth', i.e. linear.
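
Here are a few code sketches for the steps above. None of these are from the original answer: they are minimal illustrations on invented data, and the choice of libraries (scikit-learn, matplotlib, pygam) is an assumption, since the answer names no tools. First, the heat/contour plot of logistic regression's output probabilities over two explanatory variables, with its linear decision boundary:

```python
# Sketch: logistic regression on invented two-variable data. The 0.5
# contour of the probability surface is the (linear) decision boundary.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
model = LogisticRegression().fit(X, y)

# Predicted probability over a grid of the two explanatory variables.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
grid = np.c_[xx.ravel(), yy.ravel()]
proba = model.predict_proba(grid)[:, 1].reshape(xx.shape)

plt.contourf(xx, yy, proba, levels=20, cmap="RdBu_r", alpha=0.6)
plt.contour(xx, yy, proba, levels=[0.5], colors="k")  # the linear boundary
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.title("Logistic regression: a linear decision boundary")
plt.show()
```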
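
Second, the same picture after adding squared and interaction terms, reusing X, y, and the grid from the sketch above:

```python
# Sketch: quadratic terms bend the boundary into a conic section rather
# than a straight line. Reuses X, y, grid, xx, yy from the sketch above.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

quad = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
quad.fit(X, y)
proba_quad = quad.predict_proba(grid)[:, 1].reshape(xx.shape)

plt.contour(xx, yy, proba_quad, levels=[0.5], colors="k")  # curved boundary
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.title("Logistic regression with quadratic terms")
plt.show()
```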
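
Third, the joint smoother. The answer doesn't name a package, so pygam is an assumption here; te(0, 1) fits a joint (tensor product) smooth of both variables, and lam is the smoothness parameter varied across the loop:

```python
# Sketch, assuming the pygam package: a joint smooth of both variables.
# `lam` is the smoothness penalty; larger values give smoother boundaries.
from pygam import LogisticGAM, te

for lam in (0.1, 10.0, 1000.0):
    gam = LogisticGAM(te(0, 1, lam=lam)).fit(X, y)
    proba_gam = gam.predict_proba(grid).reshape(xx.shape)
    plt.contour(xx, yy, proba_gam, levels=[0.5])  # one boundary per lam
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.title("Joint smoother at three smoothness settings")
plt.show()
```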
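
Fourth, the neural network itself. In scikit-learn's MLPClassifier the L2 penalty alpha stands in for the smoothness number and hidden_layer_sizes for the flexibility:

```python
# Sketch: a one-hidden-layer network. The output layer is effectively a
# logistic regression on the hidden units; `alpha` (L2 penalty) plays the
# role of the smoothness number, `hidden_layer_sizes` the flexibility.
from sklearn.neural_network import MLPClassifier

net = MLPClassifier(hidden_layer_sizes=(10,), alpha=1.0,
                    max_iter=5000, random_state=0).fit(X, y)
proba_net = net.predict_proba(grid)[:, 1].reshape(xx.shape)

plt.contour(xx, yy, proba_net, levels=[0.5], colors="k")
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.title("Multilayer perceptron: flexible, regularised boundary")
plt.show()
```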
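
Finally, the infinite-hidden-unit limit, approximated here by a Gaussian process classifier; the RBF kernel's length_scale is, loosely, the one remaining smoothness quantity:

```python
# Sketch: the infinite-width limit behaves like a Gaussian process
# (Neal, 1996). The kernel's `length_scale` is, loosely, the remaining
# smoothness quantity; the other parameters are integrated out.
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

gp = GaussianProcessClassifier(kernel=RBF(length_scale=1.0),
                               random_state=0).fit(X, y)
proba_gp = gp.predict_proba(grid)[:, 1].reshape(xx.shape)

plt.contour(xx, yy, proba_gp, levels=[0.5], colors="k")
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.title("Gaussian process: the smoother in the limit")
plt.show()
```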

The advantage of this approach is that you don't really have to get into any mathematical detail to give the correct idea. In fact, they don't have to understand either logistic regression or neural networks already to understand the similarities and differences.

The disadvantage of the approach is that you have to make a lot of pictures, and strongly resist the temptation to drop down into the algebra to explain things.