Nonlinear Regression – When is MLE Equivalent to Least Squares Regression?

Tags: least-squares, maximum-likelihood, nonlinear-regression, regression

I recently received this one-line question in a job interview and was a little stumped by it.

In nonlinear regression, when is Maximum Likelihood Estimation equivalent to least squares?

Best Answer

By definition, the least squares estimator minimises the sum of squared distances between the actual and predicted responses. A few simple steps show that this estimator is also the solution to a certain maximisation problem. If we let $f$ denote the nonlinear regression function, let $\boldsymbol{\beta}$ denote the parameter vector of this function, and let $\sigma>0$ be an arbitrary scaling parameter, we then have:

$$\begin{align} \hat{\boldsymbol{\beta}}_\text{LS}(\mathbf{y}, \mathbf{x}) &\equiv \underset{\boldsymbol{\beta}}{\text{arg min}} \sum_{i=1}^n (y_i - f(\mathbf{x}_i, \boldsymbol{\beta}))^2 \\[6pt] &= \underset{\boldsymbol{\beta}}{\text{arg max}} \bigg( - \sum_{i=1}^n (y_i - f(\mathbf{x}_i, \boldsymbol{\beta}))^2 \bigg) \\[6pt] &= \underset{\boldsymbol{\beta}}{\text{arg max}} \bigg( - \frac{1}{2 \sigma^2} \sum_{i=1}^n (y_i - f(\mathbf{x}_i, \boldsymbol{\beta}))^2 \bigg) \\[6pt] &= \underset{\boldsymbol{\beta}}{\text{arg max}} \ \exp \bigg( - \frac{1}{2 \sigma^2} \sum_{i=1}^n (y_i - f(\mathbf{x}_i, \boldsymbol{\beta}))^2 \bigg) \\[6pt] &= \underset{\boldsymbol{\beta}}{\text{arg max}} \ \prod_{i=1}^n \exp \bigg( - \frac{1}{2 \sigma^2} (y_i - f(\mathbf{x}_i, \boldsymbol{\beta}))^2 \bigg) \\[6pt] &= \underset{\boldsymbol{\beta}}{\text{arg max}} \ \prod_{i=1}^n \text{N} (y_i | f(\mathbf{x}_i, \boldsymbol{\beta}), \sigma^2). \end{align}$$

(These steps use the fact that the $\text{arg min}$ and $\text{arg max}$ are preserved under strictly increasing transformations and swapped under strictly decreasing ones; the final step also multiplies the objective by the positive normalising constant $(2 \pi \sigma^2)^{-n/2}$, which likewise leaves the $\text{arg max}$ unchanged. Work through the steps to make sure you understand why the minimising/maximising point is preserved at each one.) The resulting estimator is the MLE for a certain nonlinear regression model form: can you see what model form this is?
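
To see the invariance concretely, here is a tiny numerical sketch (my own illustration, not part of the original answer) that traces the same chain of transformations on a toy objective $g(t) = (t-3)^2$ over a grid: negation flips the $\text{arg min}$ to an $\text{arg max}$, while scaling by $\tfrac{1}{2\sigma^2}$ and exponentiating both preserve the $\text{arg max}$.

```python
# Toy check that the arg min/arg max survives the chain of transformations
# used in the derivation above. The objective g, the grid, and sigma are
# illustrative choices, not anything from the original answer.
import numpy as np

g = lambda t: (t - 3.0) ** 2     # stand-in for the sum of squared residuals
sigma = 0.5                      # arbitrary positive scaling parameter
ts = np.linspace(0.0, 6.0, 601)  # grid of candidate "parameter" values

i_min = np.argmin(g(ts))                              # arg min of g
i_neg = np.argmax(-g(ts))                             # negation: min becomes max
i_scl = np.argmax(-g(ts) / (2 * sigma ** 2))          # positive scaling preserves arg max
i_exp = np.argmax(np.exp(-g(ts) / (2 * sigma ** 2)))  # exp is strictly increasing

assert i_min == i_neg == i_scl == i_exp
print(ts[i_min])  # 3.0: the same optimiser survives every step
```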


Update: Per the suggestion from Dave in the comments below, now that this question is a year old we can give the full solution. From the above derivation we see that the least squares estimator coincides with the MLE exactly when the regression model assumes additive IID normal (Gaussian) error terms with constant variance, i.e., when $y_i = f(\mathbf{x}_i, \boldsymbol{\beta}) + \varepsilon_i$ with $\varepsilon_1, \ldots, \varepsilon_n \sim \text{IID N}(0, \sigma^2)$.
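
To make the equivalence concrete, here is a minimal numerical sketch (my own addition, using a hypothetical exponential-decay model $f(x, \boldsymbol{\beta}) = \beta_0 e^{-\beta_1 x}$ and simulated data): fitting by minimising the sum of squared residuals and fitting by maximising the Gaussian log-likelihood recover the same $\hat{\boldsymbol{\beta}}$ up to optimiser tolerance.

```python
# Fit a hypothetical nonlinear model two ways, by least squares and by
# Gaussian maximum likelihood, and check the parameter estimates coincide.
# The model, true parameters, and noise level are all illustrative.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 50)
beta_true = np.array([2.0, 0.8])
y = beta_true[0] * np.exp(-beta_true[1] * x) + rng.normal(0.0, 0.1, x.size)

def f(x, beta):
    """Nonlinear regression function: exponential decay."""
    return beta[0] * np.exp(-beta[1] * x)

def sse(beta):
    """Least squares objective: sum of squared residuals."""
    return np.sum((y - f(x, beta)) ** 2)

def neg_loglik(params):
    """Negative log-likelihood under IID N(0, sigma^2) errors.

    sigma is parametrised on the log scale so it stays positive.
    """
    beta, sigma = params[:2], np.exp(params[2])
    return -np.sum(norm.logpdf(y, loc=f(x, beta), scale=sigma))

beta_ls = minimize(sse, x0=[1.0, 1.0], method="Nelder-Mead").x
beta_ml = minimize(neg_loglik, x0=[1.0, 1.0, 0.0], method="Nelder-Mead").x[:2]

print(beta_ls, beta_ml)                          # essentially identical estimates
print(np.allclose(beta_ls, beta_ml, atol=1e-2))  # True, to optimiser tolerance
```

Note that the likelihood is maximised jointly over $\boldsymbol{\beta}$ and $\sigma$; for any fixed $\sigma$, the maximising $\boldsymbol{\beta}$ is the least squares solution, which is why the two fits agree.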