Regression Models – How to Tell the Difference Between Linear and Non-Linear

multicollinearity, multiple regression, nonlinear regression, regression

I was reading the following link on non-linear regression: SAS Non Linear. My understanding from reading the first section, "Nonlinear Regression vs. Linear Regression", was that the equation below is actually a linear regression. Is that correct? If so, why?

$$y = b_1x^3 + b_2x^2 + b_3x + c$$

Am I also to understand that in non-linear regression multicollinearity isn't an issue? I know that multicollinearity can be an issue in linear regression, so surely, if the model above is in fact a linear regression, there would be multicollinearity?

Best Answer

There are (at least) three senses in which a regression can be considered "linear." To distinguish them, let's start with an extremely general regression model

$$Y = f(X,\theta,\varepsilon).$$

To keep the discussion simple, take the independent variables $X$ to be fixed and accurately measured (rather than random variables). They model $n$ observations of $p$ attributes each, giving rise to the $n$-vector of responses $Y$. Conventionally, $X$ is represented as an $n\times p$ matrix and $Y$ as a column $n$-vector. The (finite $q$-vector) $\theta$ comprises the parameters. $\varepsilon$ is a vector-valued random variable. It usually has $n$ components, but sometimes has fewer. The function $f$ is vector-valued (with $n$ components to match $Y$) and is usually assumed continuous in its last two arguments ($\theta$ and $\varepsilon$).

The archetypal example, of fitting a line to $(x,y)$ data, is the case where $X$ is a vector of numbers $(x_i,\,i=1,2,\ldots,n)$--the x-values; $Y$ is a parallel vector of $n$ numbers $(y_i)$; $\theta = (\alpha,\beta)$ gives the intercept $\alpha$ and slope $\beta$; and $\varepsilon = (\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n)$ is a vector of "random errors" whose components are independent (and usually assumed to have identical but unknown distributions of mean zero). In the preceding notation,

$$y_i = \alpha + \beta x_i +\varepsilon_i = f(X,\theta,\varepsilon)_i$$

with $\theta = (\alpha,\beta)$.

The regression function may be linear in any (or all) of its three arguments:

  • "Linear regression, or a "linear model," ordinarily means that $f$ is linear as a function of the parameters $\theta$. The SAS meaning of "nonlinear regression" is in this sense, with the added assumption that $f$ is differentiable in its second argument (the parameters). This assumption makes it easier to find solutions.

  • A "linear relationship between $X$ and $Y$" means $f$ is linear as a function of $X$.

  • A model has additive errors when $f$ is linear in $\varepsilon$. In such cases it is always assumed that $\mathbb{E}(\varepsilon) = 0$. (Otherwise, it wouldn't be right to think of $\varepsilon$ as "errors" or "deviations" from "correct" values.)
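
To connect the first bullet back to the polynomial in the question, here is a minimal sketch (assuming NumPy; the coefficient values are invented for illustration) showing that $y = b_1x^3 + b_2x^2 + b_3x + c$ is linear in the parameters $(b_1,b_2,b_3,c)$, so it can be fit by ordinary linear least squares once the design matrix contains the columns $x^3, x^2, x, 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 200)

# Invented "true" coefficients, for illustration only.
b1, b2, b3, c = 0.5, -1.0, 2.0, 3.0
y = b1 * x**3 + b2 * x**2 + b3 * x + c + rng.normal(scale=0.1, size=x.size)

# The model is nonlinear in x but linear in (b1, b2, b3, c), so a design
# matrix with columns x^3, x^2, x, 1 reduces it to ordinary (multiple)
# linear regression.
X = np.column_stack([x**3, x**2, x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # approximately [0.5, -1.0, 2.0, 3.0]
```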

Every possible combination of these characteristics can happen and is useful. Let's survey the possibilities.

  1. A linear model of a linear relationship with additive errors. This is ordinary (multiple) regression, already exhibited above and more generally written as

    $$Y = X\theta + \varepsilon.$$

    $X$ has been augmented, if necessary, by adjoining a column of constants, and $\theta$ is a $p$-vector.

  2. A linear model of a nonlinear relationship with additive errors. This can be couched as a multiple regression by augmenting the columns of $X$ with nonlinear functions of $X$ itself (the code sketch after this list contrasts fitting this form with form 6). For instance,

    $$y_i = \alpha + \beta x_i^2 + \varepsilon_i$$

    is of this form. It is linear in $\theta=(\alpha,\beta)$; it has additive errors; and it is linear in the values $(1,x_i^2)$ even though $x_i^2$ is a nonlinear function of $x_i$.

  3. A linear model of a linear relationship with nonadditive errors. An example is multiplicative error,

    $$y_i = (\alpha + \beta x_i)\varepsilon_i.$$

    (In such cases the $\varepsilon_i$ can be interpreted as "multiplicative errors" when the location of $\varepsilon_i$ is $1$. However, the proper sense of location is not necessarily the expectation $\mathbb{E}(\varepsilon_i)$ anymore: it might be the median or the geometric mean, for instance. A similar comment about location assumptions applies, mutatis mutandis, in all other non-additive-error contexts too.)

  4. A linear model of a nonlinear relationship with nonadditive errors. E.g.,

    $$y_i = (\alpha + \beta x_i^2)\varepsilon_i.$$

  5. A nonlinear model of a linear relationship with additive errors. A nonlinear model involves combinations of its parameters that are not only nonlinear but cannot even be linearized by re-expressing the parameters.

    • As a non-example, consider

      $$y_i = \alpha\beta + \beta^2 x_i + \varepsilon_i.$$

      By defining $\alpha^\prime = \alpha\beta$ and $\beta^\prime=\beta^2$, and restricting $\beta^\prime \ge 0$, this model can be rewritten

      $$y_i = \alpha^\prime + \beta^\prime x_i + \varepsilon_i,$$

      exhibiting it as a linear model (of a linear relationship with additive errors).

    • As an example, consider

      $$y_i = \alpha + \alpha^2 x_i + \varepsilon_i.$$

      It is impossible to find a new parameter $\alpha^\prime$, depending on $\alpha$, that will linearize this as a function of $\alpha^\prime$ (while keeping it linear in $x_i$ as well).

  6. A nonlinear model of a nonlinear relationship with additive errors.

    $$y_i = \alpha + \alpha^2 x_i^2 + \varepsilon_i.$$

  7. A nonlinear model of a linear relationship with nonadditive errors.

    $$y_i = (\alpha + \alpha^2 x_i)\varepsilon_i.$$

  8. A nonlinear model of a nonlinear relationship with nonadditive errors.

    $$y_i = (\alpha + \alpha^2 x_i^2)\varepsilon_i.$$
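
As a minimal sketch of the practical difference between these forms (assuming NumPy and SciPy; the parameter values are invented), form 2 can be fit by ordinary least squares once $x^2$ is placed in the design matrix, whereas form 6 requires an iterative nonlinear least-squares solver:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = np.linspace(0, 2, 100)

# Form 2: linear model of a nonlinear relationship, additive errors.
# y_i = alpha + beta * x_i^2 + eps_i is linear in (alpha, beta), so
# ordinary least squares works once the design matrix holds the column x^2.
alpha, beta = 1.0, 2.0
y2 = alpha + beta * x**2 + rng.normal(scale=0.1, size=x.size)
X = np.column_stack([np.ones_like(x), x**2])
theta_hat, *_ = np.linalg.lstsq(X, y2, rcond=None)
print("form 2:", theta_hat)          # roughly [1.0, 2.0]

# Form 6: nonlinear model of a nonlinear relationship, additive errors.
# y_i = alpha + alpha^2 * x_i^2 + eps_i cannot be linearized in its single
# parameter, so an iterative nonlinear least-squares solver is needed.
def form6(x, alpha):
    return alpha + alpha**2 * x**2

y6 = form6(x, 1.5) + rng.normal(scale=0.1, size=x.size)
popt, _ = curve_fit(form6, x, y6, p0=[1.0])
print("form 6:", popt)               # roughly [1.5]
```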


Although these exhibit eight distinct forms of regression, they do not constitute a classification system because some forms can be converted into others. A standard example is the conversion of a linear model with nonadditive errors (assumed to have positive support)

$$y_i = (\alpha + \beta x_i)\varepsilon_i$$

into a linear model of a nonlinear relationship with additive errors via the logarithm,

$$\log(y_i) = \mu_i + \log(\alpha + \beta x_i) + (\log(\varepsilon_i) - \mu_i).$$

Here, the log geometric mean $\mu_i = \mathbb{E}\left(\log(\varepsilon_i)\right)$ has been removed from the error terms (to ensure they have zero means, as required) and incorporated into the other terms (where its value will need to be estimated). Indeed, one major reason to re-express the dependent variable $Y$ is to create a model with additive errors. Re-expression can also linearize $Y$ as a function of either (or both) of the parameters and explanatory variables.
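
Here is a minimal sketch of that conversion (assuming NumPy and SciPy; the parameter values are invented): simulate multiplicative lognormal errors, take logs, and fit on the log scale, where the errors are now additive with mean zero and the location $\mu$ is absorbed into the fitted coefficients.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
x = np.linspace(1, 5, 500)
alpha, beta = 2.0, 0.5

# Multiplicative positive errors: y_i = (alpha + beta x_i) * eps_i, with
# eps_i lognormal so that log(eps_i) is normal with mean mu.
mu, sigma = 0.3, 0.2
y = (alpha + beta * x) * rng.lognormal(mean=mu, sigma=sigma, size=x.size)

# On the log scale the errors are additive with mean zero.  The data alone
# cannot separate mu from the scale of (alpha, beta), so mu is incorporated
# into the fitted coefficients: a = exp(mu)*alpha, b = exp(mu)*beta.
def log_model(x, a, b):
    return np.log(a + b * x)

(a_hat, b_hat), _ = curve_fit(log_model, x, np.log(y), p0=[1.0, 1.0])
print(a_hat, b_hat)        # roughly exp(0.3)*2.0 and exp(0.3)*0.5
print(np.mean(np.log(y) - log_model(x, a_hat, b_hat)))   # roughly 0
```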


Collinearity

Collinearity (of the column vectors in $X$) can be an issue in any form of regression. The key to understanding this is to recognize that collinearity leads to difficulties in estimating the parameters. Abstractly and quite generally, compare two models $Y = f(X,\theta,\varepsilon)$ and $Y=f(X^\prime,\theta,\varepsilon^\prime)$ where $X^\prime$ is $X$ with one column slightly changed. If this induces enormous changes in the estimates $\hat\theta$ and $\hat\theta^\prime$, then obviously we have a problem. One way in which this problem can arise is in a linear model, linear in $X$ (that is, types (1) or (5) above), where the components of $\theta$ are in one-to-one correspondence with the columns of $X$. When one column is a non-trivial linear combination of the others, the estimate of its corresponding parameter can be any real number at all. That is an extreme example of such sensitivity.

From this point of view it should be clear that collinearity is a potential problem for linear models of nonlinear relationships (regardless of the additivity of the errors) and that this generalized concept of collinearity is potentially a problem in any regression model. When you have redundant variables, you will have problems identifying some parameters.
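
To make that sensitivity concrete, here is a minimal sketch (assuming NumPy; the data are simulated) in which one column of $X$ is nearly a copy of another. Slightly changing that column produces enormous changes in the individual estimates, exactly the difficulty described above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-6, size=n)     # nearly a copy of x1
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.1, size=n)

def fit(X, y):
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

X = np.column_stack([np.ones(n), x1, x2])
print(fit(X, y))

# Change the near-duplicate column very slightly and refit: the two
# estimates attached to the collinear columns swing by huge amounts,
# although their sum (the only identifiable combination) stays near 2.
X2 = X.copy()
X2[:, 2] = x1 + rng.normal(scale=1e-6, size=n)
print(fit(X2, y))
```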
