Multiple regression and OLS: how to choose the best “non-linear” specification

least squares, multiple regression

Let's say I have to fit a multiple regression like:

$ Y_i = \beta_0 + \beta_1 x_i + \beta_2 w_i + \dots + \beta_k z_i + \epsilon_i $

Then I run a Ramsey RESET test on it and find that my linear specification is inadequate. What is the best way to deal with the non-linearity? I know that I could specify a log-log model or a log-lin model, add powers of the variables, or try interaction effects.
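
To make the candidates concrete, here is roughly what I have in mind, written with only one or two regressors for brevity (the reduced forms are just for illustration, not my actual model):

Log-log:
$ \ln Y_i = \beta_0 + \beta_1 \ln x_i + \epsilon_i $

Log-lin:
$ \ln Y_i = \beta_0 + \beta_1 x_i + \epsilon_i $

Powers of a variable:
$ Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i $

Interaction effect:
$ Y_i = \beta_0 + \beta_1 x_i + \beta_2 w_i + \beta_3 x_i w_i + \epsilon_i $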

What I don't understand from reading Verbeek and Stock & Watson is how to choose the best non-linear specification. Should I try all of them and then compare them using the Akaike information criterion (or the Bayesian or Hannan-Quinn criteria)? Or is there a way to work out which specification is best?

Sorry if I wasn't clear, English is not my native language.

Thank you in advance!

Best Answer

I've become rather enamoured of late with generalized additive modelling to handle non-linearity. The gam() function from the mgcv package for R makes things very easy as it incorporates automated generalized cross-validation to avoid overfitting.
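
As a rough illustration, here is a minimal sketch on simulated data. The variable names (`x`, `w`, `y`) and the data-generating process are made up for the example and are not from the question. It fits a linear specification and an additive one with `gam()` and compares them; `s()` requests a penalized smooth, and `method = "GCV.Cp"` asks for generalized cross-validation explicitly.

```r
library(mgcv)  # a recommended package, normally shipped with R

set.seed(42)
n <- 300
x <- runif(n)
w <- runif(n)

# Simulated response (assumption for illustration):
# non-linear in x, linear in w
y <- sin(2 * pi * x) + 0.5 * w + rnorm(n, sd = 0.3)
dat <- data.frame(y, x, w)

# Purely linear specification, fitted with gam() so both models
# come from the same fitting function
fit_lin <- gam(y ~ x + w, data = dat)

# Additive specification: each regressor enters through a smooth term
# whose degree of smoothing is chosen by generalized cross-validation
fit_gam <- gam(y ~ s(x) + s(w), data = dat, method = "GCV.Cp")

summary(fit_gam)          # reports the effective degrees of freedom (edf) per smooth
AIC(fit_lin, fit_gam)     # lower AIC is preferred
plot(fit_gam, pages = 1)  # estimated smooth for each regressor
```

An edf close to 1 for a smooth term suggests that a linear effect is adequate for that regressor, while a clearly larger edf points to curvature. The plots can then hint at which parametric form (log, quadratic, and so on) might be worth trying if you want to stay within OLS.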
