Solved – Model with non-linear transformation

data-transformation, fitting, modeling, nonlinear-regression, r

I don't understand this concept well and need help.

I was trying to decide whether to use a plain linear model or to apply a non-linear transformation in my model formula. As a quick diagnostic, I plotted my data:

library(ggplot2)
plotalldaily <- ggplot(amsd, aes(ImpressionsA, Leads.T)) + geom_point(color = "orange") + geom_smooth()

[Scatterplot of Leads.T against ImpressionsA with a smoothed trend line]

From the plot, I guessed that a cubic polynomial transformation of my x variable should give me a better model fit. I referred to this: http://www3.nd.edu/~rwilliam/stats2/l61.pdf. On page 5 there is an explanation of polynomial models with cubic terms.

So I checked the model fit using two formulae: one with the non-linear transformation and one that is simply linear:

test1 <- lm(Leads.T ~ ImpressionsA, data = amsd)

test2 <- lm(Leads.T ~ I(ImpressionsA^3), data = amsd)

Strangely, the plain linear model gives me a better fit: a lower standard error, a higher R-squared, and a better-behaved residual distribution.

TEST 1 Residuals: [residual plot for test1]

TEST 2 Residuals: [residual plot for test2]
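
(In case it helps to reproduce the comparison: the following sketch, using the test1 and test2 fits from above, prints the standard error and R-squared for each model, compares them by AIC since neither is nested in the other, and draws the residuals-vs-fitted plots.)

summary(test1)          # residual standard error and R-squared for the linear fit
summary(test2)          # the same for the cubic-term fit
AIC(test1, test2)       # the models are not nested, so compare by information criterion
plot(test1, which = 1)  # residuals vs fitted, linear model
plot(test2, which = 1)  # residuals vs fitted, cubic-term model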

I don't know what to make of this. Which model should I fit, and what other kinds of transformations should I try?

Best Answer

You haven't used the full form of the cubic model and are missing two terms (i.e., you have unintentionally constrained their coefficients to equal zero):

$$Leads = \beta_{0} + \beta_{ImpA}ImpA + \beta_{ImpA^{2}}ImpA^{2} + \beta_{ImpA^{3}}ImpA^{3} + \varepsilon$$

Per the currently highest-voted answer to "Fitting polynomial model to data in R", you can do either

lm(Leads ~ ImpA + I(ImpA^2) + I(ImpA^3))

(as you indicate in your comment, but you had a missing parenthesis), or:

lm(Leads ~ poly(ImpA, 3, raw=TRUE))
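
Applied to the names in your question (a sketch, assuming the same data frame amsd), the full cubic fit would look like:

test3 <- lm(Leads.T ~ poly(ImpressionsA, 3, raw = TRUE), data = amsd)
summary(test3)
anova(test1, test3)  # the linear model is nested in the cubic one, so they can be compared with an F-test

Note that poly() with the default raw = FALSE fits orthogonal polynomials instead; they give the same fitted values and predictions, just with less collinear (and differently interpreted) coefficients.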