Understanding regressions – the role of the model

Tags: epidemiology, log-linear, modeling, regression

How can a regression model be any use if you don't know the function you are trying to get the parameters for?

I saw a piece of research that said that mothers who breastfed their children were less likely to suffer from diabetes in later life. The research was based on a survey of some 1000 mothers, controlled for miscellaneous factors, and used a log-linear model.

Now does this mean that they reckon all the factors that determine the likelihood of diabetes fit in a nice function (exponential presumably) that translates neatly into a linear model with logs and that whether the woman breast fed turned out to be statistically significant?

I'm missing something, I'm sure, but how the hell do they know the model?

Best Answer

It helps to view regression as a linear approximation of the true form. Suppose the true relationship is

$$y=f(x_1,...,x_k)$$

with $x_1,...,x_k$ the factors explaining $y$. Then the first-order Taylor approximation of $f$ around zero is:

$$f(x_1,...,x_k)=f(0,...,0)+\sum_{i=1}^{k}\frac{\partial f(0)}{\partial x_i}x_i+\varepsilon,$$

where $\varepsilon$ is the approximation error. Now denote $\alpha_0=f(0,...,0)$ and $\alpha_i=\frac{\partial{f}(0)}{\partial x_i}$ for $i=1,...,k$, and you have a regression:

$$y=\alpha_0+\alpha_1 x_1+...+\alpha_k x_k + \varepsilon$$

So although you do not know the true relationship, if $\varepsilon$ is small you get an approximation from which you can still deduce useful conclusions.
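
To make this concrete, here is a minimal sketch in Python with NumPy. The "true" function `true_f`, its coefficients, the sample size, and the range of the inputs are all made up for illustration; in a real study you would not know `true_f` at all. The sketch fits a linear regression to data generated from a nonlinear function and compares the fitted coefficients with the first-order Taylor coefficients at zero.

```python
import numpy as np

def true_f(x1, x2):
    # Hypothetical "true" relationship; unknown to the analyst in practice.
    return np.exp(0.3 * x1 - 0.2 * x2)

def fit_linear(X, y):
    # Ordinary least squares with an intercept column.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
# Inputs stay close to zero, where the Taylor expansion is taken.
X = rng.uniform(-0.5, 0.5, size=(1000, 2))
y = true_f(X[:, 0], X[:, 1])

coef = fit_linear(X, y)

# Taylor coefficients at zero: f(0,0), df/dx1(0,0), df/dx2(0,0).
taylor = np.array([1.0, 0.3, -0.2])

# Residuals play the role of the approximation error epsilon.
residual = y - np.column_stack([np.ones(len(X)), X]) @ coef

print("fitted coefficients:", coef)
print("Taylor coefficients:", taylor)
print("largest |epsilon|:  ", np.abs(residual).max())
```

With inputs this close to zero, the fitted coefficients should land near the Taylor ones and the residuals should stay small. Widening the input range (say, to `(-3, 3)`) makes the curvature of `true_f` matter more, so $\varepsilon$ grows and the linear model becomes a poorer summary.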
