Solved – How to obtain the standard error for a slope at a given data point, for curvilinear regression

calculus, regression, standard error

A distribution looks like this:

http://imgur.com/Yv3Cnhj

modeled by an equation
$y=1.0333x^2 - 0.5382x + 1.6905.$

Find the rate of change (that is, the slope of the regression curve) at $x = 6$, and give the standard error of that slope.


The data set is this:

x   y
1   1.685583341
2   0.283701371
3   12.46623075
4   18.72989511
5   25.80864106
6   37.87660867
7   50.31247602
8   59.85196297
9   77.95083301
10  99.94633518
1   1.16826204
2   5.472501855
3   7.018634811
4   18.20518892
5   21.1307888
6   33.77248541
7   46.63896027
8   63.82182222
9   84.0457413
10  99.50407873
1   2.580495548
2   6.153600293
3   10.37034361
4   16.88383006
5   28.39918421
6   32.9756888
7   46.8185963
8   63.48607854
9   80.27429267
10  101.7215886

Best Answer

The model is

$$\mathbb{E}(y) = \beta_0 + \beta_1 x + \beta_2 x^2.$$

Adding a fixed (usually small) quantity $\delta x$ to $x$ and comparing the two model values gives the difference quotient

$$\eqalign{ \frac{\delta\,\mathbb{E}(y)}{\delta\,x} &= \frac{\beta_0 + \beta_1(x+\delta x) + \beta_2(x+\delta x)^2 - (\beta_0 + \beta_1 x + \beta_2 x^2)}{\delta x} \\ &= \beta_1 + \beta_2 (2x + \delta x). }$$

This difference quotient approximates the slope. For the slope itself, take the limit as $\delta x \to 0$, giving

$$\frac{d\,\mathbb{E}(y)}{d\,x} = \beta_1 + 2\beta_2 x.$$
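As a quick numerical sanity check (a sketch using the fitted coefficients quoted in the question and an arbitrarily chosen small step size, not part of the original answer), the difference quotient converges to $\beta_1 + 2\beta_2 x$:

```python
# Fitted coefficients from the question's quadratic (rounded as quoted).
b0, b1, b2 = 1.6905, -0.5382, 1.0333

def f(x):
    """The fitted mean function E(y) = b0 + b1*x + b2*x^2."""
    return b0 + b1 * x + b2 * x**2

x0, dx = 6.0, 1e-6
finite_diff = (f(x0 + dx) - f(x0)) / dx  # equals b1 + b2*(2*x0 + dx)
analytic = b1 + 2 * b2 * x0              # the derivative formula above

print(finite_diff, analytic)
```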

Like the model for $y$ itself, this is a linear combination of the parameters $(\beta_0, \beta_1, \beta_2)$ (with coefficients $c_0=0,c_1=1,c_2=2x$). That is key.

Obtain estimates of the coefficients, $(\hat\beta_1, \hat\beta_2)$, in any way you like, along with their covariance matrix

$$\Sigma=\text{Cov}(\hat\beta_1,\hat\beta_2).$$

Thus, $\Sigma_{ii}$ gives the estimation variance of $\hat\beta_i$ and $\Sigma_{12}=\Sigma_{21}$ gives their covariance. With these in hand, estimate the slope at any given $x$ as

$$\widehat{\frac{d\,\mathbb{E}(y)}{d\,x}} = \hat\beta_1 + 2\hat\beta_2 x.$$

Using the standard rules to compute variances of linear combinations, its estimation variance is

$$\operatorname{Var}\left(\widehat{\frac{d\,\mathbb{E}(y)}{d\,x}}\right) = \text{Var}(\hat\beta_1 + 2\hat\beta_2 x)= \Sigma_{11} + 4x\Sigma_{12} + 4x^2\Sigma_{22}.\tag{1}$$

Its square root is the standard error of the slope at $x$.
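As a sketch of the whole computation (an assumed numpy implementation, not the code used in the original answer), one can fit the quadratic by ordinary least squares and apply formula $(1)$ at $x = 6$:

```python
import numpy as np

# Data from the question: x = 1..10, three replicates.
x = np.tile(np.arange(1, 11), 3).astype(float)
y = np.array([
    1.685583341, 0.283701371, 12.46623075, 18.72989511, 25.80864106,
    37.87660867, 50.31247602, 59.85196297, 77.95083301, 99.94633518,
    1.16826204, 5.472501855, 7.018634811, 18.20518892, 21.1307888,
    33.77248541, 46.63896027, 63.82182222, 84.0457413, 99.50407873,
    2.580495548, 6.153600293, 10.37034361, 16.88383006, 28.39918421,
    32.9756888, 46.8185963, 63.48607854, 80.27429267, 101.7215886,
])

# Design matrix for E(y) = b0 + b1*x + b2*x^2.
X = np.column_stack([np.ones_like(x), x, x**2])

# OLS estimates of (b0, b1, b2).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance and coefficient covariance matrix s^2 * (X'X)^{-1}.
resid = y - X @ beta
s2 = resid @ resid / (len(x) - X.shape[1])
Sigma = s2 * np.linalg.inv(X.T @ X)

# Slope estimate and its standard error at x0 = 6, via formula (1).
x0 = 6.0
slope = beta[1] + 2 * beta[2] * x0
var_slope = Sigma[1, 1] + 4 * x0 * Sigma[1, 2] + 4 * x0**2 * Sigma[2, 2]
se_slope = np.sqrt(var_slope)

print(slope, se_slope)
```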

This easy calculation of the standard error works because of the key observation made earlier: the estimated slope is a linear combination of the parameter estimates.

More generally, to obtain the variance of a linear combination, compute

$$\operatorname{Var}\left(c_1\hat\beta_1 + c_2\hat\beta_2\right) = c_1^2\Sigma_{11} + 2c_1c_2\Sigma_{12} + c_2^2\Sigma_{22}.\tag{2}$$

Its square root is the standard error of this linear combination of coefficients.
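The expanded two-coefficient formula is just the $2\times 2$ case of the quadratic form $c^\mathsf{T}\Sigma c$. A minimal sketch with a hypothetical (made-up, positive-definite) covariance matrix shows the two agree:

```python
import numpy as np

# Hypothetical covariance matrix of (beta1_hat, beta2_hat) -- illustration only.
Sigma = np.array([[ 0.50, -0.12],
                  [-0.12,  0.04]])

c1, c2 = 1.0, 2 * 6.0  # coefficients of the slope at x = 6: beta1 + 12*beta2

# Expanded formula: c1^2*S11 + 2*c1*c2*S12 + c2^2*S22.
var_expanded = (c1**2 * Sigma[0, 0]
                + 2 * c1 * c2 * Sigma[0, 1]
                + c2**2 * Sigma[1, 1])

# Same quantity as the quadratic form c' Sigma c.
c = np.array([c1, c2])
var_quadratic = c @ Sigma @ c

print(var_expanded, var_quadratic)
```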


Estimate higher derivatives, partial derivatives (or indeed any linear combination of the coefficients) and all their variances in a multiple regression model using the same techniques: differentiate, plug in the estimated parameters, and compute the variance.
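For instance, the second derivative of the quadratic mean function is the constant $2\beta_2$, a linear combination with $c = (0, 0, 2)$, so its variance is simply $4\Sigma_{33}$. A tiny sketch using the rounded value $\Sigma_{33} = 0.003$ from the R output reported in this answer:

```python
# Second derivative of E(y) = b0 + b1*x + b2*x^2 is the constant 2*b2.
# Its variance is c' Sigma c with c = (0, 0, 2), i.e. 4 * Sigma_33.
sigma33 = 0.003  # rounded variance of beta2_hat from the R output
var_second_deriv = 4 * sigma33
print(var_second_deriv)
```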


For these data, $\Sigma$ is calculated (in R) to be

            (Intercept)      x I(x^2)
(Intercept)       2.427 -0.921  0.073
x                -0.921  0.423 -0.037
I(x^2)            0.073 -0.037  0.003

Using this, I drew one thousand randomly generated tangent lines at $x=6$ (assuming a trivariate normal distribution for $(\hat\beta_0,\hat\beta_1,\hat\beta_2)$) to depict the variability of the slope. Each line was drawn with high transparency; the black bar in the figure is the cumulative effect of all thousand tangents, and the estimated tangent itself is drawn over it in red. Evidently the slope is known quite precisely: by formula $(1)$, its variance is only $0.024591$. The intercept of the curve is much less certain (its variance is $2.427$), so most of these tangents differ only in elevation, not in angle, forming the tightly collimated black bar you see.
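The same construction can be sketched numerically rather than graphically (an assumed numpy implementation, sampling slopes instead of drawing lines): the Monte Carlo variance of the sampled slopes should agree with formula $(1)$.

```python
import numpy as np

# Refit the quadratic to the question's data, then sample coefficient
# vectors from a trivariate normal with the estimated covariance matrix.
x = np.tile(np.arange(1, 11), 3).astype(float)
y = np.array([
    1.685583341, 0.283701371, 12.46623075, 18.72989511, 25.80864106,
    37.87660867, 50.31247602, 59.85196297, 77.95083301, 99.94633518,
    1.16826204, 5.472501855, 7.018634811, 18.20518892, 21.1307888,
    33.77248541, 46.63896027, 63.82182222, 84.0457413, 99.50407873,
    2.580495548, 6.153600293, 10.37034361, 16.88383006, 28.39918421,
    32.9756888, 46.8185963, 63.48607854, 80.27429267, 101.7215886,
])

X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
Sigma = (resid @ resid / (len(x) - 3)) * np.linalg.inv(X.T @ X)

# Draw 1000 coefficient vectors, as in the tangent-line figure.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(beta, Sigma, size=1000)

# Slope of each sampled tangent at x0 = 6, compared with formula (1).
x0 = 6.0
mc_slopes = draws[:, 1] + 2 * draws[:, 2] * x0
mc_var = mc_slopes.var(ddof=1)
analytic_var = Sigma[1, 1] + 4 * x0 * Sigma[1, 2] + 4 * x0**2 * Sigma[2, 2]

print(mc_var, analytic_var)
```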

Figure

To show what else can occur, I added independent Normal errors of standard deviation $10$ to each data point and performed the same construction for the base point $x=2$. Now the slope, being much less certain, is manifest as a spreading fan of tangents.

Figure