How do we calculate SSR? I know SSE is the sum of the squared residuals, but SSR involves the difference between the prediction for each observation and the mean of the $Y$ values. I'm not sure how to calculate SSR. For SSE, I got 59.960.
[Math] How to compute SSR with just residuals and Xi
statistics, sums-of-squares
Related Solutions
As Jonathan says, you are indeed correct; well spotted. Basically, if you have a 1-factor ANOVA with, say, $q$ levels, then you have observations indexed by $j$ and $k$: $Y_{jk}$, $j=1,\dots,q$, $k=1,\dots,n_{j}$. The ANOVA model is $Y_{jk}=\mu_{j}+\epsilon_{jk}$, where $\mu_{j}$ represents the unknown true factor-level means you want to estimate. This model can be written in "regression" notation by indexing your variables with just one index, say $i$: $Y_{i}=\sum\nolimits_{j=1}^{q}x_{ij}\beta_{j}+\epsilon_{i}$, $i=1,\dots,N$, where $x_{ij}=1$ for exactly one of the $j$'s and zero otherwise. We assume that among your $N$ regression observations we have $n_{j}$ observations with $x_{ij}=1$, and that $\sum\nolimits_{j=1}^{q}n_{j}=N$. Letting $\hat{Y_{i}}=\sum\nolimits_{j=1}^{q}x_{ij}\hat{\beta_{j}}$, we see that
$SSR=\sum\nolimits_{i=1}^{N}(\hat{Y_{i}}-\bar{Y})^{2}=\sum\nolimits_{i=1}^{N}(\sum\nolimits_{j=1}^{q}x_{ij}\hat{\beta_{j}}-\bar{Y})^{2}=\sum\nolimits_{j=1}^{q}n_{j}(\hat{\beta}_{j}-\bar{Y})^{2}$.
Now due to $x_{ij}$ being zero or one we find that $\hat{\beta}_{j}=\bar{Y}_{j}$ (I mean this to denote the average of the $Y$'s where $x_{ij}=1$), thus $\sum\nolimits_{i=1}^{N}(\hat{Y_{i}}-\bar{Y})^{2}=\sum\nolimits_{j=1}^{q}n_{j}(\bar{Y}_{j}-\bar{Y})^{2}$. In ANOVA notation we have $\bar{Y}_{j}=\bar{Y}_{j\cdot}$, and so
$SSR=\sum\nolimits_{i=1}^{N}(\hat{Y_{i}}-\bar{Y})^{2}=\sum\nolimits_{j=1}^{q}n_{j}(\bar{Y}_{j\cdot}-\bar{Y})^{2}=SSTR$.
Basically, ANOVA is just a restricted form of regression, the restriction being that the predictors are factor variables rather than continuous ones. I find it much easier to learn about regression first, and then to think of ANOVA in this way, because all of the theory of regression carries over to ANOVA, whereas the sums-of-squares theory of ANOVA applies only to these specific regression models, not to a general one (where both continuous and factor variables are present). If you have the time, it is worth learning about regression as well as ANOVA: ANOVA theory gets you thinking from a designed-experiment viewpoint (randomised controlled trials), whilst regression theory is more general, since it is really about analysing data you already have (not necessarily from a designed experiment).
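The identity $SSR=SSTR$ above can be checked numerically. The following sketch (with made-up group means and sizes, not data from the question) fits the dummy-variable regression and compares the regression sum of squares with the ANOVA treatment sum of squares:

```python
import numpy as np

# Hypothetical one-factor data: q = 3 groups with unequal sizes
# (the group means and sizes here are illustrative assumptions).
rng = np.random.default_rng(0)
groups = [rng.normal(mu, 1.0, size=n) for mu, n in [(1.0, 4), (3.0, 5), (5.0, 6)]]
y = np.concatenate(groups)
N = len(y)

# Dummy-variable design matrix: x_ij = 1 iff observation i is in group j.
X = np.zeros((N, len(groups)))
start = 0
for j, g in enumerate(groups):
    X[start:start + len(g), j] = 1.0
    start += len(g)

# Least-squares fit; beta_hat_j recovers the group mean Ybar_j.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
ybar = y.mean()

# Regression sum of squares vs. the ANOVA treatment sum of squares.
SSR = np.sum((y_hat - ybar) ** 2)
SSTR = sum(len(g) * (g.mean() - ybar) ** 2 for g in groups)
assert np.isclose(SSR, SSTR)
```

The assertion passes because, with an indicator design matrix, the fitted values $\hat{Y_i}$ are exactly the group means, as derived above.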
There are many different ways to look at degrees of freedom. I wanted to provide a rigorous answer that starts from a concrete definition of degrees of freedom for a statistical estimator as this may be useful/satisfying to some readers:
Definition: Consider an observational model of the form $$y_i=r(x_i)+\xi_i,\ \ \ i=1,\dots,n$$ where the $\xi_i\sim\mathcal{N}(0,\sigma^2)$ are i.i.d. noise terms and the $x_i$ are fixed. The degrees of freedom (DOF) of an estimator $\hat{y}$ is defined as $$\text{df}(\hat{y})=\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(\hat{y}_i,y_i)=\frac{1}{\sigma^2}\text{Tr}(\text{Cov}(\hat{y},y)),$$ or equivalently, by Stein's lemma, $$\text{df}(\hat{y})=\mathbb{E}(\text{div} \hat{y}).$$
Using this definition, let's analyze linear regression.
Linear Regression: Consider the model $$y_i=x_i\beta +\xi_i,$$ where the $x_i\in\mathbb{R}^p$ are row vectors. In your case, $p=2$: each $x_i=[z_i,\ 1]$ consists of a data point and the constant $1$, and $\beta=\left[\begin{array}{c} m\\ b \end{array}\right]$ is a slope and an intercept, so that $x_i \beta=m z_i+b$. This can be rewritten as $$y=X\beta+\xi$$ where $X$ is the $n\times p$ matrix whose $i^{th}$ row is $x_i$. The least squares estimator is $\hat{\beta}^{LS}=(X^T X)^{-1}X^Ty$. Using the above definition, let us now calculate the degrees of freedom of $SST$, $SSR$, and $SSE$.
$SST:$ For this, we need to calculate $$\text{df}(y_i-\overline{y})=\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(y_i-\overline{y},y_i)=n-\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(\overline{y},y_i)=n-\frac{1}{\sigma^2}\sum_{i=1}^n \frac{\sigma^2}{n}=n-1.$$
$SSR:$ For this, we need to calculate $$\text{df}(X\hat{\beta}^{LS}-\overline{y})=\frac{1}{\sigma^2}\text{Tr}\left(\text{Cov}(X(X^TX)^{-1}X^Ty,y)\right)-\text{df}(\overline{y})$$ $$=-1+\frac{1}{\sigma^2}\text{Tr}(X(X^TX)^{-1}X^T\text{Cov}(y,y))$$ $$=-1+\text{Tr}(X(X^TX)^{-1}X^T)$$ $$=p-1.$$ In your case $p=2$, since you will want $X$ to include the all-ones column so that there is an intercept term, and so the degrees of freedom will be $1$. Note, however, that the same formula applies when we are doing regression with any number of parameters.
$SSE:$ $\text{df}(y-X\hat{\beta}^{LS})=(n-1)-(p-1)=n-p$, which follows from the linearity of $\text{df}$, since $y-X\hat{\beta}^{LS}=(y-\overline{y})-(X\hat{\beta}^{LS}-\overline{y})$.
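The trace calculation above can be verified numerically: the hat matrix $H=X(X^TX)^{-1}X^T$ has trace $p$, from which the $SSR$ and $SSE$ degrees of freedom follow. A minimal sketch, assuming a hypothetical design with an intercept and one slope ($p=2$, $n=50$):

```python
import numpy as np

# Sketch: check df(SSR) = p - 1 via the trace formula, for an
# illustrative design with one slope and an intercept (p = 2).
rng = np.random.default_rng(1)
n, p = 50, 2
z = rng.uniform(-1, 1, size=n)
X = np.column_stack([z, np.ones(n)])   # rows x_i = [z_i, 1]

# Hat matrix H = X (X^T X)^{-1} X^T; its trace equals p.
H = X @ np.linalg.inv(X.T @ X) @ X.T
df_fit = np.trace(H)                   # = p
df_SSR = df_fit - 1                    # subtract df(ybar) = 1
df_SST = n - 1
df_SSE = df_SST - df_SSR               # linearity: (n-1) - (p-1) = n - p
assert np.isclose(df_fit, p)
assert np.isclose(df_SSE, n - p)
```

This mirrors the derivation: $\text{Tr}(X(X^TX)^{-1}X^T)=p$ regardless of the particular design, as long as $X$ has full column rank.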
Best Answer
You can't compute SSR from the information provided (as you noted), but you don't particularly need it for those tasks. Note that the Breusch-Pagan test really just requires you to run an OLS regression using the residuals of the original model, so the information you need is indeed all on the table. As your next step, set up a new regression model with the $e_i$ terms as the response variable and $X_i$ as the predictor.
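The auxiliary-regression step can be sketched as follows, using synthetic heteroscedastic data (the data and coefficients here are illustrative, not from the question). Note that the standard form of the Breusch-Pagan test uses the *squared* residuals $e_i^2$ as the response in the auxiliary fit:

```python
import numpy as np

# Illustrative heteroscedastic data (assumed, not from the question).
rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1 + 0.2 * x)  # noise variance grows with x

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Step 1: original OLS fit; keep the residuals e_i.
X = np.column_stack([np.ones(n), x])
e = y - X @ ols(X, y)

# Step 2: auxiliary regression of e_i^2 on the same predictors.
e2 = e ** 2
fitted = X @ ols(X, e2)
R2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# LM statistic n * R^2; asymptotically chi-squared with 1 df here
# (one non-constant predictor in the auxiliary regression).
LM = n * R2
```

Large values of the LM statistic relative to the chi-squared reference indicate heteroscedasticity, which is exactly what the residual-on-predictor regression is probing.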