Consider this example:
Collect a dataset based on the coins in people's pockets. The response $y$ is the total value of the coins, $x_1$ is the total number of coins, and $x_2$ is the number of coins that are not quarters (or whatever the largest-denomination common coin is in your locale).
It is easy to see that a regression on either $x_1$ or $x_2$ alone would give a positive slope, but when both are included in the model the slope on $x_2$ turns negative: increasing the number of smaller coins while holding the total number of coins fixed means replacing large coins with smaller ones, which reduces the overall value $y$.
The same thing can happen whenever you have correlated $x$ variables: the sign of a coefficient can easily flip between when a term appears by itself and when it appears alongside others.
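A quick simulation makes the sign flip concrete. This is a minimal sketch of the coin example; the coin counts and denominations (quarters plus pennies, nickels, and dimes) are assumptions for illustration, not part of the original:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical pocket contents: some quarters (25 cents) plus a number
# of smaller coins, each worth 1, 5, or 10 cents.
quarters = rng.poisson(3, size=n)
small = rng.poisson(4, size=n)
small_value = np.array([rng.choice([1, 5, 10], size=k).sum() for k in small])

y = 25 * quarters + small_value   # response: total value of the coins
x1 = quarters + small             # total number of coins
x2 = small                        # number of coins that are not quarters

def slopes(y, *xs):
    # OLS via least squares, with an intercept column prepended.
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

print("y ~ x1:", slopes(y, x1))           # positive slope
print("y ~ x2:", slopes(y, x2))           # positive slope
print("y ~ x1 + x2:", slopes(y, x1, x2))  # slope on x2 flips negative
```

With $x_1$ held fixed, each extra small coin displaces a quarter, so the joint-model coefficient on $x_2$ comes out near $-20$ even though both marginal slopes are positive.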
What you are describing is a variant of stepwise model building, which, whether based on the p-values of individual predictors or on measures of overall model performance like $R^{2}$ or AIC, results in a host of problems that render inference from such models suspect:
- deflated p-values
- inflated overall model performance values
- inflated coefficients
- inflated F statistics for the whole model
- highly probable exclusion of true predictors
- highly probable inclusion of false predictors
Most of these problems arise because you are neither accounting for nor reporting the invisible string of "conditional on all the previous rejection decisions" qualifiers attached to each step of the model-building process.
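To make the p-value problem concrete, here is a minimal simulation sketch (my own illustration, not from the answer above): run a naive forward selection on predictors that are pure noise. The sample size, number of candidates, and 0.05 threshold are arbitrary assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, p = 100, 50
X = rng.standard_normal((n, p))   # 50 candidate predictors, all pure noise
y = rng.standard_normal(n)        # response unrelated to every predictor

# Naive forward selection: at each step, add the candidate with the
# smallest p-value; stop when no candidate falls below 0.05.
selected = []
while True:
    best_j, best_p = None, 1.0
    for j in set(range(p)) - set(selected):
        fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
        if fit.pvalues[-1] < best_p:   # p-value of the candidate term
            best_j, best_p = j, fit.pvalues[-1]
    if best_j is None or best_p > 0.05:
        break
    selected.append(best_j)

final = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
print("noise predictors selected as 'significant':", selected)
print("their reported p-values:", final.pvalues[1:].round(4))
```

With 50 null candidates, the smallest of 50 null p-values is typically well below 0.05, so the procedure almost always "finds" spurious predictors, and the final model's reported p-values take no account of the search that produced them.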
So how should you build a model, if not by a stepwise approach? By theoretically justifying your model variables a priori, and by committing to report negative results for a given model (i.e., don't report only the coefficients with p-values below your significance threshold).
Best Answer
Yes, you're trying to calculate the Extra Sum of Squares. In short, you are partitioning the regression sum of squares. Assume we have two $X$ variables, $X_1$ and $X_2$. The total sum of squares, $SSTO = SSR + SSE$, is the same regardless of how many $X$ variables are in the model. Write $SSR$ and $SSE$ with arguments indicating which $X$ variables are in the model, e.g.
$SSR(X_1,X_2) = 385$ and $SSE(X_1,X_2) = 110$
Now let's assume we did the regression just on $X_1$ e.g.
$SSR(X_1) = 352$ and $SSE(X_1) = 143$.
The (marginal) increase in the regression sum of squares from adding $X_2$, given that $X_1$ is already in the model, is:
\begin{eqnarray} SSR(X_2|X_1)& = &SSR(X_1,X_2) - SSR(X_1)\\ & = & 385 - 352\\ & = & 33 \end{eqnarray}
or equivalently, the extra reduction in the error sum of squares associated with $X_2$ given that $X_1$ is already in the model is:
\begin{eqnarray} SSR(X_2|X_1) & = & SSE(X_1) - SSE(X_1,X_2)\\ &=& 143 - 110\\ &=& 33 \end{eqnarray}
In the same way we can find:
\begin{eqnarray} SSR(X_1|X_2) &=& SSE(X_2) - SSE(X_1,X_2)\\ &=& SSR(X_1,X_2) - SSR(X_2) \end{eqnarray}
Of course, this works for more than two $X$ variables as well.
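As a quick numeric check, here is a minimal sketch with simulated data (the data-generating process is an arbitrary assumption). One caveat on names: statsmodels reverses the labels used above, calling the regression sum of squares `ess` and the error sum of squares `ssr`.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
x1 = rng.standard_normal(n)
x2 = 0.6 * x1 + rng.standard_normal(n)            # correlated with x1
y = 2.0 * x1 + 1.0 * x2 + rng.standard_normal(n)

def ss(y, *xs):
    """Return (SSR, SSE) in the text's notation for an OLS fit of y on xs."""
    fit = sm.OLS(y, sm.add_constant(np.column_stack(xs))).fit()
    return fit.ess, fit.ssr  # statsmodels: ess = regression SS, ssr = error SS

ssr_12, sse_12 = ss(y, x1, x2)
ssr_1,  sse_1  = ss(y, x1)

# Two equivalent routes to SSR(X2|X1):
print(ssr_12 - ssr_1)   # SSR(X1,X2) - SSR(X1)
print(sse_1 - sse_12)   # SSE(X1)    - SSE(X1,X2)
```

The two printed numbers agree, as they must, since $SSTO = SSR + SSE$ is the same for both fits.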