The coefficient of determination can be calculated in various ways, which coincide for linear regression. (If they don't coincide, then something in the comparison is not the coefficient of determination.) Away from that case, it gets messier. Various analogues or alternatives are often (but not always) labelled pseudo-$R^2$. Watch out also for adjusted variants that penalise for the number of predictors.
I have found the paper of Zheng and Agresti (2000) to be helpful in this territory.
Zheng and Agresti (2000) discussed the correlation between the response and the
fitted or predicted response as a general measure of predictive power for generalized linear models (GLMs). This measure has the advantages of referring to the original scale of measurement, of being applicable to all types of GLM and of being familiar to many users of statistics. Preferably, it should be used as a comparative measure for different models applied to the same data set, given that restrictions on values of the response may imply limitations on its value (see e.g. Cox and Wermuth, 1992).
For an arbitrary GLM, this correlation is invariant under a location-scale
transformation and it is the positive square root of the average proportion
of variance explained by the predictors. However, again for an arbitrary GLM,
it need not equal the positive square root of other definitions of R-square
(e.g. Hardin and Hilbe, 2001); and it need not be monotone increasing in the
complexity of the predictors, although in practice that is common. The
correlation is necessarily sensitive to outliers.
As the predicted response is a function of the observed response, the correlation calculated
from a sample may be expected to be biased upwards. A jackknifed correlation
is recommended as one alternative. Zheng and Agresti provide more discussion of
this point, including other estimators and a bootstrap approach to providing
confidence intervals for the correlation and to estimating the degree of overfitting.
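The ideas above can be sketched in code. This is a minimal illustration, not the authors' procedure: it uses an ordinary least-squares fit (a Gaussian GLM with identity link) so that fitting stays closed-form, on made-up data, and contrasts the in-sample correlation between observed and fitted responses with a jackknifed (leave-one-out) version that removes the upward bias noted above.

```python
# Sketch: correlation between observed and fitted responses as a measure
# of predictive power, plus a jackknifed version. Data are illustrative.

def fit_slr(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) *
           sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9, 6.8, 8.3]

# In-sample correlation between observed and fitted responses.
a0, b0 = fit_slr(x, y)
fitted = [a0 + b0 * xi for xi in x]
r_in = pearson(y, fitted)

# Jackknifed version: each prediction comes from a model that excludes
# the corresponding observation.
jack = []
for i in range(len(x)):
    xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    ai, bi = fit_slr(xs, ys)
    jack.append(ai + bi * x[i])
r_jack = pearson(y, jack)

print(round(r_in, 4), round(r_jack, 4))  # jackknifed r is typically smaller
```

The same pattern applies to any GLM: replace `fit_slr` with the relevant fitting routine and correlate the observed responses with the (out-of-sample) predictions.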
Cox, D.R. and N. Wermuth. 1992. A comment on the coefficient of determination
for binary responses. American Statistician 46: 1-4.
Hardin, J. and J. Hilbe. 2001 (and later editions). Generalized Linear Models and Extensions. College Station, TX: Stata Press.
Zheng, B. and A. Agresti. 2000. Summarizing the predictive power of a
generalized linear model. Statistics in Medicine 19: 1771-1781.
Note: the use of binary predictors need not limit $R^2$ in linear regression. A simple example is a model with one continuous predictor and one binary predictor. Since in principle the data could all lie on two straight lines, a value of 1 is achievable.
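A toy numeric check of that note, with made-up data: responses lie exactly on two parallel lines ($y = 1 + 2x$ for $z = 0$, $y = 4 + 2x$ for $z = 1$), so an OLS fit with the continuous predictor and the binary predictor attains $R^2 = 1$. The normal-equations solver below is bare-bones and purely illustrative.

```python
# Data on two parallel lines => OLS with a continuous predictor x and a
# binary predictor z achieves R^2 = 1. Illustrative data and solver.

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    p = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) for k in range(p)]
         for j in range(p)]
    b = [sum(X[i][j] * y[i] for i in range(len(X))) for j in range(p)]
    for j in range(p):                       # forward elimination
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv] = A[piv], A[j]
        b[j], b[piv] = b[piv], b[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            for k in range(j, p):
                A[r][k] -= f * A[j][k]
            b[r] -= f * b[j]
    beta = [0.0] * p
    for j in reversed(range(p)):             # back substitution
        beta[j] = (b[j] - sum(A[j][k] * beta[k]
                              for k in range(j + 1, p))) / A[j][j]
    return beta

xs = [0.0, 1.0, 2.0, 3.0, 0.5, 1.5, 2.5, 3.5]
zs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [1 + 2 * x + 3 * z for x, z in zip(xs, zs)]  # exactly on two lines

X = [[1.0, x, float(z)] for x, z in zip(xs, zs)]
beta = ols(X, ys)
fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]

ybar = sum(ys) / len(ys)
sse = sum((yi - fi) ** 2 for yi, fi in zip(ys, fitted))
tss = sum((yi - ybar) ** 2 for yi in ys)
r2 = 1 - sse / tss
print(round(r2, 10))  # 1.0
```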
What you want is called the coefficient of partial determination (the proportion of variation in $Y$ explained by one predictor, given the others already in the model). Between, for example, $Y$ and $X_2$, when $X_1$ is also in the model, it is
$r^2_{Y2.1} = \frac{SSR(X_2|X_1)}{SSE(X_1)}.$
Likewise, for the general case of more predictors, the coefficient of partial determination between, for example, $Y$ and $X_2$, when $X_1$ and $X_3$ are in the model is
$r^2_{Y2.13} = \frac{SSR(X_2|X_1, X_3)}{SSE(X_1,X_3)}.$
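A small numeric sketch of $r^2_{Y2.1}$, with made-up data: $SSR(X_2|X_1)$ is the drop in SSE from adding $X_2$ to a model already containing $X_1$, i.e. $SSE(X_1) - SSE(X_1, X_2)$, and the coefficient of partial determination divides that drop by $SSE(X_1)$. The fitting routine is a throwaway normal-equations solver for illustration only.

```python
# Coefficient of partial determination r^2_{Y2.1}:
#   (SSE(X1) - SSE(X1, X2)) / SSE(X1).
# Illustrative data and a minimal OLS fit via the normal equations.

def solve(A, b):
    """Gaussian elimination with partial pivoting (mutates A and b)."""
    n = len(A)
    for j in range(n):
        piv = max(range(j, n), key=lambda r: abs(A[r][j]))
        A[j], A[piv] = A[piv], A[j]
        b[j], b[piv] = b[piv], b[j]
        for r in range(j + 1, n):
            f = A[r][j] / A[j][j]
            for k in range(j, n):
                A[r][k] -= f * A[j][k]
            b[r] -= f * b[j]
    x = [0.0] * n
    for j in reversed(range(n)):
        x[j] = (b[j] - sum(A[j][k] * x[k] for k in range(j + 1, n))) / A[j][j]
    return x

def sse(cols, y):
    """SSE from an OLS fit of y on an intercept plus the given columns."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) for k in range(p)]
         for j in range(p)]
    rhs = [sum(X[i][j] * y[i] for i in range(len(y))) for j in range(p)]
    beta = solve(A, rhs)
    return sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2
               for i in range(len(y)))

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y  = [3.1, 2.8, 7.2, 6.9, 11.3, 10.8]

sse_1  = sse([x1], y)            # SSE(X1): X1 alone in the model
sse_12 = sse([x1, x2], y)        # SSE(X1, X2): both predictors
ssr_2_given_1 = sse_1 - sse_12   # SSR(X2 | X1)
r2_partial = ssr_2_given_1 / sse_1
print(round(r2_partial, 4))
```

The three-predictor version works the same way: fit the reduced model with $X_1$ and $X_3$, fit the full model adding $X_2$, and divide the drop in SSE by $SSE(X_1, X_3)$.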
Don't expect to find much information on coefficients of partial determination in most statistics books; the topic tends to appear only in the better regression texts, such as Neter, Wasserman, and Kutner, and perhaps Draper and Smith. You may also find it online in lecture notes or software manuals.
Total variance decomposes into explained variance and unexplained variance (matters are more complicated if you deviate from ordinary least squares, but I take it you are in the OLS setting).
$$ \text{Total Variance}=\text{Unexplained Variance} + \text{Explained Variance} $$
These often go by other names: total sum of squares (TSS), residual sum of squares (SSRes), and regression sum of squares (SSReg).
$$ TSS = SSRes + SSReg $$
Then…
$$ R^2=\dfrac{SSReg}{TSS}=\dfrac{TSS-SSRes}{TSS}=1-\dfrac{SSRes}{TSS} $$
You’re right that unexplained variance is total variance minus explained variance, but that is contained in the equation for $R^2$. If you learned a different way to calculate $R^2$, it would be equivalent to what I gave above, even if the equivalence is not obvious.
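That equivalence can be checked numerically. The sketch below, on made-up data, fits a least-squares line in closed form, verifies $TSS = SSRes + SSReg$, and confirms that $SSReg/TSS$ and $1 - SSRes/TSS$ agree.

```python
# Check, for a simple least-squares line, that TSS = SSRes + SSReg and
# that the two expressions for R^2 coincide. Data are illustrative.

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
fitted = [intercept + slope * xi for xi in x]

tss    = sum((yi - my) ** 2 for yi in y)             # total
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
ss_reg = sum((fi - my) ** 2 for fi in fitted)        # explained

r2_a = ss_reg / tss          # R^2 as explained fraction
r2_b = 1 - ss_res / tss      # R^2 as one minus unexplained fraction
print(round(r2_a, 4), round(r2_b, 4))  # the two agree
```

Note the decomposition relies on the model including an intercept and being fitted by least squares; without an intercept, TSS need not split cleanly into these two pieces.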
If your reference says that $R^2$ is the explained variance, they are being a bit loose with their phrasing for my taste. While $R^2$ is related to the explained variance, the two are not synonyms.