Yes, the covariance matrix of all the variables--explanatory and response--contains the information needed to find all the coefficients, provided an intercept (constant) term is included in the model. (Although the covariances provide no information about the constant term, it can be found from the means of the data.)
Analysis
Let the data for the explanatory variables be arranged as $n$-dimensional column vectors $X_1, X_2, \ldots, X_p$ with covariance matrix $C_X$, and let the response variable be the column vector $y$, considered to be a realization of a random variable $Y$. The ordinary least squares estimates $\hat\beta$ of the coefficients in the model
$$\mathbb{E}(Y) = \alpha + X\beta$$
are obtained by assembling the $p+1$ column vectors $X_0 = (1, 1, \ldots, 1)^\prime, X_1, \ldots, X_p$ into an $n \times (p+1)$ array $X$ and solving the system of linear equations
$$X^\prime X \hat\beta = X^\prime y.$$
It is equivalent to the system
$$\frac{1}{n}X^\prime X \hat\beta = \frac{1}{n}X^\prime y.$$
Gaussian elimination will solve this system. It proceeds by adjoining the $(p+1) \times (p+1)$ matrix $\frac{1}{n}X^\prime X$ and the $(p+1)$-vector $\frac{1}{n}X^\prime y$ into a $(p+1) \times (p+2)$ array $A$ and row-reducing it.
The first step will inspect $\frac{1}{n}(X^\prime X)_{11} = \frac{1}{n}X_0^\prime X_0 = 1$. Finding this to be nonzero, it proceeds to subtract appropriate multiples of the first row of $A$ from the remaining rows in order to zero out the remaining entries in its first column. These multiples will be $\frac{1}{n}X_0^\prime X_i = \overline X_i$, and the number subtracted from the entry $A_{i+1,j+1} = \frac{1}{n}X_i^\prime X_j$ will equal $\overline X_i \overline X_j$, leaving $\frac{1}{n}X_i^\prime X_j - \overline X_i \overline X_j$: this is just the formula for the covariance of $X_i$ and $X_j$. Moreover, the number left in the $i+1, p+2$ position equals $\frac{1}{n}X_i^\prime y - \overline{X_i}\,\overline{y}$, the covariance of $X_i$ with $y$.
Thus, after the first step of Gaussian elimination the system is reduced to solving
$$C_X\hat{\beta} = (\text{Cov}(X_i, y))^\prime$$
and obviously--since all the coefficients are covariances--that solution can be found from the covariance matrix of all the variables.
(When $C_X$ is invertible the solution can be written $C_X^{-1}(\text{Cov}(X_i, y))^\prime$. The formulas given in the question are special cases of this when $p=1$ and $p=2$. Writing out such formulas explicitly will become more and more complex as $p$ grows. Moreover, they are inferior for numerical computation, which is best carried out by solving the system of equations rather than by inverting the matrix $C_X$.)
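As a quick numerical check of this reduction, the following R snippet (with hypothetical data; the sizes n and p and the seed are arbitrary choices of mine) assembles the augmented array $A$, performs the first elimination step by hand, and confirms that what remains is exactly the array of covariances, computed with the $1/n$ convention used above:
#
# Verify: one elimination step turns (1/n) X'X into the covariances.
# (Hypothetical data; any n, p, and seed would do.)
#
n <- 5; p <- 2
set.seed(1)
X <- cbind(X0=1, matrix(rnorm(n*p), n, dimnames=list(NULL, paste0("X", 1:p))))
y <- rnorm(n)
A <- cbind(crossprod(X), crossprod(X, y)) / n    # The (p+1) x (p+2) array
A[-1, ] <- A[-1, ] - outer(A[-1, 1], A[1, ])     # First elimination step
A[-1, -1]                                        # What remains...
cbind(cov(X[, -1]), cov(X[, -1], y)) * (n-1)/n   # ...equals the covariances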
The constant term will be the difference between the mean of $y$ and the mean values predicted from the estimates, $X\hat{\beta}$; explicitly, $\hat\alpha = \overline{y} - \sum_{i=1}^p \overline{X}_i\hat\beta_i$.
Example
To illustrate, the following R code creates some data, computes their covariances, and obtains the least squares coefficient estimates solely from that information. It compares them to the estimates obtained from the least-squares estimator lm.
#
# 1. Generate some data.
#
n <- 10 # Data set size
p <- 2 # Number of regressors
set.seed(17)
z <- matrix(rnorm(n*(p+1)), nrow=n, dimnames=list(NULL, paste0("x", 1:(p+1))))
y <- z[, p+1]
x <- z[, -(p+1), drop=FALSE]
#
# 2. Find the OLS coefficients from the covariances only.
#
a <- cov(x)      # cov() uses 1/(n-1), not 1/n; the factor cancels below
b <- cov(x,y)
beta.hat <- solve(a, b)[, 1] # Coefficients from the covariance matrix
#
# 2a. Find the intercept from the means and coefficients.
#
y.bar <- mean(y)
x.bar <- colMeans(x)
intercept <- y.bar - x.bar %*% beta.hat
The output shows agreement between the two methods:
(rbind(`From covariances` = c(`(Intercept)`=intercept, beta.hat),
`From data via OLS` = coef(lm(y ~ x))))
                  (Intercept)        x1        x2
From covariances     0.946155 -0.424551 -1.006675
From data via OLS    0.946155 -0.424551 -1.006675
Best Answer
Here's my current best guess, in case someone later has this same question.
Make the problem more general in the following way: say you are looking at the expression $x = \sqrt{x_1 + x_2 + \ldots + x_n}$ and want to figure out the "share contribution" of $x_i$ in $x$ (the $x_i$'s are simply the variance and covariance terms in the expression in the question). Suppose we have reason to believe that $\sum x_i \geq 0$ (as in the variance expression in the question).
The concavity of the square root makes this kind of difficult. So too does the fact that some $x_i$ could be negative. But, let's look at how this might work for a simple example with just two terms: $x = \sqrt{x_1 + x_2}$. Say $x_1 = 36$ and $x_2 = 64$, so that $x = \sqrt{100} = 10$. We might say that $x_1$ contributes $(36/100)\cdot 10 = 3.6$ and $x_2$ contributes $(64/100)\cdot 10 = 6.4$ to this figure. Or, we might instead say $x_1$ contributes $(\sqrt{36}/(\sqrt{36}+\sqrt{64}))\cdot 10 \approx 4.29$ and $x_2$ contributes $\approx 5.71$ to this figure.
One problem with the first method is, of course, that it understates the contributions of smaller terms. By itself, $\sqrt{x_1}$ is $6$, only $25\%$ lower than $\sqrt{x_2} = 8$; while its calculated contribution is almost $45\%$ lower than that of $x_2$. This doesn't seem right. On the other hand, at least the denominator ($x_1+x_2$) has some natural connection to $x = \sqrt{x_1+x_2}$; while the denominator in the second method, $\sqrt{x_1}+\sqrt{x_2}$, is hardly natural.
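For the record, here is that arithmetic in R (the labels are just mine):
x1 <- 36; x2 <- 64; x <- sqrt(x1 + x2)
c(proportional  = x1/(x1 + x2) * x,                    # method 1: 3.6
  root.weighted = sqrt(x1)/(sqrt(x1) + sqrt(x2)) * x)  # method 2: ~4.29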
I'm going to propose a third method and claim that it is better than these first two, though I'm having trouble supporting the claim; it just seems right to me.
Here's the third method, with the simple example from above. Look at $\min\{x_1,x_2\} = x_1$. Both $x_1$ and $x_2$ contribute the amount $x_1$ to the sum inside the square root; so, we should figure out how much of $x$ is contributed by the first $x_1$ amount of both terms, together, and then divide it evenly between $x_1$ and $x_2$; this amount will eventually be included in the total contributed by each term (in fact, it will represent all of the contribution from $x_1$). To do this, we simply compute $s_1 = \frac{1}{2}\cdot \sqrt{x_1+x_1} = \sqrt{72}/2 \approx 4.24$. So, $x_1$'s share contribution to $x$ is $.424$, while $x_2$'s is $.424$ plus something else. Here's how we compute that "something else": we figure out the additional amount contributed by all terms of size at least $x_2$ (in this case, just $x_2$), i.e. we compute $\sqrt{x_1 + x_2} - \sqrt{x_1 + x_1}$ and add this amount to the $4.24$ contributed by the first $x_1$ amount of $x_2$. In numbers, this is $\sqrt{100}-\sqrt{72} \approx 10 - 8.49 = 1.51$. So, $x_2$ contributes $\approx 4.24 + 1.51 = 5.75$ to $x$, and its share contribution is $\approx .575$.
Now I'll write out general formulas for the case with $x = \sqrt{x_1 + \ldots + x_n}$. Reorder the $x_i$ so that $x_1 \leq x_2 \leq \ldots \leq x_n$. Then the contribution of $x_1$ is $s_1 = \frac{1}{n}\cdot\sqrt{n x_1}$; the contribution of $x_2$ is $s_2 = s_1 + \frac{1}{n-1}\cdot(\sqrt{(n-1)x_2 + x_1}-\sqrt{n x_1})$; that of $x_3$ is $s_3 = s_2 + \frac{1}{n-2}\cdot(\sqrt{(n-2) x_3 + x_2 + x_1}-\sqrt{(n-1)x_2 + x_1})$; and so on. The shares are obtained simply by dividing $s_i$ by $x$ (that the shares add up to $1$ is straightforward to verify: each increment $s_i - s_{i-1}$ appears in $n-i+1$ of the $s_j$'s, so the square-root differences telescope to $x$).
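To make the formulas concrete, here is a short R sketch (the function name share.contrib is my own; it assumes the running totals inside the square roots are nonnegative, with negative terms handled in the next paragraph):
#
# Contributions s_i of each term x_i to x = sqrt(sum(x_i)), per the
# formulas above. (Assumes the running totals inside the square roots
# are nonnegative; see below for negative terms.)
#
share.contrib <- function(x) {
  n <- length(x)
  ord <- order(x)
  xs <- x[ord]                              # x_(1) <= x_(2) <= ... <= x_(n)
  A <- (n - seq_len(n)) * xs + cumsum(xs)   # Totals inside the square roots
  d <- diff(c(0, sqrt(A))) / (n:1)          # Per-term increment at each layer
  s <- cumsum(d)                            # s_i, in sorted order
  s[order(ord)]                             # Restore the original input order
}
share.contrib(c(36, 64))       # 4.243 5.757, as in the example (up to rounding)
share.contrib(c(36, 64)) / 10  # Shares .424 and .576, which sum to 1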
As long as $\sum x_i \geq 0$, we can account for negative $x_i$ simply: whenever the expression inside a square root is negative, take the negative of the square root of its absolute value. In computer code, for example, this would be achieved by replacing sqrt(...) with (real(sqrt(...)) - imag(sqrt(...))).
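In R, for instance, that device can be written with complex square roots, or equivalently as a signed square root; swapping either of these in for sqrt() in the share.contrib sketch above extends it to negative terms:
a <- c(25, -9)
Re(sqrt(as.complex(a))) - Im(sqrt(as.complex(a)))  #  5 -3 (real minus imaginary)
sign(a) * sqrt(abs(a))                             #  5 -3 (equivalent signed root)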
If anyone reads this and has any thoughts (it sucks, it makes sense, or you have a way to explain why it makes sense, because I'm kind of failing at that), please share; I'd really appreciate it.