In my lecture notes for statistical signals, I am having trouble reproducing the result for the variance of an estimator $T$.
Here is the example:
Given the observations $X_1, \dots , X_N$ of a uniformly distributed random variable
$$
X: \Omega \rightarrow [0,\theta]
$$
with $[0, \theta] \subset \mathbb{R}$, such that the CDF is $F_X(\xi) = \frac{\xi}{\theta}$ and the PDF is $f_X(\xi) = \frac{1}{\theta}$ for $0 \leq \xi \leq \theta$.
To estimate the upper bound $\theta$ of the uniform distribution, the expected value $\mathbb{E}[X] = \frac{\theta}{2}$ is used, which is the mean of this particular uniform distribution. The estimator $T$ is then given as:
$$
T = 2 \cdot \underbrace{\frac{1}{N} \sum_{i = 1}^N X_i}_{\text{average}}: \quad (x_1, \dots , x_N) \mapsto \hat{\theta}
$$
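For reference, the mean used above follows from a one-line integral with the PDF from the definition:
$$
\mathbb{E}[X] = \int_0^\theta \xi \, f_X(\xi) \, d\xi = \int_0^\theta \frac{\xi}{\theta} \, d\xi = \frac{\theta^2}{2\theta} = \frac{\theta}{2}
$$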
Now the expected value for this estimator is calculated as:
$$
\mathbb{E}[T(X_1, \dots , X_N)] = \mathbb{E}[\frac{2}{N}\sum_{i = 1}^{N}X_i] = \frac{2}{N}\sum_{i = 1}^{N}\mathbb{E}[X_i] = \frac{2}{N} \cdot N \cdot \frac{\theta}{2} = \theta
$$
which makes this estimator unbiased since the expected value is exactly the wanted parameter $\theta$.
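As a quick numerical sanity check (my own sketch, not from the notes; the values of $\theta$, $N$ and the trial count are arbitrary choices), a Monte Carlo simulation of $T$ supports the unbiasedness:

```python
import random

def estimate_theta(sample):
    """T = 2 * (sample mean): the estimator for the upper bound theta."""
    return 2 * sum(sample) / len(sample)

random.seed(0)
theta, N, trials = 2.0, 50, 20000

# Average the estimator over many independent samples of size N;
# for an unbiased estimator this average should approach theta.
mean_T = sum(
    estimate_theta([random.uniform(0, theta) for _ in range(N)])
    for _ in range(trials)
) / trials
print(mean_T)  # close to theta = 2
```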
Finally, the variance $Var[T]$ is given as:
$$
Var[T] = \frac{\theta^2}{3N}
$$
However, I don't know how to obtain this result.
I tried two approaches to get the same result for the variance:
My first approach uses the definition of the variance in terms of the expected value of the estimator, but there seems to be an error in my calculation or something that I am missing.
The second approach gives me the correct result, but I don't really understand the properties/rules for the variance that I used.
First approach:
According to the definition the variance is:
$$
Var[T(X_1, \dots , X_N)] = \mathbb{E}[(T-\mathbb{E}[T])^2] = \mathbb{E}[T^2] - \mathbb{E}[T]^2
$$
Now, in order to get the variance, $\mathbb{E}[T^2]$ and $\mathbb{E}[T]^2$ are needed. $\mathbb{E}[T]^2 = \theta^2$ is already available (see above). There seems to be an error in my calculation of $\mathbb{E}[T^2]$:
$$
\mathbb{E}[T^2] = \mathbb{E}\left[\frac{4}{N^2}\left(\sum_{i = 1}^N X_i\right)^2\right]
$$
Assuming independence of $X_i$ and $X_j$, this equation results in (I am really not sure whether this step is correct):
$$
\mathbb{E}[T^2] = \frac{4}{N^2}\sum_{i = 1}^N \mathbb{E}[X_i^2]
$$
Now, using the rule for functions of random variables, $\mathbb{E}[g(X)] = \int_{\mathbb{R}} g(x) f_X(x)\,dx$ with $g(x) = x^2$, the previous equation becomes:
$$
\mathbb{E}[T^2] = \frac{4}{N^2}N \int_{0}^{\theta}{x^2 \frac{1}{\theta} dx} = \frac{4}{N} \frac{\theta^3}{3} \frac{1}{\theta} = \frac{4}{N} \frac{\theta^2}{3}
$$
However, this obviously leads to:
$$
Var[T] = \mathbb{E}[T^2] - \mathbb{E}[T]^2 = \frac{4}{N} \frac{\theta^2}{3} - \theta^2 = \frac{\theta^2 (4 - 3N)}{3N}
$$
which is wrong. Can you please tell me where I made a mistake?
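As a side check (my own sketch; the seed and parameters are arbitrary), estimating $\mathbb{E}[T^2]$ by simulation also shows that my value above cannot be right:

```python
import random

random.seed(1)
theta, N, trials = 2.0, 10, 50000

def T(sample):
    """T = 2 * (sample mean)."""
    return 2 * sum(sample) / len(sample)

# Monte Carlo estimate of E[T^2].
mean_T2 = sum(
    T([random.uniform(0, theta) for _ in range(N)]) ** 2
    for _ in range(trials)
) / trials

claimed = 4 * theta**2 / (3 * N)  # my derived value 4*theta^2/(3N)
# mean_T2 comes out noticeably larger than `claimed`.
print(mean_T2, claimed)
```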
Second approach using properties of variances (correct result):
$$
Var[T] = Var[\frac{2}{N} \sum_{i = 1}^N X_i] = \frac{4}{N^2} Var[\sum_{i = 1}^N X_i] = \frac{4}{N^2} \sum_{i = 1}^N Var[X_i] = \frac{4}{N^2} N \frac{\theta^2}{12} = \frac{\theta^2}{3N}
$$
where I used the variance of the uniform distribution $Var[X_i] = \frac{\theta^2}{12}$ and the following rules for variance:
$$
Var[\alpha X + \beta] = \alpha^2 Var[X]
$$
and
$$
Var[\sum_{i = 1}^N X_i] = \sum_{i = 1}^N Var[X_i] + \sum_{i \neq j} Cov[X_i,X_j]
$$
For the second rule I assumed independence of the $X_i$, which makes $\sum_{i \neq j} Cov[X_i,X_j] = 0$. I am not sure whether this is right, but it led to the correct result.
Could you please tell me how to derive these rules?
I would be glad to get the variance using my first approach, with the formulas I mostly understand, rather than the second approach, where I have no clue where these variance rules come from.
Best Answer
First, why is $\text{Cov}(X, Y) = 0$ if $X$ and $Y$ are independent?
By definition, $\text{Cov}(X, Y) = \mathbb{E}((X - \mu_X)(Y - \mu_Y))$. Hence, \begin{align*} \text{Cov}(X, Y) &= \mathbb{E}(XY - \mu_YX - \mu_XY + \mu_X \mu_Y) \\ &= \mathbb{E}(XY) - \mathbb{E}(\mu_YX) - \mathbb{E}(\mu_XY) + \mathbb{E}(\mu_X \mu_Y) ~ (\text{by linearity of expectation}) \\ &= \mathbb{E}(XY) - \mu_Y\mathbb{E}(X) - \mu_X\mathbb{E}(Y) + \mu_X \mu_Y ~ (\mu_X ~ \text{and} ~ \mu_Y ~ \text{are constants}) \\ &= \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y) \end{align*} Since $\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y)$ if $X$ and $Y$ are independent, $\text{Cov}(X, Y) = 0$.
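The vanishing covariance for independent variables can also be checked numerically (a sketch of mine; the sample size and range are arbitrary):

```python
import random

random.seed(2)
n = 200000

# Two independently drawn uniform samples on [0, 2].
xs = [random.uniform(0, 2) for _ in range(n)]
ys = [random.uniform(0, 2) for _ in range(n)]

mx = sum(xs) / n
my = sum(ys) / n
# Sample covariance: average of (x - mean_x) * (y - mean_y).
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
print(cov)  # close to 0
```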
Next, why is $\text{Var}(\sum^n_{i = 1}X_i) = \sum^{n}_{i = 1}\text{Var}(X_i)$ if $X_1, X_2, \ldots, X_n$ are independent?
Take $n = 2$, \begin{align*} \text{Var}(X_1 + X_2) &= \mathbb{E}((X_1 + X_2)^2) - (\mathbb{E}(X_1 + X_2))^2 ~ (\text{definition}) \\ &=\mathbb{E}(X_1^2 + 2X_1X_2 + X_2^2) - (\mathbb{E}(X_1) + \mathbb{E}(X_2))^2 ~ (\text{expansion and linearity of expectation}) \\ &= \mathbb{E}(X^2_1) + 2\mathbb{E}(X_1X_2) + \mathbb{E}(X_2^2) - \mathbb{E}(X_1)^2 - 2 \mathbb{E}(X_1) \mathbb{E}(X_2) - \mathbb{E}(X_2)^2 \\ &= \mathbb{E}(X^2_1) - \mathbb{E}(X_1)^2 + \mathbb{E}(X^2_2) - \mathbb{E}(X_2)^2 ~ (\text{again, }\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y) \text{ by assumption}) \\ &= \text{Var}(X_1) + \text{Var}(X_2) \end{align*}
Back to the original question, what is $\text{Var}(T)$? \begin{align*} \text{Var}(2\frac{\sum^n_{i = 1}X_i}{n}) &= \frac{4}{n^2} \sum_{i = 1}^n \text{Var}(X_i) \\ &= \frac{4}{n^2} \times \frac{n\theta^2}{12} \\ &= \frac{\theta^2}{3n} \end{align*} Done!
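One can double-check this final result by simulation (a sketch; $\theta$, $n$ and the number of trials are arbitrary choices of mine):

```python
import random

random.seed(3)
theta, n, trials = 2.0, 25, 40000

def T(sample):
    """T = 2 * (sample mean)."""
    return 2 * sum(sample) / len(sample)

# Draw many independent samples of size n and compute T for each.
estimates = [
    T([random.uniform(0, theta) for _ in range(n)]) for _ in range(trials)
]
mean_T = sum(estimates) / trials
# Empirical variance of the estimator across trials.
var_T = sum((t - mean_T) ** 2 for t in estimates) / trials

print(var_T, theta**2 / (3 * n))  # the two values should be close
```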