For any $n$ random variables $X_i$ , the best general bound is
$\newcommand{\Var}{\mathrm{Var}}\Var(\max X_i) \le \sum_i \Var(X_i)$ as stated in the original question.
Here is a proof sketch: if $X,Y$ are IID, then $E[(X-Y)^2] =2\Var(X)$. Given a vector of possibly dependent variables $(X_1,\ldots ,X_n)$, let $(Y_1,\ldots ,Y_n)$ be an independent copy with the same joint distribution. Since $|\max_i X_i-\max_i Y_i|\le \max_i |X_i-Y_i|$, the union bound gives, for any $r>0$, $P[ |\max_i X_i-\max_i Y_i|^2 >r] \le \sum_i P[ | X_i-Y_i|^2 >r]$, and integrating this in $r$ from $0$ to $\infty$ yields the claimed inequality.
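In more detail, the chain of (in)equalities behind the sketch is
$$2\,\Var(\max_i X_i)=E\big[(\max_i X_i-\max_i Y_i)^2\big]=\int_0^\infty P\big[|\max_i X_i-\max_i Y_i|^2>r\big]\,dr\le\sum_i\int_0^\infty P\big[|X_i-Y_i|^2>r\big]\,dr=\sum_i E\big[(X_i-Y_i)^2\big]=2\sum_i\Var(X_i).$$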
If $X_i$ are IID indicators of events of probability $\epsilon$,
then $\max_i X_i$ is the indicator of an event of probability $n\epsilon+O(n^2 \epsilon^2)$. Fixing $n$ and letting $\epsilon$ tend to zero, we get $\Var(X_i)=\epsilon-\epsilon^2$ and $\Var(\max_i X_i)= n\epsilon +O(n^2\epsilon^2)$, so the bound $\sum_i \Var(X_i)\approx n\epsilon$ cannot be improved in general.
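A quick numerical check of both the general bound and the near-tightness of the indicator example; this is only a sketch using NumPy, and the correlation structure, dimension, and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_samples = 5, 200_000

# Correlated normals: Var(max X_i) should not exceed sum_i Var(X_i).
cov = 0.5 * np.ones((n_vars, n_vars)) + 0.5 * np.eye(n_vars)
X = rng.multivariate_normal(np.zeros(n_vars), cov, size=n_samples)
print(X.max(axis=1).var(), X.var(axis=0).sum())

# IID indicators with small eps: Var(max X_i) is close to n*eps,
# i.e. close to the sum of the individual variances.
eps = 1e-3
B = rng.random((n_samples, n_vars)) < eps
print(B.max(axis=1).var(), B.var(axis=0).sum())
```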
You can use the multivariate Chebyshev inequality.
Two-variable case
For a single comparison, $X_1$ vs. $X_2$, I arrive at the same bound as in Jochen's comment of Nov 4, 2016:
1) If $\mu_1 < \mu_2$ then $ P(X_1>X_2) \leq (\sigma_1^2 + \sigma_2^2)/(\mu_1-\mu_2)^2 $
(and I wonder as well about your derivation)
Derivation of equation (1):
- using the new variable $X_1-X_2$
- transforming it so that it has mean zero
- taking the absolute value
- applying Chebyshev's inequality
$$\begin{aligned}
P \left( X_1 > X_2 \right) &= P \left( X_1 - X_2 > 0 \right)\\
&= P\left( X_1 - X_2 - (\mu_1 - \mu_2) > - (\mu_1 - \mu_2)\right) \\
&\leq P\left( \vert X_1 - X_2 - (\mu_1 - \mu_2) \vert > \mu_2 - \mu_1\right) \\
&\leq \frac{\sigma_{X_1-X_2- (\mu_1 - \mu_2)}^2}{(\mu_2 - \mu_1)^2} = \frac{\sigma_{X_1}^2+\sigma_{X_2}^2}{(\mu_2 - \mu_1)^2}
\end{aligned}$$
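As a sanity check, the bound in (1) can be compared against a simulated $P(X_1>X_2)$. This is a sketch using independent normals with arbitrary means and variances satisfying $\mu_1<\mu_2$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu1, mu2, s1, s2 = 0.0, 2.0, 1.0, 1.5        # mu1 < mu2
x1 = rng.normal(mu1, s1, size=1_000_000)
x2 = rng.normal(mu2, s2, size=1_000_000)

p_hat = np.mean(x1 > x2)                      # simulated P(X1 > X2)
bound = (s1**2 + s2**2) / (mu2 - mu1)**2      # right-hand side of (1)
print(p_hat, bound)                           # p_hat should not exceed bound
```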
Multivariate Case
The inequality in equation (1) can be extended to the multivariate case by applying it to the transformed variables $(X_n-X_i)$ for each $i<n$ (note that these are correlated).
A solution to this problem (multivariate and correlated) has been described by I. Olkin and J. W. Pratt, 'A Multivariate Tchebycheff Inequality', Annals of Mathematical Statistics, 29 (1958), pp. 226–234.
http://projecteuclid.org/euclid.aoms/1177706720
Note Theorem 2.3:
$P(\vert y_i \vert \geq k_i \sigma_i \text{ for some } i) = P(\vert x_i \vert \geq 1 \text{ for some } i) \leq \frac{(\sqrt{u} + \sqrt{(pt-u)(p-1)})^2}{p^2}$
in which $p$ is the number of variables, $t=\sum k_i^{-2}$, and $u=\sum \rho_{ij}/(k_ik_j)$.
Theorem 3.6 provides a tighter bound, but is less easy to calculate.
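For reference, the Theorem 2.3 bound is easy to evaluate from the $k_i$ and the correlation matrix. A sketch, assuming (as the $p=1$ case suggests) that the sum in $u$ runs over all pairs $i,j$ including the diagonal, so that $u=t$ for uncorrelated variables:

```python
import numpy as np

def olkin_pratt_bound(k, rho):
    """Upper bound on P(|y_i| >= k_i*sigma_i for some i), Olkin & Pratt, Thm 2.3."""
    k = np.asarray(k, dtype=float)
    rho = np.asarray(rho, dtype=float)   # correlation matrix, rho_ii = 1
    p = len(k)
    t = np.sum(1.0 / k**2)
    u = np.sum(rho / np.outer(k, k))     # assumed to run over all i, j
    return (np.sqrt(u) + np.sqrt((p * t - u) * (p - 1)))**2 / p**2

# Example: three variables, common threshold k_i = 2, pairwise correlation 0.3.
rho = 0.3 * np.ones((3, 3)) + 0.7 * np.eye(3)
print(olkin_pratt_bound([2.0, 2.0, 2.0], rho))
```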
Edit
A sharper bound can be found using the multivariate Cantelli inequality. That inequality is of the type you used before, which gave you the bound $(\sigma_1^2 + \sigma_2^2)/(\sigma_1^2 + \sigma_2^2 + (\mu_1-\mu_2)^2)$, sharper than $(\sigma_1^2 + \sigma_2^2)/(\mu_1-\mu_2)^2$.
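For two variables, that sharper bound follows by replacing Chebyshev with the one-sided (Cantelli) inequality in the derivation above, again taking $\sigma^2_{X_1-X_2}=\sigma_1^2+\sigma_2^2$ (zero covariance):
$$P(X_1 > X_2) = P\big(X_1 - X_2 - (\mu_1-\mu_2) > \mu_2-\mu_1\big) \le \frac{\sigma_1^2+\sigma_2^2}{\sigma_1^2+\sigma_2^2+(\mu_2-\mu_1)^2}.$$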
I have not studied the entire article in detail, but you can find a solution here:
A. W. Marshall and I. Olkin, 'A One-Sided Inequality of the Chebyshev Type', Annals of Mathematical Statistics, 31 (1960), pp. 488–491.
https://projecteuclid.org/euclid.aoms/1177705913
(Later note: this inequality is for equal correlations and is not sufficient help here. In any case, your problem of finding the sharpest bound is equivalent to the more general multivariate Cantelli inequality, and I would be surprised if a solution does not exist.)
Best Answer
The general result for the asymptotic distribution of the sample variance is (see this post)
$$\sqrt n(\hat v - v) \xrightarrow{d} N\left(0,\mu_4 - v^2\right)$$
where I have used the notation $v\equiv \sigma^2$ to avoid later confusion with squares, and where $\mu_4 = \mathrm{E}\left((X_i -\mu)^4\right)$. Therefore, by the continuous mapping theorem,
$$\frac {n(\hat v - v)^2}{\mu_4 - v^2} \xrightarrow{d} \chi^2_1 $$
Then, accepting the approximation,
$$P\left(\frac {n(\hat v - v)^2}{\mu_4 - v^2}\leq \chi^2_{1,1-a}\right)=1-a$$
The term in parentheses gives a quadratic inequality in $v$ that includes the unknown term $\mu_4$. Accepting a further approximation, we can estimate $\mu_4$ from the sample. We then obtain
$$P\left(Av^2 + Bv +\Gamma\leq 0 \right)=1-a$$
The roots of the polynomial are
$$v^*_{1,2}= \frac {-B \pm \sqrt {B^2 -4A\Gamma}}{2A}$$
and our $1-a$ confidence interval for the population variance will be
$$\max\Big\{0,\min\{v^*_{1,2}\}\Big\}\leq \sigma^2 \leq \max\{v^*_{1,2}\}$$
since the probability that the quadratic polynomial is smaller than zero equals (in our case, where $A>0$) the probability that the population variance lies between the roots of the polynomial.
Monte Carlo Study
For clarity, denote $\chi^2_{1,1-a}\equiv z$.
A little algebra gives us that
$$A = n+z, \;\;\ B = -2n\hat v,\;\; \Gamma = n\hat v^2 -z \hat \mu_4$$
which leads to
$$v^*_{1,2}= \frac {n\hat v \pm \sqrt {nz(\hat \mu_4-\hat v^2)+z^2\hat \mu_4}}{n+z}$$
For $a=0.05$ we have $\chi^2_{1,1-a}\equiv z = 3.84$.
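As a concrete illustration, the interval can be computed from a single sample along these lines (a sketch; the Gamma sample matches the simulation described below, and `scipy.stats.chi2.ppf` supplies the quantile):

```python
import numpy as np
from scipy import stats

def variance_ci(x, alpha=0.05):
    """Asymptotic CI for the population variance based on the chi-square pivot above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    v_hat = x.var()                        # MLE sample variance (ddof=0)
    mu4_hat = np.mean((x - x.mean())**4)   # sample fourth central moment
    z = stats.chi2.ppf(1 - alpha, df=1)    # = 3.84 for alpha = 0.05
    disc = np.sqrt(n * z * (mu4_hat - v_hat**2) + z**2 * mu4_hat)
    lo = (n * v_hat - disc) / (n + z)
    hi = (n * v_hat + disc) / (n + z)
    return max(0.0, lo), hi                # clip the lower root at zero

rng = np.random.default_rng(2)
sample = rng.gamma(shape=3.0, scale=2.0, size=100)   # true variance = 12
print(variance_ci(sample))
```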
I generated $10,000$ samples each of size $n=100$ from a Gamma distribution with shape parameter $k=3$ and scale parameter $\theta = 2$. The true mean is $\mu = 6$, and the true variance is $v=\sigma^2 =12$.
Results:
The sampling distribution of the sample variance was still far from normal, but this is to be expected for the small sample size chosen. Its average value, though, was $11.88$, quite close to the true value.
The upper bound was smaller than the true variance in $1{,}456$ samples, while the lower bound was greater than the true variance only $17$ times. So the true value was missed by the CI in $14.73\%$ of the samples, mostly due to undershooting, giving an actual coverage of about $85\%$, roughly $10$ percentage points below the nominal confidence level of $95\%$.
On average the lower bound was $7.20$, while on average the upper bound was $15.68$. The average length of the CI was $8.47$. Its minimum length was $2.56$ while its maximum length was $34.52$.
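A sketch of the simulation itself, reusing `variance_ci`, `rng`, and `numpy` from the previous snippet; the exact counts will of course vary with the random seed:

```python
n_rep, n, true_var = 10_000, 100, 12.0
misses_low = misses_high = 0
lengths = []
for _ in range(n_rep):
    s = rng.gamma(shape=3.0, scale=2.0, size=n)
    lo, hi = variance_ci(s)
    misses_high += hi < true_var     # upper bound fell below the true variance
    misses_low += lo > true_var      # lower bound exceeded the true variance
    lengths.append(hi - lo)

coverage = 1 - (misses_low + misses_high) / n_rep
print(coverage, np.mean(lengths), np.min(lengths), np.max(lengths))
```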