Let's assume you have a population of size $N$ with values $x_1,\ldots,x_N$, mean $\bar x=\frac{1}{N}\sum_{i=1}^N x_i$ and variance $\sigma^2=\frac{1}{N}\sum_{i=1}^N(x_i-\bar x)^2$. (Note that I use lower case $x_i$ to indicate these are not random, but fixed values.)
Now, let's take a random sample $Y_1,\ldots,Y_n$ of $n$ elements (without replacement), with all such subsets equally likely. (Now I use capital $Y$ to indicate these are random.)
Now let $\bar Y=\frac{1}{n}\sum_{i=1}^n Y_i$ and $V=\sum_{i=1}^n (Y_i-\bar Y)^2$, so that the sample variance would be $V/n$ (matching the expression for $\sigma^2$). If we write $V$ out in terms of $(Y_i-\bar x)^2$ and $(Y_i-\bar x)(Y_j-\bar x)$, we get
$$
\begin{split}
V
=& \sum_{i=1}^n (Y_i-\bar Y)^2
= \sum_{i=1}^n \left[(Y_i-\bar x)-(\bar Y-\bar x)\right]^2 \\
=& \sum_{i=1}^n \left[(Y_i-\bar x)^2-2(Y_i-\bar x)(\bar Y-\bar x)+(\bar Y-\bar x)^2 \right] \\
=& \sum_{i=1}^n (Y_i-\bar x)^2 - n(\bar Y-\bar x)^2 \\
=& \left(1-\frac{1}{n} \right) \sum_{i=1}^n (Y_i-\bar x)^2
-\frac{2}{n}\sum_{1\le i<j\le n} (Y_i-\bar x)(Y_j-\bar x)
\end{split}
$$
where in the last step we use that
$$
\left(\sum_{i=1}^n (Y_i-\bar x)\right)^2
= \sum_{i=1}^n (Y_i-\bar x)^2 + 2\sum_{1\le i<j\le n} (Y_i-\bar x)(Y_j-\bar x).
$$
We know that $\text{E}[(Y_i-\bar x)^2]=\sigma^2$: since $Y_i$ is equally likely to be any of $x_1,\ldots,x_N$, this is just the average of $(x_k-\bar x)^2$ over the population.
For $i<j$, we can compute $\text{E}[(Y_i-\bar x)(Y_j-\bar x)]$ by using that this is the same as the average of $(x_k-\bar x)(x_l-\bar x)$ over all pairs $1\le k<l\le N$, since the pair $(Y_i,Y_j)$ is equally likely to be any pair of distinct population values. Since $\sum_{k=1}^N (x_k-\bar x)=0$, we get
$$
0 = \left(\sum_{k=1}^N (x_k-\bar x)\right)^2
= \sum_{1\le k,l\le N} (x_k-\bar x)(x_l-\bar x)
= \sum_{k=1}^N (x_k-\bar x)^2 + 2\sum_{1\le k<l\le N} (x_k-\bar x)(x_l-\bar x)
$$
so that $\sum_{1\le k<l\le N} (x_k-\bar x)(x_l-\bar x) = -\tfrac{N\sigma^2}{2}$. Averaging over the $\binom{N}{2}=\tfrac{N(N-1)}{2}$ pairs gives, for $i<j$,
$$
\text{E}\left[(Y_i-\bar x)(Y_j-\bar x)\right]
= \frac{-N\sigma^2/2}{N(N-1)/2}
= -\frac{\sigma^2}{N-1}.
$$
Combining these results (the expression for $V$ has $n$ squared terms and $\tfrac{n(n-1)}{2}$ cross terms), we get
$$
\text{E}[V] = \left(1-\frac{1}{n}\right)n\sigma^2
- \frac{2}{n}\cdot\frac{n(n-1)}{2}\cdot\left(-\frac{\sigma^2}{N-1}\right)
= (n-1)\sigma^2 + \frac{n-1}{N-1}\sigma^2
= \frac{(n-1)N}{N-1}\sigma^2
$$
giving an unbiased estimator
$$
\hat\sigma^2 = \frac{N-1}{N(n-1)}V
= \frac{N-1}{N(n-1)} \sum_{i=1}^n (Y_i-\bar Y)^2.
$$
As $N\rightarrow\infty$, you recover the familiar $s^2$ estimator, which corresponds to independent sampling from a distribution, while $n=N$ gives exactly $\sigma^2$, as it should when the $x_i$ are known for the whole population.
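As a quick numerical check (a sketch in R; the population, sizes, and seed below are arbitrary illustrative choices, not part of the derivation), one can draw many samples without replacement and compare the average of $\hat\sigma^2$ against $\sigma^2$:

set.seed(101); N = 50; n = 8; B = 10^5        # B: number of simulated samples
x = rnorm(N, 100, 10)                         # one fixed population
sg.sq = mean((x - mean(x))^2)                 # population variance (denominator N)
est = replicate(B, {
  y = sample(x, n)                            # simple random sample, without replacement
  (N-1)/(N*(n-1)) * sum((y - mean(y))^2) })   # the estimator derived above
mean(est); sg.sq                              # the two should agree closely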
First, your notation for the sample variance seems to be muddled. The sample variance is ordinarily defined as $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2,$ which makes it an unbiased estimator of the population variance $\sigma^2.$
Perhaps the most common context for an 'unbiased pooled estimator' of variance is the 'pooled t test': Suppose you have two random samples $X_i$ of size $n$ and $Y_i$ of size $m$ from populations with the same variance $\sigma^2.$ Then
the pooled estimator of $\sigma^2$ is
$$S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{m+n-2}.$$
This estimator is unbiased.
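Indeed, since $E[S_X^2] = E[S_Y^2] = \sigma^2,$ linearity of expectation gives
$$E[S_p^2] = \frac{(n-1)E[S_X^2] + (m-1)E[S_Y^2]}{m+n-2}
= \frac{(n-1)\sigma^2 + (m-1)\sigma^2}{m+n-2} = \sigma^2.$$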
Because the samples have respective 'degrees of freedom' $n-1$ and $m-1,$ one sometimes says that $S_p^2$ is a 'degrees-of-freedom' weighted average of
the two sample variances. If $n = m,$ then $S_p^2 = 0.5S_X^2 + 0.5S_Y^2.$
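A brief simulation sketch of this unbiasedness (the seed, sample sizes, and means below are arbitrary illustrative choices, with a common $\sigma^2 = 100$):

set.seed(2345); B = 10^5; n = 5; m = 8; sigma = 10
s.p = replicate(B, {
  x = rnorm(n, 50, sigma); y = rnorm(m, 70, sigma)   # same variance, different means
  ((n-1)*var(x) + (m-1)*var(y))/(n+m-2) })           # pooled estimator S_p^2
mean(s.p)                                            # should be near sigma^2 = 100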
Note: Some authors do define the sample variance as $\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2,$ but then the sample variance is not an unbiased estimator of $\sigma^2,$ even though it might have other properties desirable for the author's task at hand. However, most agree that the notation $S^2$ is reserved for the version with $n-1$ in the denominator, unless a specific warning is given otherwise.
Example: One common measure of the 'goodness' of an estimator is that it have a small 'root mean squared error' (RMSE). If $T$ is an estimator of $\tau,$ then $\text{MSE}_T(\tau) = E[(T-\tau)^2],$ and the RMSE is its square root.
The simulation below illustrates, for normal data with $n = 5$ and $\sigma^2 = 10^2 = 100,$ that the version of the sample variance with $n$ in the denominator has smaller RMSE than the version with $n-1$ in the denominator. (A formal proof for $n > 1$ is not difficult.)
set.seed(1888); m = 10^6; n = 5; sigma = 10; sg.sq = 100
v.a = replicate(m, var(rnorm(n, 100, sigma))) # denom n-1
v.b = (n-1)*v.a/n # denom n
mean(v.a); RMS.a = sqrt(mean((v.a-sg.sq)^2)); RMS.a
[1] 100.0564 # unbiased
[1] 70.81563 # larger RMSE
mean(v.b); RMS.b = sqrt(mean((v.b-sg.sq)^2)); RMS.b
[1] 80.0451 # biased
[1] 60.06415 # smaller RMSE
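For comparison, the exact values under normality follow from $\text{Var}(S^2) = 2\sigma^4/(n-1)$:
$$\text{RMSE}(S^2) = \sigma^2\sqrt{\tfrac{2}{n-1}} \approx 70.71,
\qquad
\text{RMSE}\!\left(\tfrac{n-1}{n}S^2\right) = \frac{\sigma^2\sqrt{2n-1}}{n} = 60,$$
for $n = 5$ and $\sigma^2 = 100,$ in close agreement with the simulated values above.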
Best Answer
Correct. Once you find the critical point, it suffices to show it corresponds to a minimum by considering the second derivative at this value. You will also find that the minimum variance attained should be $$\frac{4\sigma^2}{m + 4n}.$$