Let's present a case where it is preferable to use the pooled-data estimator. This is related to an old question of mine, so some things are repeated.
Pooling the available data of total size $n$, as opposed to keeping them separate and working with $m$ smaller samples each of size $n_j = n/m$, has an advantage in the context of OLS regression as regards the variance of the OLS estimator, at least when the $m$ samples are independent.
Consider a linear regression setting (to which the OP alludes), and let's say that all nice properties of the OLS estimator do hold: unbiasedness, efficiency, consistency.
Let's say we have two options: take one sample of size $n$ or $m$ samples each of size $n_j = n/m$.
If we take one sample, we will run one regression,
$$\mathbf y = \mathbf X\beta + \mathbf u,\;\; {\rm Var}(\mathbf u\mid \mathbf X) =\sigma^2\mathbf I$$
and we will obtain a single estimate, $\hat \beta$.
If we take $m$ independent samples we will run $m$ independent regressions,
$$\mathbf y_j = \mathbf X_j\beta + \mathbf u_j, \;\;{\rm Var}(\mathbf u_j \mid \mathbf X_j) =\sigma^2\mathbf I,\;\; j=1,...,m$$
and we will get $m$ estimates $b_j,\; j=1,...,m$. This looks better: we can obtain an empirical distribution rather than just a single estimate. But what if we eventually want a single estimate from this route as well? The most natural thing would be to average over the lot, arriving at the averaging estimator
$$\bar b = \frac 1m\sum_{j=1}^{m}b_j$$
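For concreteness, here is a minimal sketch of the two routes in Python/NumPy (the data-generating process and all numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 600, 6, 3                  # total size, number of samples, regressors
beta = np.array([1.0, -2.0, 0.5])    # hypothetical true coefficients

X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

# Route 1: one pooled regression on all n observations
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Route 2: m separate regressions on chunks of size n/m, then average
b = [np.linalg.lstsq(Xj, yj, rcond=None)[0]
     for Xj, yj in zip(np.split(X, m), np.split(y, m))]
b_bar = np.mean(b, axis=0)

print(beta_hat)   # single-sample estimate
print(b_bar)      # averaging estimator
```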
All estimators involved in both approaches are unbiased and consistent. But informally, in the first case we appeal to the consistency property of the estimator: by using a large sample we hope that the asymptotic properties will benefit the accuracy of the estimate we obtain.
In the second case, we appeal to the unbiasedness property of the estimators: we obtain many estimates and take their average, thereby estimating the expected value of the estimator, which equals the true value.
What happens is that
$${\rm Var}(\hat \beta \mid \mathbf X) < {\rm Var}(\bar b \mid \mathbf X)$$
in the matrix sense, i.e. the difference ${\rm Var}(\bar b \mid \mathbf X) -{\rm Var}(\hat \beta \mid \mathbf X)$ is a positive definite matrix (provided the $\mathbf X_j' \mathbf X_j$ matrices are not all equal): the estimator from the single sample has lower variance than the averaging estimator.
Consider first
$${\rm Var}(\bar b \mid \mathbf X) = {\rm Var}\left(\frac 1m\sum_{j=1}^{m}b_j\mid \mathbf X\right) = \frac 1{m^2}\sum_{j=1}^{m}{\rm Var}(b_j\mid \mathbf X) = \sigma^2\frac 1{m^2}\sum_{j=1}^{m}(\mathbf X_j' \mathbf X_j)^{-1}$$
No covariances appear because the samples are assumed independent. Using the symbol $A$ to denote the arithmetic mean, we can write
$${\rm Var}(\bar b\mid \mathbf X) = \frac {\sigma^2}{m}\cdot A\left[\left(\mathbf X_j' \mathbf X_j\right)^{-1};j=1,...,m\right] \tag{1}$$
For the single-sample estimator we have
$${\rm Var}(\hat \beta\mid \mathbf X) = \sigma^2 (\mathbf X' \mathbf X)^{-1}$$
Now, $\mathbf X$ is a matrix that stacks the $\mathbf X_j$ matrices of the $m$ samples,
$$\mathbf X = \begin{bmatrix} \mathbf X_1 \\ \mathbf X_2 \\ \vdots \\ \mathbf X_m \end{bmatrix}$$
Therefore,
$$\mathbf X' \mathbf X = \mathbf X_1' \mathbf X_1 + \mathbf X_2' \mathbf X_2 +...+\mathbf X_m' \mathbf X_m$$
Manipulate this into
$$\mathbf X' \mathbf X = \left[\left(\mathbf X_1' \mathbf X_1\right)^{-1}\right]^{-1} + \left[\left(\mathbf X_2' \mathbf X_2\right)^{-1}\right]^{-1} +...+\left[\left(\mathbf X_m' \mathbf X_m\right)^{-1}\right]^{-1}$$
$$\implies \left(\mathbf X' \mathbf X\right)^{-1} = \left(\left[\left(\mathbf X_1' \mathbf X_1\right)^{-1}\right]^{-1} + \left[\left(\mathbf X_2' \mathbf X_2\right)^{-1}\right]^{-1} +...+\left[\left(\mathbf X_m' \mathbf X_m\right)^{-1}\right]^{-1}\right)^{-1}$$
After our eyes adjust to the three layers of inverses, we can see that the right-hand side is the harmonic mean of the $\left(\mathbf X_j' \mathbf X_j\right)^{-1}$ matrices, scaled by $1/m$. Note that this is the harmonic mean in the matrix sense of the term.
Using the symbol $H$ to denote the harmonic mean we have
$${\rm Var}(\hat \beta\mid \mathbf X) = \sigma^2 (\mathbf X' \mathbf X)^{-1} = \frac {\sigma^2}{m}\cdot H\left[\left(\mathbf X_j' \mathbf X_j\right)^{-1};j=1,...,m\right] \tag{2}$$
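Equation $(2)$ is easy to verify numerically. The following sketch (with made-up design matrices) checks the identity and also confirms, anticipating the inequality stated next, that the difference between the arithmetic and harmonic means is positive definite:

```python
import numpy as np

rng = np.random.default_rng(1)
m, nj, p = 6, 100, 3
Xs = [rng.normal(size=(nj, p)) for _ in range(m)]   # hypothetical design matrices

invs = [np.linalg.inv(Xj.T @ Xj) for Xj in Xs]
A = sum(invs) / m                                    # arithmetic mean, as in eq. (1)
H = m * np.linalg.inv(sum(Xj.T @ Xj for Xj in Xs))   # harmonic mean, as in eq. (2)

# H equals m * (X'X)^{-1} for the stacked regressor matrix X
X = np.vstack(Xs)
print(np.allclose(H, m * np.linalg.inv(X.T @ X)))    # True

# A - H should be positive definite: all eigenvalues strictly positive
print(np.linalg.eigvalsh(A - H))
```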
For scalars, it is well known that $H \le A$, with equality only when all the elements are equal; the same has been proven for positive definite matrices. So, provided the $\mathbf X_j' \mathbf X_j$ matrices are not all equal, we have shown that
$${\rm Var}(\hat \beta\mid \mathbf X) < {\rm Var}(\bar b \mid \mathbf X)$$
again in the matrix sense.
So in this setup, we prefer to pool the data because it lowers the variance.
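A quick Monte Carlo (again a sketch with invented numbers) makes the gap visible; across replications, the pooled estimator's sampling variance is smaller, coordinate by coordinate:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p, reps = 120, 6, 3, 5000
beta = np.array([1.0, -2.0, 0.5])

pooled, averaged = [], []
for _ in range(reps):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    pooled.append(np.linalg.lstsq(X, y, rcond=None)[0])
    b = [np.linalg.lstsq(Xj, yj, rcond=None)[0]
         for Xj, yj in zip(np.split(X, m), np.split(y, m))]
    averaged.append(np.mean(b, axis=0))

print(np.var(pooled, axis=0))     # smaller in every coordinate...
print(np.var(averaged, axis=0))   # ...than the averaging estimator's variance
```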
Note: "pooling the data" does not always produce this beneficial effect. For example, if we want to estimate the mean of a population, the variance of the sample mean from a single-sample of size $n$ is the same as the variance of the average of $m$ sample means from $m$ samples each of size $n/m$.
Best Answer
A comment clarified that the best estimator of the variance is intended to be unbiased. A standard way to find such an estimator is to restrict one's attention to linear combinations of the estimators (because almost anything else would be difficult to analyze).
Let there be $k$ independent samples indexed by $i=1,2,\ldots, k$, each with its own unbiased variance estimator $\hat\sigma_i^2$. Let the unknown weights of the linear combination be $w_i$, so that the combined estimator will be
$$\hat\sigma^2 = \sum_{i=1}^k w_i \hat \sigma_i^2.$$
Because this is supposed to be unbiased for any population, by definition the population variance will equal its expected value:
$$\sigma^2 = \mathbb{E}(\hat\sigma^2) = \sum_{i=1}^k w_i \mathbb{E}(\hat\sigma_i^2) = \sum_{i=1}^k w_i \sigma^2 = \left(\sum_{i=1}^k w_i\right)\sigma^2.$$
Since $\sigma^2 \ne 0$ is possible, division of both sides by $\sigma^2$ implies the weights sum to unity:
$$1 = \sum_{i=1}^k w_i.$$
Let the sample size for estimator $\hat\sigma_i^2$ be $n_i$. (In the question all the $n_i$ are equal to $n$.) Because each estimator $\hat\sigma_i^2$ has $n_i-1$ degrees of freedom, its variance will be approximately proportional to $1/(n_i-1)$ times some value $f$ that is an (unknown) property of the population. (This unknown property depends on the first four moments of the population.) In fact, for a Normal population this is not an approximation at all: the variances of the estimators are exactly proportional to $1/(n_i-1)$. Therefore we may approximate the variance of the combined estimator as
$$\operatorname{Var}(\hat\sigma^2) = \operatorname{Var}\left(\sum_{i=1}^k w_i \hat\sigma_i^2\right) = \sum_{i=1}^k w_i^2 \operatorname{Var}(\hat\sigma_i^2) \approx \sum_{i=1}^k w_i^2 \frac{f}{n_i-1}.$$ The second equality is due to the independence of the $k$ samples.
Subject to the sum-to-unity constraint, this variance is minimized when the $w_i$ are proportional to $n_i-1$. Therefore an (approximate) minimum variance unbiased linear estimator of the population variance is
$$\hat\sigma^2 = \frac{1}{n_1+\cdots+n_k - k}\sum_{i=1}^k (n_i-1)\hat\sigma_i^2.$$
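The weights follow from a standard Lagrange-multiplier argument: minimizing $\sum_i w_i^2 f/(n_i-1)$ subject to $\sum_i w_i = 1$ gives the first-order conditions $2w_i f/(n_i-1) = \lambda$, so each $w_i$ is proportional to $n_i-1$; normalizing,
$$w_i = \frac{n_i-1}{\sum_{j=1}^k (n_j-1)} = \frac{n_i-1}{n_1+\cdots+n_k-k}.$$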
When all the $n_i$ are equal to a common value $n$, this reduces to the arithmetic mean of the individual variance estimators. It should be intuitively obvious that some kind of equally-weighted average of all $k$ estimators would be the best one in this case.
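As a minimal sketch (the function name and simulation numbers are my own), this estimator is straightforward to implement, and its unbiasedness can be checked by simulation:

```python
import numpy as np

def pooled_variance(samples):
    """Combine per-sample unbiased variance estimates, weighting by n_i - 1."""
    dofs = np.array([len(s) - 1 for s in samples])
    ests = np.array([np.var(s, ddof=1) for s in samples])
    return np.sum(dofs * ests) / np.sum(dofs)

# Average the estimator over many replications: it should be close to the
# true variance (here sigma^2 = 4) if the estimator is unbiased.
rng = np.random.default_rng(3)
sizes = (10, 15, 30)                  # unequal n_i, for generality
est = np.mean([pooled_variance([rng.normal(0.0, 2.0, size=n) for n in sizes])
               for _ in range(20000)])
print(est)   # close to 4.0
```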