For i.i.d. random variables $X_1, \dotsc, X_n$, the unbiased estimator for the variance $s^2$ (the one with denominator $n-1$) has variance:
$$\mathrm{Var}(s^2) = \sigma^4 \left(\frac{2}{n-1} + \frac{\kappa}{n}\right)$$
where $\kappa$ is the excess kurtosis of the distribution (reference: Wikipedia). So now you need to estimate the kurtosis of your distribution as well. You can use a quantity sometimes described as $\gamma_2$ (also from Wikipedia):
$$\gamma_2 = \frac{\mu_4}{\sigma^4} - 3$$
I would assume that if you plug in $s$ as an estimate for $\sigma$ and the sample analogue of $\gamma_2$ as an estimate for $\kappa$, you get a reasonable estimate for $\mathrm{Var}(s^2)$, although I see no guarantee that it is unbiased. Check whether it matches the variance among the subsets of your 500 data points reasonably well, and if it does, don't worry about it any more :)
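A minimal sketch of this plug-in check in Python (the gamma draws stand in for your 500 data points, and the subset size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, size=500)        # stand-in for your 500 data points

def plugin_var_of_s2(x, n):
    """Plug-in estimate of Var(s^2) for samples of size n, using s as an
    estimate of sigma and the sample excess kurtosis as an estimate of kappa."""
    s2 = x.var(ddof=1)                        # unbiased sample variance
    m4 = np.mean((x - x.mean()) ** 4)         # fourth central moment
    g2 = m4 / x.var(ddof=0) ** 2 - 3          # sample excess kurtosis (gamma_2)
    return s2 ** 2 * (2 / (n - 1) + g2 / n)

# Empirical check: variance of s^2 across disjoint subsets of the data
m = 50                                        # subset size (arbitrary choice)
subset_s2 = data.reshape(-1, m).var(axis=1, ddof=1)
print("plug-in  :", plugin_var_of_s2(data, m))
print("empirical:", subset_s2.var(ddof=1))
```

If the two numbers are of the same order of magnitude, the plug-in formula is probably serving you well enough.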
Ok. The model, in matrix notation and with conformable dimensions, is
$$\mathbf y = \mathbf X\beta + \mathbf u $$
The $OLS$ estimator is
$$\hat \beta = (\mathbf X'\mathbf X)^{-1}\mathbf X' \mathbf y = (\mathbf X'\mathbf X)^{-1}\mathbf X' (\mathbf X\beta + \mathbf u) $$
$$= (\mathbf X'\mathbf X)^{-1}\mathbf X' \mathbf X\beta + (\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf u = \beta + (\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf u$$
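A quick numerical illustration of the estimator formula (the data-generating values below are made up purely for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # regressors incl. intercept
beta = np.array([1.0, 2.0, -0.5])                               # "true" coefficients (made up)
u = rng.normal(size=n)                                          # error term
y = X @ beta + u

# OLS estimator: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```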
For consistency we examine
$$\operatorname{plim}\hat \beta = \operatorname{plim}\beta + \operatorname{plim}\left[(\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf u\right] = \beta + \operatorname{plim}\left[\left(\frac 1n\mathbf X'\mathbf X\right)^{-1}\left(\frac 1n\mathbf X'\mathbf u\right)\right] $$
And here is the crucial point that lets us get away with a weaker assumption for consistency than for unbiasedness: for unbiasedness we would face $E\left[(\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf u\right]$, and in order to "push" the expected value inside the expression we have to condition on $\mathbf X$, which leads to the expression $E(\mathbf u\mid \mathbf X)$ and the need to assume that it equals zero, i.e. to assume "mean independence" between the error term and the regressors.
But $\operatorname{plim}$ is a more "flexible" operator than $E$: under $\operatorname{plim}$, products can be decomposed into products of plims (something that under the expected value requires independence), and $\operatorname{plim}$ can "go inside" any continuous transformation (while $E$ cannot, except for affine functions) - and the transformations involved very rarely fail to be continuous. So
$$\operatorname{plim}\left[\left(\frac 1n\mathbf X'\mathbf X\right)^{-1}\left(\frac 1n\mathbf X'\mathbf u\right)\right] = \operatorname{plim}\left(\frac 1n\mathbf X'\mathbf X\right)^{-1}\operatorname{plim}\left(\frac 1n\mathbf X'\mathbf u\right)$$
For consistency we need to assume that $\operatorname{plim}\left(\frac 1n\mathbf X'\mathbf X\right)$ is a finite, invertible matrix - but this is an assumption on the properties of the regressor matrix, unrelated to the error term. So we are left with the second $\operatorname{plim}$ which, written out using sums for clarity, is
$$\operatorname{plim}\left(\frac 1n\mathbf X'\mathbf u\right) = \left[\begin{matrix}
\operatorname{plim}\frac 1n\sum_{i=1}^nx_{1i}u_i \\
\vdots\\
\operatorname{plim}\frac 1n\sum_{i=1}^nx_{ki}u_i \\
\end{matrix}\right] \rightarrow\left[\begin{matrix}
\frac 1n\sum_{i=1}^nE(x_{1i}u_i) \\
\vdots\\
\frac 1n\sum_{i=1}^nE(x_{ki}u_i) \\
\end{matrix}\right] $$
...the last transformation due to the usual assumptions that permit the application of the law of large numbers.
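A one-regressor illustration of that law-of-large-numbers step, with $x_i$ and $u_i$ drawn independently so that $E(x_iu_i)=0$ (a sketch, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# (1/n) sum of x_i * u_i shrinks toward its expectation E(x_i u_i) = 0 as n grows
for n in (100, 10_000, 1_000_000):
    x = rng.normal(size=n)                   # a single regressor column
    u = rng.normal(size=n)                   # errors, independent of x here
    print(n, np.mean(x * u))
```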
Exactly because we have been able to "separate" $(\mathbf X'\mathbf X)^{-1}$ from $\mathbf X'\mathbf u$ (due to the fact that we are examining the $\operatorname{plim}$ and not $E$), we ended up looking only at the contemporaneous relation between each regressor and the error term. So what we need to assume for consistency of the $OLS$ estimator is only that $E(x_{ki}u_i) =0 \; \forall k, \; \forall i$ (contemporaneous uncorrelatedness), which is much weaker than $E(\mathbf u\mid \mathbf X) = \mathbf 0$: the latter requires mean independence, and moreover not only contemporaneously, but across time too (since we condition the whole error vector on the whole regressor matrix).
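One textbook illustration of the gap between the two assumptions (a sketch, not part of the original argument): regressing $y_t$ on its own lag. There $E(y_{t-1}u_t)=0$, so OLS is consistent, but $E(\mathbf u\mid\mathbf X)=\mathbf 0$ fails because later regressors depend on earlier errors, and OLS is biased in finite samples:

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_ols_slope(n, beta=0.5, reps=2000):
    """Average OLS slope in y_t = beta*y_{t-1} + u_t over many simulated samples."""
    est = np.empty(reps)
    for r in range(reps):
        u = rng.normal(size=n)
        y = np.zeros(n)
        y[0] = u[0]
        for t in range(1, n):
            y[t] = beta * y[t - 1] + u[t]
        x, yy = y[:-1], y[1:]                # regressor = lagged y, no intercept
        est[r] = (x @ yy) / (x @ x)          # OLS slope
    return est.mean()

for n in (25, 100, 400):
    print(n, mean_ols_slope(n))
```

In this setup the small-sample estimates tend to sit below the true value of $0.5$, and their average approaches it as $n$ grows: biased, but consistent.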
Best Answer
The bias of an estimator $\hat \theta_n$ of a parameter $\theta^0$ is defined as
$$B(\hat \theta_n) = E(\hat \theta_n) - \theta^0,$$
where the $n$ subscript indicates that the estimator is a function of the sample size. It follows that the distribution of the estimator is also a function of the sample size: in general, for each different $n$ the estimator has a (perhaps only slightly) different distribution, and hence a different expected value and a different bias.
It may not be feasible to determine analytically how this bias changes with $n$ (monotonically? non-monotonically?).
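When the data-generating process is under your control (as in a simulation), you can at least trace the bias numerically. A small sketch, assuming standard normal data and the variance estimator with denominator $n$, chosen purely as an example of a biased estimator (its theoretical bias is $-\sigma^2/n$):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated bias B(theta_hat_n) of the n-denominator variance estimator
# when the true variance is 1; the bias shrinks as n grows.
for n in (5, 20, 100):
    est = rng.normal(size=(100_000, n)).var(axis=1, ddof=0)
    print(n, est.mean() - 1.0)
```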