Over rolling windows you can update your posteriors by re-estimating the model every 50 observations (or however often you want to do it). This can easily be accomplished in a for loop.
Posterior samples, forecasts, and other quantities can be re-generated with each re-estimation of the model. At the end of the day it is just a lot of loops. You may also want to consider running the models in parallel (e.g., with the foreach and doParallel packages) to speed up computation time if that is an issue for you.
Unfortunately, I do not know of any R packages that provide functions for automatically computing rolling-window forecasts in a Bayesian setting. For the most part I think you have to program it yourself. It is a little tedious figuring out how to index and store everything, but the good news is that it is not that hard to do.
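For concreteness, here is a minimal sketch of such a loop in R, assuming your series is in a vector `r`; `fit_model()` and `forecast_draws()` are hypothetical stand-ins for whatever sampler and forecasting routine you actually use:

```r
# Minimal rolling re-estimation sketch (hypothetical helpers:
# fit_model() returns posterior draws, forecast_draws() samples
# from the forecast distribution for those draws).
window_size <- 500   # length of each estimation window
step        <- 50    # re-estimate every 50 observations
K           <- 10    # forecast horizon

starts  <- seq(1, length(r) - window_size - K + 1, by = step)
results <- vector("list", length(starts))

for (i in seq_along(starts)) {
  window       <- r[starts[i]:(starts[i] + window_size - 1)]
  fit          <- fit_model(window)           # G posterior draws
  results[[i]] <- forecast_draws(fit, K = K)  # G x K matrix of forecast draws
}
```

Each iteration depends only on its own window, so swapping the for loop for foreach's %dopar% parallelizes it directly.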
Though I do not think you need it, I give more intuition below (which does not differ much from the intuition I gave in the previous post you referenced).
Assume your estimation sample consists of $r_1,r_2,r_3,...,r_t$ and that you want to generate forecasts for an out-of-sample window $r_{t+1},r_{t+2},...,r_{t+K}$.
In a Bayesian setting the forecasts are treated as random variables with their own joint posterior distribution. So instead of producing point forecasts, you draw a sample from the forecast distribution:
$$P(r_{t+1},r_{t+2},...,r_{t+K}|r_1,r_2,...,r_t,\boldsymbol{\theta}_t)$$ where $\boldsymbol{\theta}_t$ is just my shorthand notation for a vector containing the model parameters.
You can sample from the above forecast distribution with a Gibbs sampler. If your posterior is made up of $G$ samples, then for each $g=1,2,3,...,G$ you can draw
$r^{(g)}_{t+1} \sim P(r_{t+1}|r_1,...,r_t,\boldsymbol{\theta}^{(g)}_t)$
then
$r^{(g)}_{t+2}\sim P(r_{t+2}|r_1,...,r_t,r^{(g)}_{t+1},\boldsymbol{\theta}^{(g)}_t)$
and keep going until you sample
$r^{(g)}_{t+K} \sim P(r_{t+K}|r_1,...,r_t,r^{(g)}_{t+1},...,r^{(g)}_{t+K-1},\boldsymbol{\theta}^{(g)}_t)$
For any $k=1,2,...,K$, the conditional posterior $P(r_{t+k}|r_1,...,r_t,r^{(g)}_{t+1},...,r^{(g)}_{t+k-1},\boldsymbol{\theta}^{(g)}_t)$ can be sampled from in the following manner:
$$
\omega_{t+k}^{(g)} \sim IG\bigg(\frac{v^{(g)}}{2},\frac{v^{(g)}}{2}\bigg)\;\;\;\;\;\varepsilon_{t+k}^{(g)} \sim N(0,1)
$$
$$
r_{t+k}^{(g)} = \varepsilon_{t+k}^{(g)}\times \bigg(\frac{v^{(g)}-2}{v^{(g)}} \omega_{t+k}^{(g)}h_{t+k}^{(g)}\bigg)^{1/2}
$$
$h^{(g)}_{t+k+1}$, which is needed to sample $r^{(g)}_{t+k+1}$, can be calculated deterministically once $r^{(g)}_{t+k}$ is given as shown below
$$
h_{t+k+1}^{(g)} = \alpha_0^{(g)} + \alpha_1^{(g)}\big(r_{t+k}^{(g)}\big)^2 + \beta^{(g)}h_{t+k}^{(g)}
$$
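Putting the pieces together, here is a sketch in R of the forecast simulation for a single posterior draw $g$; the argument names mirror the equations above, and the last in-sample variance `h_last` and return `r_last` are assumed to be available from your sampler:

```r
# Simulate one K-step-ahead forecast path for a single posterior draw.
# alpha0, alpha1, beta, v: parameter values for this draw;
# h_last, r_last: last in-sample conditional variance and return.
simulate_path <- function(alpha0, alpha1, beta, v, h_last, r_last, K) {
  r_sim <- numeric(K)
  h <- alpha0 + alpha1 * r_last^2 + beta * h_last           # h_{t+1}
  for (k in 1:K) {
    omega    <- 1 / rgamma(1, shape = v / 2, rate = v / 2)  # IG(v/2, v/2)
    eps      <- rnorm(1)                                    # N(0, 1)
    r_sim[k] <- eps * sqrt((v - 2) / v * omega * h)
    h        <- alpha0 + alpha1 * r_sim[k]^2 + beta * h     # h_{t+k+1}
  }
  r_sim
}
```

Calling this once per posterior draw $g=1,...,G$ gives $G$ sampled paths, i.e. a draw-by-horizon matrix from the joint forecast distribution.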
Also, I see that you simply fixed $h_0=0$. I do not know whether that is a good assumption, but off the top of my head I do not know a better way of going about it.
If you are mainly interested in the proportion of outcomes taking Value 3,
then it seems best to compare that proportion in A, which is $292/528 = 0.553,$
with that proportion in B, which is $274/509 = 0.538.$ The difference seems
quite small.
A formal test (here done in Minitab) shows that this difference
is not significant at the 5% level (P-value $0.634 > 0.05$). Also notice
that a 95% confidence interval for the population difference covers $0$ (no
difference).
```
Test and CI for Two Proportions

Sample      X      N  Sample p
1         292    528  0.553030
2         274    509  0.538310

Difference = p (1) - p (2)
Estimate for difference:  0.0147199
95% CI for difference:  (-0.0458945, 0.0753343)
Test for difference = 0 (vs ≠ 0):  Z = 0.48  P-Value = 0.634
```
This test uses a normal approximation to the distribution of the difference between two binomial proportions, which should be very accurate for sample sizes above 500, as here.
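If you would rather do this in R than Minitab, `prop.test` runs the analogous test (with the continuity correction turned off to match the Z test above) and should essentially reproduce the estimate, interval, and P-value:

```r
# Two-sample test of equal proportions: X = counts of Value 3, N = totals.
prop.test(x = c(292, 274), n = c(528, 509), correct = FALSE)
```

The chi-square statistic it reports is just $Z^2$.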
Notes: [a] You could also do a chi-squared test of the null hypothesis that the proportions of outcomes with Values 3 through 9 are 'homogeneous' for A and B. (Computations are the same as for a test of 'independence' between Values (3 through 9) and Types (A and B); see the R call after these notes.) That test also does not give a significant result:
Pearson Chi-Square = 0.235, DF = 6, P-Value = 1.000
[b] I do not see how it would be appropriate to use a t test to answer this question.
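In R, the test in note [a] is a single call, assuming `tab` holds your full 2 × 7 table of counts (rows = Types A and B, columns = Values 3 through 9; only the Value-3 column is quoted above):

```r
# tab: a 2 x 7 matrix of counts, rows = Types (A, B), cols = Values 3-9
chisq.test(tab)   # Pearson chi-square test of homogeneity/independence
```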
Let's address the underlying statistical question, and then briefly mention doing that in R.
You want to do a test for a discrete uniform distribution applied to each subgroup. So let's think of a specific subgroup. You have a sequence of digits, which we can tabulate as a set of counts -- for example, a table of counts in which '0' occurred twice, '1' occurred 8 times, and so on.
How do we test for uniformity?
You suggested testing variance. I initially assumed you meant the variance of the distribution of digit-values (which would be viable as a test statistic for particular kinds of deviation), but I now wonder whether you mean the variance of the observed counts.
Let's discuss both, taking the second first:
a) Variance of counts of digits. That is, if the observed count for digit $i$ is $O_i$, in this case I guess you mean to take a constant times $\sum_{i=0}^9(O_i-\bar{O})^2$ (a sample variance of the counts would make that constant $\frac{1}{(10-1)}$, but let's leave that to one side for a moment).
That's actually a pretty good idea, and I want to tweak it just the tiniest bit, for reasons that will become clearer in a moment.
Note that $\bar O = \frac{1}{10}\sum O_i=\frac{n}{10}$, which is just the expected count for each digit -- let's call that $E_i$ (that may seem unnecessary, but you'll have to indulge me a moment).
Then the sum of squared deviations from expected is $\sum_{i=0}^9(O_i-E_i)^2$, which (since all the $E_i$ equal $n/10$) is proportional to $\sum_{i=0}^9\frac{(O_i-E_i)^2}{E_i}$ ... which is just the usual chi-square goodness-of-fit statistic. So the variance of the counts of digits is (apart from a scaling constant) a well-known test statistic for goodness of fit to the hypothesized uniform distribution.
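A quick numerical check of that equivalence in R, on digits simulated purely for illustration:

```r
set.seed(1)
x <- sample(0:9, 200, replace = TRUE)   # simulated digit sequence
O <- table(factor(x, levels = 0:9))     # observed counts O_i
E <- length(x) / 10                     # expected count E_i = n/10
sum((O - E)^2) / E                      # scaled variance of the counts ...
chisq.test(O)$statistic                 # ... equals the chi-square statistic
```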
[This could be done in R by calling `tapply` on your second column of data with a function `f`, where `f = function(x) chisq.test(table(x))$p.value`, and the index being the ID.]

b) Now if you mean the variance of the distribution of digit-values, that will have very good power against particular kinds of non-uniformity -- specifically, non-uniform distributions with larger or smaller variance than the discrete uniform. But it will have very poor power against non-uniform distributions with very similar variance to the discrete uniform (for example, a distribution that loads probability unevenly on the digits in a pattern roughly symmetric about 4.5 can be far from uniform while having almost exactly the same variance).
If you're only interested in non-uniform alternatives with larger or smaller variance than the discrete uniform and don't care about the latter kind of alternative, this is fine. There are a couple of ways to go about testing this, which I'll go into if you definitely want this option.
But note carefully: the variance of the discrete uniform on $0,1,\ldots,9$ is not $\frac{(9-0)^2}{12}$; that's the continuous uniform. The discrete uniform on $0,1,\ldots,9$ has variance $\frac{10^2-1}{12}=\frac{9\times 11}{12}=8.25$.
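A one-line check in R:

```r
mean((0:9 - 4.5)^2)   # variance of the discrete uniform on 0..9: 8.25
```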
As requested, here's one test of variance. This one is easy to do.
Consider the variance statistic about the expected value for the uniform rather than the sample mean:
$T_n = \frac{1}{n}\sum_{i=1}^n \big(X_i-\tfrac{9}{2}\big)^2$
Because it is centred at the null expected value rather than the sample mean, this should also have better power against mean shifts than a test using the ordinary sample variance would. Its asymptotic distribution is also quite easy to work out:
$\sqrt{n}\,(T_n-8.25) \;\stackrel{d}{\longrightarrow}\; N(0,\,52.8)$ as $n\to\infty$,
and convergence is so rapid that it's reasonable to use this at fairly small sample sizes.
In the left tail it looks to be quite good above about n=50 (personally, I'd happily use it down to about n=10, but I'm not fussy about exactness of type I error rates). In the right tail it looks fine even below n=10.
[Even at n=5, it's not so bad -- the left hand tail was giving a true significance level of about 2% for a nominal 2.5% left hand tail normal critical value.]
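Here is a sketch of that test in R, implementing the statistic and normal approximation just described:

```r
# Test of uniformity based on T_n = mean((X - 4.5)^2); under the null,
# sqrt(n) * (T_n - 8.25) is approximately N(0, 52.8).
unif_var_test <- function(x) {
  n  <- length(x)
  Tn <- mean((x - 4.5)^2)
  z  <- sqrt(n) * (Tn - 8.25) / sqrt(52.8)
  c(T = Tn, z = z, p.value = 2 * pnorm(-abs(z)))  # two-sided p-value
}
```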
One can make a test with the ordinary sample variance, but it approaches its asymptotic distribution considerably more slowly.
To do that, we can compute the variance of the sample variance exactly (it involves fourth moments), and then use an asymptotic approximation to the distribution of the sample variance (though, as I mentioned, it converges relatively more slowly than the variance about 4.5). Or we could simulate from the null distribution at any given sample size to get an approximate p-value (if I were going to use the sample variance, this is what I'd do).
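A sketch of that simulation approach, taking distance of the sample variance from the null value 8.25 as one simple way to define a two-sided p-value:

```r
# Monte Carlo p-value for the ordinary sample variance under uniformity.
sim_var_pvalue <- function(x, nsim = 10000) {
  n      <- length(x)
  v_obs  <- var(x)
  v_null <- replicate(nsim, var(sample(0:9, n, replace = TRUE)))
  mean(abs(v_null - 8.25) >= abs(v_obs - 8.25))
}
```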