Chi-square makes no assumptions about equality of group sizes.
The correction rates for the two groups can be compared, and different amounts of work per teacher within each group can be handled by using exposures: if each teacher in group A marked twice as much work as each teacher in group B, that would also be fine.
Am I right to assume the groups are looking at the work of different students, rather than the same pool of students being marked twice?
I'd be inclined to use Poisson regression (where, for example, the model can be elaborated relatively easily, if required), but if you condition on the total number of corrections it would become a binomial test of a known proportion, which can also be done as a chi-square.
It would be good to explain the underlying aim more clearly, without using words like 'test', 'chi-square' or 'design'. You say 'juxtapose', but that simply means to place unlike things together, which suggests you only need a table. What do you want to find out, and why would hypothesis tests answer your underlying questions of interest?
---
Example of how to do the binomial / chi-square calculation:
Possible objection: this assumes the groups are internally homogeneous (i.e. there's no variability in the underlying rate of corrections within each group; the observed variation is random variation around the shared level). Other assumptions, like independence, are probably uncontroversial.
Say the correction counts - on the same set of items, but different students - are as follows:
A: 27 30 32 34 40 30 24 30 32 19 43 31 29 27 23 total: 451
B: 32 50 43 37 39 39 38 47 31 38 total: 394
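A quick way to eyeball the homogeneity assumption just mentioned: under a shared Poisson rate within each group, the variance of the counts should be close to the mean (a sketch; variable names are mine):

```python
import numpy as np

a = np.array([27, 30, 32, 34, 40, 30, 24, 30, 32, 19, 43, 31, 29, 27, 23])
b = np.array([32, 50, 43, 37, 39, 39, 38, 47, 31, 38])

# Under a shared Poisson rate, variance/mean should be near 1 in each group.
for name, x in [("A", a), ("B", b)]:
    print(name, x.sum(), round(x.var(ddof=1) / x.mean(), 2))
```

Ratios far above 1 would suggest between-teacher variation beyond random noise (see the mixed-model note at the end).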
If the rate of correction is the same for both groups, the total number of corrections should be proportional to the number of teachers.
That is, the sum of the A sample is expected to be a fraction 15/(10+15) (=60%) of the overall number of corrections. The total number of corrections across all teachers is 845.
The expected number of corrections in group A is 845 x 0.6 = 507, and in group B is 845 x 0.4 = 338.
The chi-square statistic (for my made-up data!) is
$$\frac{(451 - 507)^2}{507} + \frac{(394 - 338)^2}{338} = 15.46,$$
with 1 d.f.
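The same arithmetic in code, as a sketch using scipy (observed and expected values are those of the example above):

```python
from scipy.stats import chisquare

observed = [451, 394]
expected = [845 * 0.6, 845 * 0.4]  # 507 and 338

# One-way chi-square goodness-of-fit test, 1 d.f.
stat, p = chisquare(observed, f_exp=expected)
print(round(stat, 2), p)
```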
As a binomial, we just test that the A proportion is 60%:
Under the null, the observed total count in A is binomial(n = 845, p = 0.6); with a two-tailed test, we could use the normal approximation to the binomial proportion and get:
$Z = \frac{451/845 - 0.6}{\sqrt{0.6 (1-0.6)/845}} = -3.932$
(the square of this Z is the chi-square value above; its two-tailed p-value is the same as the p-value for the chi-square)
The exact binomial calculation is also quite readily done, but I won't labor the point.
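Both versions of the binomial test can be sketched with scipy (`binomtest` does the exact calculation):

```python
from math import sqrt
from scipy.stats import binomtest, norm

n, k, p0 = 845, 451, 0.6

# Normal approximation to the binomial proportion
z = (k / n - p0) / sqrt(p0 * (1 - p0) / n)
p_approx = 2 * norm.sf(abs(z))

# Exact binomial test (two-sided)
p_exact = binomtest(k, n, p0).pvalue

print(round(z, 3), p_approx, p_exact)
```

The approximate and exact p-values agree closely here because n is large and p0 is not extreme.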
---
A more complicated - but more defensible - analysis would be to fit a mixed logistic model, where 'teacher' is a random effect. This would allow for the fact that teachers have individual variation in their correction rate.
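To see why the random effect matters, here is a small simulation (every number in it is a hypothetical choice of mine, not from the question): with teacher-level variation on the logit scale, the variance of the counts is much larger than the plain binomial variance, which is exactly the extra variability the mixed model accounts for:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 60   # hypothetical number of items each teacher marks
p_base = 0.5   # hypothetical baseline correction probability
sigma = 0.5    # hypothetical sd of the teacher effect on the logit scale

# Simulate many teachers: each gets their own correction probability.
u = rng.normal(0.0, sigma, size=20000)
p = 1.0 / (1.0 + np.exp(-(np.log(p_base / (1 - p_base)) + u)))
counts = rng.binomial(n_items, p)

print(counts.var())                      # variance with between-teacher variation
print(n_items * p_base * (1 - p_base))   # binomial variance if all shared p_base
```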
Best Answer
If you are going along the path of stacking the data sets together, then you should define super-strata corresponding to the two data sets/waves, so that svydesign() knows that they are independent. Thus your new svydesign will have strata = the cross of year and the original strata, the PSUs from the original designs, and the weights from the original designs.

As I suggested in the comment, other ways of combining estimates and tests have been proposed in the literature. Wu (2004) uses empirical likelihood based on common variables between the two data sets.
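The crossed-strata variable itself is simple to construct when stacking the files (a sketch in pandas; `svydesign()` itself is from the R `survey` package, and every column name here is an assumption of mine):

```python
import pandas as pd

# Hypothetical stacked file from two survey waves
df = pd.DataFrame({
    "year":    [2019, 2019, 2021, 2021],
    "stratum": ["a", "b", "a", "b"],
    "psu":     [1, 2, 1, 2],
    "weight":  [1.5, 2.0, 1.4, 2.1],
})

# Cross year with the original strata so the waves are treated as independent.
df["super_stratum"] = df["year"].astype(str) + ":" + df["stratum"]
print(df["super_stratum"].tolist())
```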
For continuous variables, ideally, you would want to use the Kolmogorov-Smirnov test with "flat" data, but I don't know whether extensions of it work for survey data; I doubt it. So you may have to convert your continuous variables to ordinal ones, using say $[\log_2(n)]$ percentile groups or equal-width bins of the variable range (where the above function of the sample size is a commonly used number of bins for a histogram), and apply the Rao-Scott $\chi^2$ to them.
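The binning step can be sketched like this (pandas/numpy; the data are simulated, and the $[\log_2(n)]$ choice follows the rule of thumb above):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
x = pd.Series(rng.normal(size=500))  # a simulated continuous variable

k = int(np.log2(len(x)))             # [log2(500)] = 8 bins

percentile_bins = pd.qcut(x, q=k)    # equal-count (percentile) bins
equal_width_bins = pd.cut(x, bins=k) # equal-width bins of the range

print(k, percentile_bins.nunique(), equal_width_bins.nunique())
```

Either binned version can then be fed into the Rao-Scott $\chi^2$ as an ordinal variable.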