The value of the ratio can be obtained as follows, noting that the discrete sampling in lines 3 and 4 of the code implies 3 possible values for $X_1$, 3 for $X_2$, and 7 for $Y$ conditional on each combination of $X_1$ and $X_2$:
$(N - K)_{\text{old}} = 1000 - 3 = 997$
$(N - K)_{\text{new}} = (3 \times 3 \times 7) - 3 = 63 - 3 = 60$
Ratio of standard errors $= \sqrt{\frac{(N - K)_{\text{new}}}{(N - K)_{\text{old}}}} = \sqrt{\frac{60}{997}} = 0.2453172$
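As a quick check, the arithmetic above can be reproduced in a few lines (Python here purely as illustration):

```python
import math

# Degrees of freedom before and after aggregation (K = 3 coefficients)
df_old = 1000 - 3        # 997: original sample of 1000 observations
df_new = 3 * 3 * 7 - 3   # 60: one cell per (X1, X2, Y) combination

ratio = math.sqrt(df_new / df_old)
print(ratio)  # ≈ 0.2453
```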
Why is $(N – K)$ relevant here? Because it is the denominator in the formula for the OLS estimator $s^2$ of the variance parameter $\sigma_0^2$:
$s^2 = \frac{SSR}{N - K}$
where SSR is the sum of squared residuals. This in turn feeds into the estimator of the variance of the coefficient vector $\hat\beta$, with $X$ as the matrix of independent variables:
$V[\hat\beta \mid X] = \sigma_0^2 (X'X)^{-1}$, estimated by $s^2 (X'X)^{-1}$
and so into the standard errors of the coefficients. These formulae also apply to the WLS coefficients, provided that SSR and $X$ are based on the weighted variables (WLS being equivalent to OLS on the weighted variables).
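A minimal sketch of these formulae on synthetic data (the regressors, true coefficients and seed below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3

# Design matrix: intercept plus two random regressors
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=N)

# OLS fit, then s^2 = SSR / (N - K)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
s2 = resid @ resid / (N - K)

# Estimated V[beta_hat | X] = s^2 (X'X)^{-1}; standard errors on its diagonal
cov_beta = s2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_beta))
print(se)
```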
However, it is important to note (especially for anyone who may be using regression with aggregate data in real applications, such as in economics) that the simple ratio formula above only works because of particular features of this case. In general the effect of aggregation on standard errors is more complex, with effects via the SSR and $X'X$ (which happen to cancel out in this case) needing to be considered as well as those via $(N - K)$.
A relevant feature of this case is that aggregation does not group together different $Y$ values, each combination of $X_1$, $X_2$ and $Y$ forming a separate aggregation. Thus there is no averaging of $Y$ values which would tend to reduce the residuals. Suppose, by contrast, that a sample has two observations of $y$ for each observed $x$ value, and that in each case the two observations happen to lie on opposite sides of the fitted line, the same distance from it. Then regression on the unaggregated data will produce non-zero standard errors of the coefficients, but aggregation at each $x$ value (that is, averaging of its two $y$ observations) will produce a perfect fit with zero residuals and therefore zero standard errors. In that case, therefore, the zero SSR in the aggregate model will dominate any effects via $(N - K)$ and $X'X$.
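This contrast can be demonstrated directly; the toy line $y = x$ and the $\pm 0.5$ offsets below are assumed purely for illustration:

```python
import numpy as np

# Two y observations per x, symmetric about the line y = x
x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
y = x + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])

# OLS on the unaggregated data: residuals of +/-0.5, so SSR > 0
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr_raw = np.sum((y - X @ b) ** 2)

# Aggregate by averaging the two y values at each x: perfect fit, SSR = 0
xa = np.array([1.0, 2.0, 3.0])
ya = np.array([y[0:2].mean(), y[2:4].mean(), y[4:6].mean()])
Xa = np.column_stack([np.ones_like(xa), xa])
ba, *_ = np.linalg.lstsq(Xa, ya, rcond=None)
ssr_agg = np.sum((ya - Xa @ ba) ** 2)

print(ssr_raw, ssr_agg)  # non-zero vs (numerically) zero
```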
This question is ancient, but it seems like you are looking for ordinal regression. Basically, for your 9 ordinal categories, you want to make 8 classifiers. The first classifier would be "is the category greater than 1 or less than or equal to 1?" The second classifier would be "is the category greater than 2, or less than or equal to 2?", etc.
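Constructing those binary targets is straightforward; the labels below are hypothetical, and the choice of classifier to train on each target is left open:

```python
import numpy as np

# Hypothetical ordinal labels in 1..9
y = np.array([1, 3, 5, 9, 2, 7, 4])

# One binary target per threshold k = 1..8: "is the category greater than k?"
binary_targets = {k: (y > k).astype(int) for k in range(1, 9)}

print(binary_targets[2])  # [0 1 1 1 0 1 1]
```

Each of the 8 targets is then fitted with an ordinary binary classifier, and the per-threshold probabilities are combined to score the 9 categories.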
If you're still looking for a solution to the problem, I'd look at the literature on ordinal regression. It may fit your problem better than aggregating your data or doing more complicated techniques first.
Best Answer
Okay, I found out that what I was looking for is called hierarchical linear models (see Wikipedia). Just dropping that here in case someone else encounters a similar problem.