Let's assume this is a fixed effects model. (The advice doesn't really change for random-effects models, it just gets a little more complicated.)
First let us distinguish the "residuals" from the "errors": the former are the differences between the observed responses and their fitted values, while the latter are the unobserved random variables in the model. With a sufficiently large amount of data and a good fitting procedure, the residuals will behave approximately as if they had been drawn randomly from the error distribution (and will therefore give you good information about the properties of that distribution).
The assumptions, therefore, are about the errors, not the residuals.
No, normality (of the responses) and normal distribution of errors are not the same. Suppose you measured yield from a crop with and without a fertilizer application. In plots without fertilizer the yield ranged from 70 to 130. In plots with fertilizer the yield ranged from 470 to 530. The distribution of results is strongly non-normal: it's clustered at two locations related to the fertilizer application. Suppose further the average yields are 100 and 500, respectively. Then all residuals range from -30 to +30, and so the errors can be expected to have a comparable distribution. The errors might (or might not) be normally distributed, but either way their distribution is completely different from the distribution of the responses.
The distribution of the residuals matters, because those reflect the errors, which are the random part of the model. Note also that the p-values are computed from F (or t) statistics and those depend on residuals, not on the original values.
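Here is a minimal sketch of the fertilizer example (all numbers invented, and uniform errors chosen just to stress that the errors need not be normal): the pooled responses cluster at two very different locations, yet the residuals have essentially the same spread as the true errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
errors = rng.uniform(-30, 30, size=2 * n)       # the random part of the model
y_no_fert = 100 + errors[:n]                    # yields roughly 70..130
y_fert = 500 + errors[n:]                       # yields roughly 470..530

# Residuals = response minus the fitted group mean
resid = np.concatenate([y_no_fert - y_no_fert.mean(),
                        y_fert - y_fert.mean()])

print("response range without fertilizer:", y_no_fert.min(), y_no_fert.max())
print("response range with fertilizer:   ", y_fert.min(), y_fert.max())
print("sd of the true errors:", errors.std())   # residuals mirror the errors,
print("sd of the residuals:  ", resid.std())    # not the bimodal responses
```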
If there are significant and important effects in the data (as in this example), then you might be making a "grave" mistake. You could, by luck, reach the correct determination: that is, by looking at the raw data you will be seeing a mixture of distributions, and that mixture can happen to look normal (or not). The point is that what you are looking at is not relevant.
ANOVA residuals don't have to be anywhere close to normal in order to fit the model. However, unless you have an enormous amount of data, near-normality of the residuals is essential for p-values computed from the F-distribution to be meaningful.
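One quick way to gauge how much non-normality matters at a given sample size is a simulation along these lines (the setup here is invented, not from the answer): generate data under the null with strongly skewed errors and small groups, and see how often the F-test rejects at the 5% level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_per_group, n_sims = 5, 5000
rejections = 0
for _ in range(n_sims):
    # three groups with identical (strongly skewed) error distributions,
    # so the null hypothesis of equal means is true by construction
    groups = [rng.exponential(scale=1.0, size=n_per_group) for _ in range(3)]
    if stats.f_oneway(*groups).pvalue < 0.05:
        rejections += 1

# If the F-based p-values were well calibrated here, this rate would be
# close to 0.05; comparing it to 0.05 shows how much (or how little) the
# non-normality matters at this sample size.
print("empirical rejection rate at the 5% level:", rejections / n_sims)
```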
This is the simplest repeated measures ANOVA model if we treat it as a univariate model:
$$y_{it} = a_{i} + b_{t} + \epsilon_{it}$$
where $i$ represents each case and $t$ the times we measured them (so the data are in long form). $y_{it}$ represents the outcomes stacked one on top of the other, $a_{i}$ represents the mean of each case, $b_{t}$ represents the mean of each time point and $\epsilon_{it}$ represents the deviations of the individual measurements from the case and time point means. You can include additional between-subjects factors as predictors in this setup.
We do not need to make distributional assumptions about the $a_{i}$, because they enter the model as fixed effects, i.e., as dummy variables (in contrast to what we do with linear mixed models). The same goes for the time dummies. For this model, you simply regress the outcome in long form on the person dummies and the time dummies. The effect of interest is the time effect; the $F$-test of the null hypothesis that $b_{1}=\cdots=b_{t}=0$ is the major test in the univariate repeated measures ANOVA.
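A minimal sketch of that regression (simulated data; pandas and statsmodels assumed available), with the data in long form and one dummy set per person and per time point:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_people, n_times = 20, 4
person = np.repeat(np.arange(n_people), n_times)    # long-form person index
time = np.tile(np.arange(n_times), n_people)        # long-form time index
a = rng.normal(0, 2, n_people)[person]              # case means a_i
b = np.array([0.0, 0.5, 1.0, 1.5])[time]            # time-point means b_t
y = a + b + rng.normal(0, 1, n_people * n_times)    # plus iid normal errors

long = pd.DataFrame({"y": y, "person": person, "time": time})
fit = smf.ols("y ~ C(person) + C(time)", data=long).fit()

# The F-test on the time dummies is the main repeated-measures ANOVA test
print(sm.stats.anova_lm(fit, typ=2))
```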
What are the required assumptions for the $F$-test to behave appropriately? The one relevant to your question is:
\begin{equation}
\epsilon_{it}\sim\mathcal{N}(0,\sigma^{2})\quad\text{(the errors are normally distributed and homoskedastic)}
\end{equation}
There are additional (and more consequential) assumptions for the $F$-test to be valid, because the data are not independent of one another: the same individuals repeat across rows.
If you want to treat the repeated measures ANOVA as a multivariate model, the normality assumptions may be different, and I cannot expand on them beyond what you and I have seen on Wikipedia.
Best Answer
The standard OLS model is $Y = X \beta + \varepsilon$ with $\varepsilon \sim \mathcal N(\vec 0, \sigma^2 I_n)$ for a fixed $X \in \mathbb R^{n \times p}$.
This does indeed mean that $Y|\{X, \beta, \sigma^2\} \sim \mathcal N(X\beta, \sigma^2 I_n)$, although this is a consequence of our assumption on the distribution of $\varepsilon$, rather than actually being the assumption. Also keep in mind that I'm talking about the conditional distribution of $Y$, not the marginal distribution of $Y$. I'm focusing on the conditional distribution because I think that's what you're really asking about.
I think the part that is confusing is that this doesn't mean that a histogram of $Y$ will look normal. We are saying that the entire vector $Y$ is a single draw from a multivariate normal distribution where each element has a potentially different mean $E(Y_i|X_i) = X_i^T\beta$. This is not the same as being an iid normal sample. The errors $\varepsilon$ actually are an iid sample so a histogram of them would look normal (and that's why we do a QQ plot of the residuals, not the response).
Here's an example: suppose we are measuring height $H$ for a sample of 6th graders and 12th graders. Our model is $H_i = \beta_0 + \beta_1I(\text{12th grader}) + \varepsilon_i$ with $\varepsilon_i \sim \ \text{iid} \ \mathcal N(0, \sigma^2)$. If we look at a histogram of the $H_i$ we'll probably see a bimodal distribution, with one peak for 6th graders and one peak for 12th graders, but that doesn't represent a violation of our assumptions.
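A small simulated sketch of this example (all numbers invented): the whole vector $H$ is one draw from $\mathcal N(X\beta, \sigma^2 I)$ with a different mean for each grade, so a histogram of $H$ is bimodal, yet a normality check (or QQ plot) of the residuals looks fine.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100
grade12 = np.repeat([0, 1], n // 2)             # indicator I(12th grader)
X = np.column_stack([np.ones(n), grade12])      # design matrix
beta = np.array([150.0, 25.0])                  # invented values, in cm
H = X @ beta + rng.normal(0, 5, size=n)         # one draw of the vector Y

beta_hat, *_ = np.linalg.lstsq(X, H, rcond=None)
resid = H - X @ beta_hat                        # residuals from the fit

# The response mixes two means and fails a normality test; the residuals,
# which estimate the iid errors, do not.
print("normality test on the response  p =", stats.shapiro(H).pvalue)
print("normality test on the residuals p =", stats.shapiro(resid).pvalue)
```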