It's an interesting question. Flippantly, it's a bit like saying "Why are tigers not lions?" to which the entirely accurate but supremely unhelpful answer is "Because they are tigers, not lions", but it naturally deserves a more serious answer.
Any answer which is, directly or indirectly, an appeal to how convenient or simple it would be if error terms were normal or Gaussian is naturally just wishful thinking, or the appeal of nice things being nicer than nasty things.
The question one way or another involves a large fraction of statistical science, but no answer can cover it all.
First I flag that to me there's a distinction:
(a) what the error term is "out there"
(b) what error term is postulated in the model
(c) what the residuals are when calculated from data.
The distinction between (b) and (c) is surely standard; mentioning (a) may well seem more dubious or at least less standard, but I think it's needed, as ideas about (a) should be behind hypotheses on (b).
Also, I suggest that a general answer to the question can't be based on assuming that error is always additive to some deterministic part, as is explicit in linear regression models. So, to me, "the error term" means the structure of error taken generally, whatever form it takes.
Further, I am not referring to residuals directly. I am focusing on (a) with the intended message that (a) should imply (b). Terminology is a beast here as always: the literature I read talks about residuals only as quantities calculated from the data.
If you focus on why an error term might be normally distributed, the simplest answer is because you got the deterministic part of a model almost exactly right and everything else is essentially lots of little things which, by the central limit theorem, when combined should be normal or Gaussian as a good approximation. Historically one root of this kind of model is in astronomy where often, but not always, to a very good approximation the errors are just small measurement errors.
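The "lots of little things" argument can be checked with a small simulation. The following is only a sketch under assumptions of my own choosing: each little disturbance is a centred exponential shock (deliberately skewed), the shocks are independent, and the function names `summed_error` and `skewness` are made up for the illustration.

```python
import numpy as np

def summed_error(n_components, n_draws=20_000, seed=0):
    """Draw error terms that are sums of independent, skewed little shocks."""
    rng = np.random.default_rng(seed)
    # Each shock is a centred exponential: clearly non-normal on its own.
    shocks = rng.exponential(scale=1.0, size=(n_draws, n_components)) - 1.0
    # Divide by sqrt(n_components) so the total has unit variance.
    return shocks.sum(axis=1) / np.sqrt(n_components)

def skewness(x):
    """Sample skewness (third standardized moment); 0 for a symmetric law."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

skew_one = skewness(summed_error(1))      # a single shock: strongly skewed
skew_many = skewness(summed_error(200))   # many shocks: much closer to symmetric

print(f"skewness with 1 component:    {skew_one:.2f}")
print(f"skewness with 200 components: {skew_many:.2f}")
```

If the shocks were strongly dependent, or if one of them dominated the rest, the sum need not look normal at all, which is precisely the caveat about getting the deterministic part almost exactly right.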
However, there are plenty of situations in which "everything else" does not follow that description. You'll get different views across statistically-minded people on how common that is. It's arguable that the prominence of linear models with Gaussian error terms is just a kind of historical hangover hinging on various accidents: that this was the first kind of method to be worked out in real detail; that it works when it works, more or less; and that applications of this model were often relatively easy with what now are primitive calculation methods preceding the electronic computer. Also, people have invented all sorts of trickery for bending or extending the linear model in any case. Within statistical science at present, econometrics perhaps represents the extreme view that models of this kind remain central, although to be fair econometricians have been as active as any other group in exploring alternatives.
For exceptions, I will mention just two. For binary responses, with possible values say 0 or 1 (present or absent, survived or not, and so on), the stochastic part of a model cannot be normal even in principle. For counts or other non-negative responses, the same can hold true, and the starting point is more likely to be a Poisson or some other non-normal distribution.
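A quick sketch of why these two cases resist a normal error term. The probability `p = 0.3` and the Poisson means 2 and 20 are arbitrary values picked for illustration, not anything from the question.

```python
import numpy as np

rng = np.random.default_rng(1)

# Binary response: y is 0 or 1 with success probability p (assumed value).
p = 0.3
y = rng.binomial(n=1, p=p, size=10_000)
errors = y - p                      # deviation from the mean response
distinct = np.unique(errors)        # only two values are possible: -p and 1 - p
print("distinct binary 'errors':", distinct)

# Count response: Poisson draws are non-negative and their variance
# moves with the mean, unlike a normal error with constant variance.
counts_low = rng.poisson(lam=2.0, size=10_000)
counts_high = rng.poisson(lam=20.0, size=10_000)
print("variance at mean 2: ", counts_low.var())
print("variance at mean 20:", counts_high.var())
```

A two-point distribution cannot be normal however large the sample is, and a count model has to respect both the lower bound at zero and the link between mean and variance.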
P.S. Whether you are using ordinary least squares has itself no influence on whether error terms are normally distributed. Naturally, it is true that if they are, then OLS is appealing as an estimation method, but that's not at all the same.
EDIT: Thanks to @whuber for his firm but gentle encouragement to clarify as far as possible. The question has morphed since I first posted, but surely (and benignly) allows answers of quite different styles.
Best Answer
It seems that you're confused about the relation of the sample size to the application of the CLT. The distribution of $\epsilon_{it}$ has nothing to do with the sample size. I'm assuming that the subscript $i$ refers to the subject (a person) and the subscript $t$ refers to the time of the observation.
In a simple linear regression we don't need to make many assumptions about $\epsilon$ to estimate $\beta_i$. The errors don't have to be normal, and they will not tend to become normal as the sample size increases.
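To make the last point concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration: a single regressor drawn uniformly on [0, 1], true intercept 1 and slope 2, centred exponential errors, and the helper names `skewness` and `simulate_slopes`.

```python
import numpy as np

def skewness(x):
    """Sample skewness (third standardized moment)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def simulate_slopes(n, n_reps=2000, seed=2):
    """Fit OLS on n_reps simulated data sets of size n; return the slopes."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.uniform(0.0, 1.0, size=n)
        eps = rng.exponential(1.0, size=n) - 1.0   # skewed, mean-zero errors
        y = 1.0 + 2.0 * x + eps
        slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line
        slopes[r] = slope
    return slopes

# The errors themselves stay skewed no matter how many of them we draw.
rng = np.random.default_rng(3)
big_error_sample = rng.exponential(1.0, size=100_000) - 1.0
error_skew = skewness(big_error_sample)

# The OLS slope estimates, by contrast, are centred on the true slope
# and roughly symmetric across replications.
slopes = simulate_slopes(n=200)
slope_skew = skewness(slopes)

print(f"skewness of errors (n = 100000):    {error_skew:.2f}")
print(f"skewness of slope estimates:        {slope_skew:.2f}")
print(f"mean of slope estimates (true = 2): {slopes.mean():.3f}")
```

A larger sample only gives a sharper picture of the error distribution; it does not change its shape. What can become approximately normal is the sampling distribution of the estimator.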
The CLT is applied in two different ways: