According to the central limit theorem, the more samples you take from a population, no matter what shape the distribution is, the more normal your sampling distribution becomes.
This is incorrect in several respects. (I expect you've been told something close to this, but it's not the case.)
"no matter what shape the distribution is" ... not so. There are distributions for which the CLT doesn't hold (other limit theorems may apply).
"the more samples you take [...] the more normal your sampling distribution becomes" ... the central limit theorem doesn't actually assert this. It does happen (under certain conditions), but it's not really the CLT that says that. An example of a result that does say something (more or less) along these lines would be the Berry-Esseen theorem, which gives a bound on how far the distribution of the standardized mean can be from the standard normal distribution, and which bound decreases as $1/\sqrt{n}$.
So why, despite the central limit theorem, is the sampling distribution of $r$ not normally distributed?
The sample correlation statistic $r$ can be construed as a kind of (scaled) mean (especially if you define it with an $n$ denominator), and the CLT should apply to it. But the CLT is a theorem about the limit as $n\to\infty$; you have finite $n$. At finite sample sizes the distribution won't be normal; indeed this should be obvious, since $r$ is bounded between $-1$ and $1$. Those bounds tend to make the distribution of $r$ skewed as $\rho$ moves toward one of them.
Indeed, even the effect you attributed to the CLT (the one where increasing the sample size tends to make the sampling distribution of means more normal) applies to $r$ too. As you take larger samples, $r$ does tend to be more nearly normally distributed, in a very wide variety of circumstances.
You can investigate this fairly easily using simulation.
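Here is a minimal sketch of such a simulation in Python (numpy and scipy assumed available; the choice of $\rho = 0.8$, the number of replications and the sample sizes are purely illustrative): draw bivariate normal samples with a given population correlation, compute $r$ for each, and look at the skewness of the simulated sampling distribution.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
rho = 0.8          # illustrative population correlation
n_reps = 10_000    # simulated samples per sample size
cov = np.array([[1.0, rho], [rho, 1.0]])

for n in (10, 30, 100, 500):
    rs = np.empty(n_reps)
    for i in range(n_reps):
        # one bivariate normal sample of size n, then its sample correlation
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        rs[i] = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
    print(f"n={n:4d}  mean r={rs.mean():.3f}  skewness={skew(rs):+.3f}")
```

For positive $\rho$ you should see negative skewness (the upper bound at $1$ is the nearer one) that shrinks toward zero as $n$ grows but never quite vanishes.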
However, while the distribution of $r$ becomes more nearly normal as $n$ increases for a given population $\rho$, it is still generally skewed at every finite sample size (when $\rho\neq 0$). The same applies when taking sample means from skewed distributions: the skewness decreases with larger samples, but it doesn't go away.
But we don't get skew when plotting the sampling distribution of the mean
Not so! If the distribution you draw samples from is skewed, the sample means will also be skewed (but typically substantially less so).
As the sample size gets larger, the skewness in the sample mean will tend to reduce.
If we drew samples from a suitable distribution on $[-1,1]$, say a scaled-shifted beta, then as the population mean approached those bounds for a given $n$, the sample mean would tend to be more skewed (left skew for a population mean near $1$, right skew for a population mean near $-1$). But as the sample size increased for a given mean, the distribution would be less skewed. The situation is similar for the sample correlation.
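A quick sketch along the same lines (distributions and parameters chosen purely for illustration): sample means from a right-skewed exponential, and from a left-skewed beta rescaled to $[-1,1]$ whose population mean sits fairly close to the upper bound, at a few sample sizes.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
n_reps = 20_000  # simulated samples per sample size

for n in (5, 30, 200):
    # Means of samples from a right-skewed distribution (exponential, mean 1)
    exp_means = rng.exponential(scale=1.0, size=(n_reps, n)).mean(axis=1)
    # Means of samples from Beta(9, 1) rescaled to [-1, 1]; population mean 0.8, left-skewed
    beta_means = (2.0 * rng.beta(9.0, 1.0, size=(n_reps, n)) - 1.0).mean(axis=1)
    print(f"n={n:3d}  skew(exponential means) = {skew(exp_means):+.3f}   "
          f"skew(rescaled-beta means) = {skew(beta_means):+.3f}")
```

The exponential means should show positive skewness and the rescaled-beta means negative skewness, both shrinking in magnitude as $n$ increases without reaching zero.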
It depends on what you're doing. If you just want to predict, then the normality of the error term doesn't matter, and the Gauss-Markov theorem does not require a normal error term in the first place (it only needs zero-mean, uncorrelated, homoscedastic errors).
However, when the error term is normal, the OLS estimator $\hat{\beta}$ is also the maximum likelihood estimator. If you don't know about MLEs yet, you'll see them over and over as you dive into statistics; being the maximum likelihood estimator is a nice property for many reasons.
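For concreteness: with independent errors $\varepsilon_i \sim N(0,\sigma^2)$, the log-likelihood of the regression coefficients is

$$\ell(\beta,\sigma^2) \;=\; -\frac{n}{2}\log\!\left(2\pi\sigma^2\right) \;-\; \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^\top\beta\right)^2,$$

so for any fixed $\sigma^2$, maximizing over $\beta$ is exactly minimizing the residual sum of squares, which is what OLS does; hence $\hat{\beta}$ is the MLE.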
Among those reasons is that the usual inferential machinery, like p-values on coefficients and F-tests of nested models, comes into play.
So if you want to do some kind of ANOVA, for example, the normality of the error term matters because you’re doing hypothesis testing, not prediction.
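For illustration, here is a self-contained sketch of one such test (all data simulated and all names hypothetical): an F-test comparing nested OLS models, whose null distribution is exact only because the error term is normal.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(2)

# Hypothetical data: y depends on x1 only; x2 is an irrelevant candidate predictor
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)   # normal error term

def rss(X, y):
    """Residual sum of squares from an OLS fit via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

X_restricted = np.column_stack([np.ones(n), x1])        # smaller (nested) model
X_full       = np.column_stack([np.ones(n), x1, x2])    # full model

rss_r, rss_f = rss(X_restricted, y), rss(X_full, y)
df1 = X_full.shape[1] - X_restricted.shape[1]           # extra parameters tested
df2 = n - X_full.shape[1]                               # residual degrees of freedom

F = ((rss_r - rss_f) / df1) / (rss_f / df2)
p_value = f_dist.sf(F, df1, df2)
print(f"F = {F:.3f}, p = {p_value:.3f}")
```

With these simulated data the test should (usually) fail to reject, since $x_2$ truly has no effect; the point is just that the F reference distribution is justified by the normal-error assumption.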
The pooled distribution of the response variable (all of your $y$s) definitely does not have to be normal, even to get that maximum likelihood property and do inference, and the predictor variables definitely don't have to be normal either. Predictors often cannot be normal, for example when they are categorical variables (male/female, treatment/control, etc.).
EDIT
We often talk about "normal residuals". This is casual language, and experienced statisticians know what is meant, but the observed residuals form a finite, discrete set (and are not independent), so they cannot literally be normally distributed. What we assume is a normal error term, and we use the residuals to gauge whether that assumption is reasonable.
Best Answer
It is because GLM parameter estimates are maximum likelihood estimates, and maximum likelihood estimators are asymptotically normal when the observations are independent (and the usual regularity conditions hold). See here, for instance. Note that this is only an asymptotic result, so for a particular finite sample your test statistics won't be exactly normal.
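In symbols, the standard result (under those regularity conditions) is

$$\sqrt{n}\,\bigl(\hat\theta_n - \theta_0\bigr) \;\xrightarrow{d}\; N\!\bigl(0,\; I(\theta_0)^{-1}\bigr),$$

where $I(\theta_0)$ is the Fisher information; test statistics built from $\hat\theta_n$ are therefore only approximately normal at finite $n$.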