I've got a weird question. Assume that you have a small sample in which the dependent variable, which you are going to analyze with a simple linear model, is highly left-skewed. You would therefore assume that $u$ is not normally distributed, because a normal $u$ would result in a normally distributed $y$. But when you draw the normal QQ plot, there is evidence that the residuals are normally distributed. So one could conclude that the error term is normally distributed, although $y$ is not. What does it mean when the error term seems to be normally distributed, but $y$ does not?
Solved – What if residuals are normally distributed, but y is not
error, normality-assumption, regression, residuals
Related Solutions
The Central Limit Theorem applies in this case. If the residuals are not normally distributed, but the sample size is large enough, then the t statistics will be approximately t-distributed (and the F statistic will be approximately F distributed). How good the approximation is depends on how different the residuals are from the normal and how large the sample size is. Many regression problems have a combination that makes the approximation reasonable.
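A small simulation can make this concrete. The sketch below is only an illustration under assumed settings (sample size 100, shifted exponential errors, a fixed design on $[0, 1]$, and a hard-coded approximate critical value); none of these choices come from the answer itself. It checks how often a nominal 5% t-test on the slope rejects the true slope when the errors are clearly skewed.

```python
import math
import random

random.seed(1)

n = 100                    # sample size per simulated dataset (assumed)
reps = 2000                # number of simulated datasets (assumed)
beta0, beta1 = 1.0, 2.0    # true intercept and slope (assumed)
t_crit = 1.984             # approximate two-sided 5% critical value, t with 98 df

x = [i / (n - 1) for i in range(n)]          # fixed design on [0, 1]
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

rejections = 0
for _ in range(reps):
    # Skewed, mean-zero errors: exponential(1) shifted to have mean 0.
    y = [beta0 + beta1 * xi + (random.expovariate(1.0) - 1.0) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    t_stat = (b1 - beta1) / se_b1            # t statistic for the true slope
    if abs(t_stat) > t_crit:
        rejections += 1

rejection_rate = rejections / reps
print(f"rejection rate at nominal 5%: {rejection_rate:.3f}")
```

If the approximation is reasonable, the printed rate should sit close to 0.05. Shrinking `n` or using a more extreme error distribution is a quick way to see where it starts to break down.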
If there is reason to believe the errors follow a different distribution, then there are methods for fitting regression models under that assumption. GLMs can fit binomial, Poisson, and gamma distributed y's, and maximum likelihood or Bayesian methods (or others) allow you to fit other distributions.
But if you are unwilling to assume normality, how can you be sure of another distribution? Sometimes it is clear, but if the residuals look like they might follow a gamma distribution and you are not sure, then fitting based on a normal may be just as good (because of the CLT) as fitting a gamma that does not actually fit.
If you don't want to make assumptions about the distribution of the residuals then there are options like permutation tests or bootstrapping (or other non-parametric regression tools), but all of these have their own sets of assumptions and conditions where they may work better or worse.
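As one illustration of the bootstrap option, here is a minimal sketch of a case-resampling (pairs) bootstrap with a percentile interval for the slope. The data are made up for the example, and the helper name `ols_slope`, the number of resamples, and the choice of a percentile interval are all assumptions; other bootstrap schemes (for example resampling residuals) rest on different conditions.

```python
import random

random.seed(2)

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (with intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

# Made-up data with skewed, mean-zero errors (purely illustrative).
n = 40
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [1.0 + 0.5 * x + (random.expovariate(1.0) - 1.0) for x in xs]

slope_hat = ols_slope(xs, ys)

B = 2000                                   # number of bootstrap resamples (assumed)
boot = []
for _ in range(B):
    # Resample (x, y) pairs with replacement and refit.
    idx = [random.randrange(n) for _ in range(n)]
    boot.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))

boot.sort()
ci_low = boot[B // 40]                     # about the 2.5th percentile
ci_high = boot[B - B // 40 - 1]            # about the 97.5th percentile
print(f"slope = {slope_hat:.3f}, 95% percentile CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The pairs bootstrap does not assume normal errors, but it does treat the observations as independent draws, which is one of the "own sets of assumptions" mentioned above.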
In the end, the question you are trying to answer and what you know about the science that produced the data are what matter most.
First off, I would get yourself a copy of this classic and approachable article and read it: Anscombe FJ. (1973) Graphs in Statistical Analysis. The American Statistician. 27:17–21.
On to your questions:
Answer 1: Neither the dependent nor the independent variable needs to be normally distributed. In fact they can have all kinds of loopy distributions. The normality assumption applies to the distribution of the errors, which in practice is checked through the residuals ($Y_{i} - \widehat{Y}_{i}$).
Answer 2: You are actually asking about two separate assumptions of ordinary least squares (OLS) regression:
One is the assumption of linearity. This means that the trend in $\overline{Y}$ across $X$ is expressed by a straight line (Right? Straight back to algebra: $y = a +bx$, where $a$ is the $y$-intercept, and $b$ is the slope of the line.) A violation of this assumption simply means that the relationship is not well described by a straight line (e.g., $\overline{Y}$ is a sinusoidal function of $X$, or a quadratic function, or even a straight line that changes slope at some point). My own preferred two-step approach to address non-linearity is to (1) perform some kind of non-parametric smoothing regression to suggest specific nonlinear functional relationships between $Y$ and $X$ (e.g., using LOWESS, or GAMs, etc.), and (2) to specify a functional relationship using either a multiple regression that includes nonlinearities in $X$, (e.g., $Y \sim X + X^{2}$), or a nonlinear least squares regression model that includes nonlinearities in parameters of $X$ (e.g., $Y \sim X + \max{(X-\theta,0)}$, where $\theta$ represents the point where the regression line of $\overline{Y}$ on $X$ changes slope).
Another is the assumption of normally distributed residuals. Sometimes one can validly get away with non-normal residuals in an OLS context; see for example, Lumley T, Emerson S. (2002) The Importance of the Normality Assumption in Large Public Health Data Sets. Annual Review of Public Health. 23:151–69. Sometimes, one cannot (again, see the Anscombe article).
However, I would recommend thinking about the assumptions in OLS not so much as desired properties of your data, but rather as interesting points of departure for describing nature. After all, most of what we care about in the world is more interesting than $y$-intercept and slope. Creatively violating OLS assumptions (with the appropriate methods) allows us to ask and answer more interesting questions.
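To illustrate the broken-line model $Y \sim X + \max{(X-\theta,0)}$ mentioned under the linearity assumption, here is a rough sketch that profiles the change point $\theta$ over a grid and fits ordinary least squares at each candidate value. This grid search is a simple stand-in for a proper nonlinear least squares routine, and the data, the grid, and the helper name `fit_hinge` are assumptions made for the example.

```python
import random

random.seed(3)

def fit_hinge(xs, ys, theta):
    """OLS fit of y = a + b1*x + b2*max(x - theta, 0) for a fixed theta.

    Returns (sse, a, b1, b2). Solves the centered 2x2 normal equations.
    """
    n = len(xs)
    hs = [max(x - theta, 0.0) for x in xs]
    x_bar, h_bar, y_bar = sum(xs) / n, sum(hs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    shh = sum((h - h_bar) ** 2 for h in hs)
    sxh = sum((x - x_bar) * (h - h_bar) for x, h in zip(xs, hs))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    shy = sum((h - h_bar) * (y - y_bar) for h, y in zip(hs, ys))
    det = sxx * shh - sxh ** 2
    b1 = (shh * sxy - sxh * shy) / det
    b2 = (sxx * shy - sxh * sxy) / det
    a = y_bar - b1 * x_bar - b2 * h_bar
    sse = sum((y - a - b1 * x - b2 * h) ** 2 for x, h, y in zip(xs, hs, ys))
    return sse, a, b1, b2

# Made-up data: slope 0.5 up to x = 5, then slope 2.5 (illustrative only).
xs = [0.1 * i for i in range(101)]
ys = [1.0 + 0.5 * x + 2.0 * max(x - 5.0, 0.0) + random.gauss(0, 0.2) for x in xs]

# Candidate change points strictly inside the range of x (assumed grid).
thetas = [1.0 + 0.25 * k for k in range(33)]
sse, a, b1, b2, theta_hat = min(fit_hinge(xs, ys, t) + (t,) for t in thetas)
print(f"theta = {theta_hat:.2f}, a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Here `b2` is the change in slope at $\theta$, so the slope after the change point is `b1 + b2`. In practice the grid would be suggested by the smoother from step (1).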
Best Answer
It is reasonable for the residuals in a regression problem to be normally distributed, even though the response variable is not. Consider a univariate regression problem where $y \sim \mathcal{N}(\beta x, \sigma^2)$, so that the regression model is appropriate, and further assume that the true value is $\beta=1$. In this case, while the residuals of the true regression model are normal, the distribution of $y$ depends on the distribution of $x$, as the conditional mean of $y$ is a function of $x$. If the dataset has a lot of values of $x$ close to zero and progressively fewer at higher values of $x$, then the distribution of $y$ will be skewed to the right. If the values of $x$ are distributed symmetrically, then $y$ will be distributed symmetrically, and so forth. For a regression problem, we only assume that the response is normal conditional on the value of $x$.
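This is easy to check by simulation. The sketch below assumes $\beta = 1$ as in the example, and additionally assumes $\sigma = 0.5$, exponentially distributed $x$ (many values near zero, progressively fewer large ones), and a fit that includes an intercept; those last three choices are mine, not part of the answer. It compares the sample skewness of $y$ with that of the fitted residuals.

```python
import random

random.seed(4)

def skewness(values):
    """Sample skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

n = 2000
beta, sigma = 1.0, 0.5                               # sigma is an assumed value
xs = [random.expovariate(1.0) for _ in range(n)]     # right-skewed x
ys = [beta * x + random.gauss(0, sigma) for x in xs]  # normal errors given x

# Simple least-squares fit with an intercept.
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]

skew_y = skewness(ys)
skew_res = skewness(residuals)
print(f"skewness of y: {skew_y:.2f}, skewness of residuals: {skew_res:.2f}")
```

The response comes out clearly right-skewed while the residuals are roughly symmetric. Replacing `xs` with their negatives gives a left-skewed $y$, as in the question, with residuals that are still approximately normal.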