It is hard to improve on what has already been said in the other posts. Nevertheless, I will try to address the different assumptions that are needed for OLS and various other estimation techniques to be appropriate.
OLS estimation: This is applied in both simple and multiple linear regression, where the common assumptions are (1) the model is linear in the coefficients with an additive random error term, and (2) the random errors are (a) normally distributed with mean 0 and (b) have a variance that does not change as the values of the predictor covariates (i.e., the IVs) change. Note also that in this framework, which applies in both simple and multiple regression, the covariates are assumed to be known without any uncertainty in their given values. OLS can be used when either A) only (1) and 2(b) hold (but not necessarily 2(a)), or B) both (1) and (2) hold.
If B) can be assumed, OLS has some nice properties that make it attractive to use:
(I) MINIMUM VARIANCE AMONG UNBIASED ESTIMATORS
(II) MAXIMUM LIKELIHOOD
(III) CONSISTENCY, ASYMPTOTIC NORMALITY, AND EFFICIENCY UNDER CERTAIN REGULARITY CONDITIONS
Under B), OLS can be used for both estimation and prediction, and both confidence and prediction intervals can be generated for the fitted values and predictions.
If only A) holds, we still have property (I) but not (II) or (III). If your objective is to fit the model and you do not need confidence or prediction intervals for the response given the covariates, nor confidence intervals for the regression parameters, then OLS can be used under A). But you cannot test for significance of the coefficients using the usual t tests, nor can you apply the F test for overall model fit or the one for equality of variances. The Gauss-Markov theorem still guarantees property (I). However, in case A), since (II) and (III) no longer hold, other more robust estimation procedures may be better than least squares even though they are not unbiased. This is particularly true when the error distribution is heavy-tailed and you see outliers in the data, because the least squares estimates are very sensitive to outliers.
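To see that sensitivity concretely, here is a simulated sketch (the data, the single planted outlier, and the standard Huber cutoff c = 1.345 are all illustrative choices, not from anything above) comparing OLS with a Huber M-estimator fit by iteratively reweighted least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2 + 3 * x + rng.normal(0, 1, n)   # true line: intercept 2, slope 3
x[0], y[0] = 10.0, -50.0              # one gross outlier at a high-leverage point

X = np.column_stack([np.ones(n), x])

# Ordinary least squares: minimizes the sum of squared residuals.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

def huber_irls(X, y, c=1.345, iters=50):
    """Huber M-estimation via iteratively reweighted least squares:
    observations with residuals beyond c * scale get downweighted."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745          # robust scale estimate
        w = np.clip(c * s / np.maximum(np.abs(r), 1e-12), None, 1.0)
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta

beta_huber = huber_irls(X, y)
print("OLS slope:  ", beta_ols[1])    # pulled well away from 3 by one point
print("Huber slope:", beta_huber[1])  # stays close to the true slope 3
```

The single outlier drags the OLS fit, while the robust fit essentially ignores it.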
What else can go wrong with using OLS?
If the error variances are not homogeneous, a weighted least squares method may be preferable to OLS.
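A short Monte Carlo sketch of why (simulated data; the weights are assumed known up to a constant, which is an idealization): when the error variance grows with the covariate, WLS with weights proportional to the inverse variances estimates the slope with noticeably less variability than OLS.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])

def wls(X, y, w):
    """Weighted least squares: solve X' W X b = X' W y."""
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

slopes_ols, slopes_wls = [], []
for _ in range(500):
    y = 1 + 2 * x + rng.normal(0, 0.5 * x)       # Var(error_i) grows with x_i^2
    slopes_ols.append(wls(X, y, np.ones(n))[1])  # OLS = WLS with equal weights
    slopes_wls.append(wls(X, y, 1.0 / x**2)[1])  # weights ∝ 1 / error variance

# Both estimators are unbiased; WLS is the more efficient one here.
print("sd of OLS slope:", np.std(slopes_ols))
print("sd of WLS slope:", np.std(slopes_wls))
```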
If there is a high degree of collinearity among the predictors, either some predictors should be removed or another estimation procedure such as ridge regression should be used. The OLS coefficient estimates can be highly unstable when there is severe multicollinearity.
If the covariates are observed with error (e.g., measurement error), then the model assumption that the covariates are given without error is violated. This is bad for OLS because its criterion minimizes the residuals only in the direction of the response variable, assuming there is no error to worry about in the direction of the covariates. This is called the errors-in-variables problem, and a solution that accounts for the errors in the covariate directions will do better. Errors-in-variables (aka Deming) regression minimizes the sum of squared deviations in a direction that takes account of the ratio of the two error variances.
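A minimal numpy sketch of both points (simulated data; the closed-form Deming slope for simple regression with a known error-variance ratio is used, and equal error variances are assumed for illustration): OLS on the mismeasured covariate is attenuated toward zero, while the Deming estimate recovers the true slope.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x_true = rng.uniform(0, 10, n)
w = x_true + rng.normal(0, 1.0, n)        # covariate observed with error
y = 1 + 2 * x_true + rng.normal(0, 1.0, n)

# Naive OLS of y on the mismeasured w: attenuated toward zero.
slope_ols = np.cov(w, y)[0, 1] / np.var(w, ddof=1)

# Deming regression with known error-variance ratio delta = Var(V)/Var(U) (= 1 here).
delta = 1.0
sxx = np.var(w, ddof=1)
syy = np.var(y, ddof=1)
sxy = np.cov(w, y)[0, 1]
slope_dem = (syy - delta * sxx
             + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)

print("true slope:   2")
print("OLS slope:   ", slope_ols)  # roughly 2 * Var(X) / (Var(X) + Var(U))
print("Deming slope:", slope_dem)  # close to 2
```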
This is a little complicated, because many assumptions are involved in these models and your objectives play a role in deciding which assumptions are crucial for a given analysis. But if you look at the assumptions one at a time and consider the consequences of each violation, it may be less confusing.
I am not aware of any case where there is measurement error only in the independent variables and not in the response; rather, the error is in both. By this I mean we have observations $$(W_i,Y_i), \quad i=1,\dots,n$$
where $Y_i = g(X_i) + V_i$ and $W_i = X_i + U_i$. Here $V_i$ is the standard error term that you get from regression modelling, and $U_i$ is your measurement error. There is a lot of literature on the standard univariate case, and if you understand the univariate case, the extension to the multivariate case should follow. Admittedly, there is not a lot of literature covering the multivariate case, because the difficulties presented in the one-dimensional case are usually difficult enough.
For starters, Fan (1991) explains that the difficulty in the measurement error problem stems from the errors $U_i$. The ability to come up with "good" estimators depends on these errors. In particular, he describes the $U_i$'s as being either 'supersmooth' or 'ordinary smooth'.
The model above is known as the classical (additive) errors-in-variables model, since we 'naturally' observe a mismeasured version of the true covariate and the error is additive. As I have just suggested, there are many forms of the errors-in-variables model, but I suspect you are talking about the classical one. The standard distributional assumptions are:
$V_i$ is independent of $X_i$; $X_i$ is independent of $U_i$. Either the density of $U_i$ is already known (and is either supersmooth or ordinary smooth), or there are repeated measurements of $X_i$ made with error (namely $W_{ij} = X_i + U_{ij}$). The dependence of $Y_i$ on the predictors is non-differential, i.e. $Y_i|X_i,W_i \sim Y_i|X_i$; that is, once you know the true covariate, the mismeasured version carries no additional information.
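The repeated-measurements assumption is what makes correction feasible in practice. Here is a sketch under simplifying assumptions (linear $g$, two replicates per subject, simulated values): the replicate difference identifies the measurement-error variance, which gives a method-of-moments correction for the attenuation of the naive OLS slope.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x = rng.normal(5, 2, n)                  # unobserved true covariate X_i
u1 = rng.normal(0, 1, n)
u2 = rng.normal(0, 1, n)
w1, w2 = x + u1, x + u2                  # two replicates W_ij = X_i + U_ij
y = 1 + 2 * x + rng.normal(0, 1, n)      # linear g with true slope 2

# Naive OLS of y on one mismeasured replicate: attenuated by the
# reliability ratio Var(X) / (Var(X) + Var(U)).
slope_naive = np.cov(w1, y)[0, 1] / np.var(w1, ddof=1)

# The replicates identify the error variance: Var(W_i1 - W_i2) = 2 Var(U).
var_u = np.var(w1 - w2, ddof=1) / 2

# Method-of-moments correction for the attenuation.
var_w = np.var(w1, ddof=1)
slope_corrected = slope_naive * var_w / (var_w - var_u)

print("naive slope:    ", slope_naive)      # about 2 * 4/5 = 1.6 here
print("corrected slope:", slope_corrected)  # close to the true value 2
```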
As far as I know there aren't any standard assumptions used in the multivariate case.
If we go for a simple answer, the excerpt from the Wooldridge book (page 533) is very appropriate:
... both heteroskedasticity and nonnormality result in the Tobit estimator $\hat{\beta}$ being inconsistent for $\beta$. This inconsistency occurs because the derived density of $y$ given $x$ hinges crucially on $y^*|x\sim\mathrm{Normal}(x\beta,\sigma^2)$. This nonrobustness of the Tobit estimator shows that data censoring can be very costly: in the absence of censoring ($y=y^*$) $\beta$ could be consistently estimated under $E(u|x)=0$ [or even $E(x'u)=0$].
The notation in this excerpt comes from the Tobit model:
\begin{align} y^{*}&=x\beta+u, \quad u|x\sim N(0,\sigma^2)\\ y&=\max(y^*,0) \end{align} where $y$ and $x$ are observed.
To sum up, the difference between least squares and Tobit regression is the inherent assumption of normality in the latter.
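The cost of censoring that the excerpt describes is easy to see in a simulation. This sketch (simulated values; it only demonstrates the inconsistency of OLS under censoring and does not implement the Tobit MLE) contrasts OLS on the latent $y^*$, which is fine, with OLS on the censored $y$, which is biased:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10000
x = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)
y_star = 1 + 2 * x + u          # latent outcome y* = x*beta + u, true slope 2
y = np.maximum(y_star, 0)       # observed outcome is censored at zero

X = np.column_stack([np.ones(n), x])

# OLS on the latent y*: consistent (this is exactly what censoring takes away).
b_latent = np.linalg.lstsq(X, y_star, rcond=None)[0]

# OLS on the censored y: inconsistent for beta.
b_censored = np.linalg.lstsq(X, y, rcond=None)[0]

print("slope from latent y*: ", b_latent[1])    # close to 2
print("slope from censored y:", b_censored[1])  # noticeably below 2
```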
Also, I always thought the original article by Amemiya was quite nice in laying out the theoretical foundations of Tobit regression.