To clarify Peter Flom's point: if the residuals in a regression model are normal and the model is adequate, the DV (response variable) y will be normally distributed at each value of x, but with mean equal to the regression function ax+b, where x is your IV. How x is distributed depends on your design. A histogram of the ys doesn't tell you anything useful because it is just a mixture of normal distributions with different mean values. Histograms and Q-Q plots of the estimated residuals can help you determine whether the normality assumption is violated to the extent that you need to do something about it. Transforming the data so that the residuals look more nearly normal is one way to deal with the problem if you have it. But there are alternatives that I think are better; robust regression and the bootstrap are two that I prefer.
Now Peter is right. Your residual histogram looks reasonably normal, so there is probably no need for a transformation or any other change in the model or the fitting procedure.
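The point about histograms of y versus histograms of residuals can be illustrated with a small simulation. This is only a sketch on made-up data: the clustered design for x, the values of `a`, `b`, and `sigma`, and the helper `excess_kurtosis` are all assumptions chosen for illustration, not anything taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: x falls in two well-separated clusters.
n = 2000
x = np.concatenate([rng.uniform(0, 1, n // 2), rng.uniform(9, 10, n // 2)])

# Assumed true model: y = a*x + b + normal error.
a, b, sigma = 2.0, 1.0, 1.0
y = a * x + b + rng.normal(0, sigma, n)

# Ordinary least squares fit and its residuals.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)

def excess_kurtosis(v):
    """Sample excess kurtosis; close to 0 for normally distributed data."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z**4) - 3.0)

kurt_y = excess_kurtosis(y)          # far from 0: y is a mixture of normals
kurt_resid = excess_kurtosis(resid)  # near 0: the errors really are normal
print(f"excess kurtosis of y: {kurt_y:.2f}, of residuals: {kurt_resid:.2f}")
```

Here the marginal distribution of y is strongly bimodal even though every regression assumption holds, which is why the residuals, not the raw ys, are the thing to inspect.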
The simplest way to solve your immediate problem, with most of your data fitting simple linear regression well except for data from one depth, is to separate out the issue of the model itself from that of the display of the model results. For the one depth that required a transformation of variables, back-transform the regression fit into the original scale before plotting. For that one depth you will have a curve rather than the straight lines that characterized the other depths, but you should still have a useful x-intercept and the slope of the curve near that intercept will be a start for comparisons of slopes among depths.
You should, however, consider why this particular depth seems to have such different properties from the other depths. Is it an extreme of depth values, perhaps beyond some type of boundary (with respect to temperature, mixing, etc) versus the other depths? Or might it just be that the measurements at that particular depth had some systematic errors, in which case you shouldn't be considering them at all? Such scientific and technical issues are much more important than the details of the statistical approaches.
For the broader issues raised in your question, the assumptions underlying linear models are discussed extensively on this site, for example here. Linearity of outcome with respect to the predictor variables is important, but other assumptions like normal distributions of errors mainly affect the ability to interpret p-values. If there is linearity with respect to predictor variables, the regression will still give a useful estimate of the underlying relation. Generalized linear models provide a way to deal with errors that are a function of the predicted value, as you seem to have for that one troubling depth.
Note that your experimental design, if it is an observational study based on concentrations of chemicals measured at different depths, already violates one of the assumptions of standard linear regression, as there presumably are errors in the values of the predictor variables. What you really have in that case is an errors-in-variables model. In practice that distinction is often overlooked, but your regression models (and those of most scientists engaged in observational rather than controlled studies) already violate strict linear regression assumptions.
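To see why measurement error in a predictor matters, here is a rough simulation of the simplest case (one predictor, independent normal measurement error). All numbers are invented for illustration. Under these assumptions the least-squares slope is pulled toward zero by roughly the factor var(x) / (var(x) + var(measurement error)).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Assumed truth: y depends linearly on the error-free predictor x_true.
true_slope = 2.0
x_true = rng.normal(0, 1, n)
y = true_slope * x_true + rng.normal(0, 0.5, n)

# What is actually recorded: x_true plus measurement error (SD 1, hypothetical).
x_obs = x_true + rng.normal(0, 1, n)

slope_clean = np.polyfit(x_true, y, 1)[0]  # regression on the true predictor
slope_noisy = np.polyfit(x_obs, y, 1)[0]   # regression on the noisy predictor

# With var(x_true) = 1 and var(error) = 1 the expected attenuation is 1/2.
print(f"slope with true x: {slope_clean:.3f}, with noisy x: {slope_noisy:.3f}")
```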
Finally, although I appreciate that you have already done much data analysis, consider whether you really should use concentration ratios as predictor variables. Ratios are notoriously troublesome, particularly if a denominator can be close to 0. Almost anything that can be accomplished with ratios as predictors can be done with log transformations of the numerator and denominator variables. As I understand your situation, you have a single outcome variable (rate of production of some chemical) and multiple measured concentrations of other chemicals; you then examined various ratios of those other chemicals as predictors for the outcome variable. If you instead formed a combined regression model that used the log concentrations of all the other chemicals as the predictors of the outcome, you might end up with a more useful model, which may show unexpected interactions among the chemicals and still can be interpreted in terms of ratios if you wish.
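As a sketch of the suggestion about ratios: since log(a/b) = log(a) - log(b), a model with the two log concentrations as separate predictors contains the log-ratio model as the special case where the two coefficients are equal and opposite. The names `conc_a` and `conc_b` and all the numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Hypothetical positive concentrations of two chemicals.
conc_a = rng.lognormal(0, 0.5, n)
conc_b = rng.lognormal(0, 0.5, n)

# Assumed truth for this illustration: outcome depends on the log ratio.
y = 1.0 + 0.8 * np.log(conc_a / conc_b) + rng.normal(0, 0.2, n)

# Fit with log(a) and log(b) as separate predictors instead of the ratio.
X = np.column_stack([np.ones(n), np.log(conc_a), np.log(conc_b)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, beta_a, beta_b = coef

# If a ratio is the right summary, beta_a and beta_b come out roughly
# equal and opposite; if they do not, the ratio was discarding information.
print(f"beta_a = {beta_a:.3f}, beta_b = {beta_b:.3f}")
```

With real data, checking whether the two coefficients sum to about zero is a way to ask whether the ratio is an adequate summary, rather than assuming it at the outset.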
Best Answer
If you expect the relationship between y and the x's to be linear, then a nonlinear transformation of y will make the relationship between it and the x's nonlinear. It will also alter the spread about the model (if the data had constant variance before transformation, it won't have it afterward).
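A quick simulated illustration of both effects, under assumed values: y is linear in x with constant error variance, and then a log transformation is applied. Comparing against the true mean `mu` is a simulation convenience that would not be available with real data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000

# Assumed truth: linear mean, constant variance on the original scale.
x = rng.uniform(0, 20, n)
mu = 10.0 + 2.0 * x
y = mu + rng.normal(0, 1.0, n)

log_y = np.log(y)

# Curvature: the quadratic coefficient of log(y) on x is clearly nonzero
# (negative here, because log of a linear function is concave).
quad_coef = np.polyfit(x, log_y, 2)[0]

# Spread: deviations of log(y) about log(mu) shrink as the mean grows.
dev = log_y - np.log(mu)
sd_low = dev[x < 10].std()
sd_high = dev[x >= 10].std()

print(f"quadratic coefficient: {quad_coef:.5f}")
print(f"SD on log scale, low x: {sd_low:.4f}, high x: {sd_high:.4f}")
```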
Note further that in regression, there's no assumption about the distribution of the dependent variable itself (unconditionally). That is, there's little value in looking at say a histogram of the $y$ values -- it doesn't directly relate to any regression assumption. The assumption of normality applies when you're using normal based tests or intervals, and applies to the conditional distribution, which you can't usually assess until you look at residuals.
If you're not interested in hypothesis tests or confidence intervals, an ordinary regression with non-normal conditional distribution may in some situations be reasonable (non-constant variance may be more of an issue than distribution-shape anyway). If you do want to perform inference as well, there are several ways of going about it (some approximate) that may be suitable.
If you thought that the conditional distribution $Y|x_1,x_2,\ldots$ was exponential, and that the relationship between $Y$ and the $x$'s was linear, you could use a GLM with identity link. There's advice relating to fitting exponential models in this way on this site.
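For concreteness, here is a minimal sketch of fitting such a model by iteratively reweighted least squares, written in plain NumPy rather than with a GLM library. For an exponential response the variance function is $\mu^2$, and with the identity link the working response is just $y$, so each step is a weighted least-squares fit with weights $1/\mu^2$. The function name `fit_exponential_identity` and the simulated data are assumptions for illustration; a real analysis would normally use an established GLM routine.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Simulated data: exponential response whose mean is linear in x (assumed).
x = rng.uniform(1, 5, n)
beta_true = np.array([2.0, 1.5])          # intercept, slope
mu_true = beta_true[0] + beta_true[1] * x
y = rng.exponential(mu_true)              # scale parameter = mean

X = np.column_stack([np.ones(n), x])

def fit_exponential_identity(X, y, n_iter=50):
    """IRLS for an exponential GLM with identity link (illustrative only)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from OLS
    for _ in range(n_iter):
        mu = X @ beta
        if np.any(mu <= 0):
            # The identity link does not keep the fitted mean positive.
            raise ValueError("fitted mean became non-positive")
        sw = 1.0 / mu                             # sqrt of weights 1/mu^2
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.allclose(beta_new, beta, rtol=1e-10, atol=1e-12)
        beta = beta_new
        if converged:
            break
    return beta

beta_hat = fit_exponential_identity(X, y)
print("estimated intercept and slope:", beta_hat)
```

The explicit check on `mu` reflects a practical caveat of the identity link: nothing constrains the fitted means to stay positive, so the fit can fail on data where the linear predictor approaches zero.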
The distribution of the independent variables doesn't matter, since you condition on them in regression. No assumption about their distribution is made. The only way it's relevant is that sometimes the joint distribution can help inform us how to interpret the marginal distribution of the dependent variable, y (e.g. jointly normal x's would not produce an exponential y from conditionally normal y, so it would lead us to doubt the y's were conditionally normal).
If it makes sense to model $E(\log(y))$ as a linear function of the predictors, that may be fine, but note that if you exponentiate such a fit, you don't get a suitable estimate of $E(Y|X=x)$ out (unless there's almost no variation about the model, in which case the bias may sometimes be small enough to ignore). An alternative to that would be to use a GLM with log link (in which case you'd be modelling $\log(E(y))$ as a linear function of parameters -- and expected values do come straight out of that model).
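The size of that retransformation bias can be seen in a small simulation. The sketch below assumes normal errors on the log scale (so $Y$ is conditionally lognormal), in which case $E(Y|X=x) = \exp(a + bx + \sigma^2/2)$ and the naive back-transformed fit is too small by a factor of about $\exp(\sigma^2/2)$. All parameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000

# Assumed truth: log(y) is linear in x with normal errors of SD sigma.
x = rng.uniform(0, 2, n)
sigma = 0.8
log_y = 0.5 + 1.0 * x + rng.normal(0, sigma, n)
y = np.exp(log_y)

# Least squares on the log scale, then naive exponentiation of the fit.
slope, intercept = np.polyfit(x, log_y, 1)
naive_pred = np.exp(intercept + slope * x)

# Ratio of the actual average of y to the average naive prediction.
ratio = y.mean() / naive_pred.mean()
expected_ratio = np.exp(sigma**2 / 2)   # lognormal-theory value

print(f"observed ratio: {ratio:.3f}, exp(sigma^2/2): {expected_ratio:.3f}")
```

The correction factor $\exp(\sigma^2/2)$ holds only under the lognormal assumption made here; with other error distributions the bias is still present but takes a different form.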
You should consider the spread about the relationship; if you already know something about it, that may help inform your choice of model (but beware your inferences if you're using the same data to identify the model as to make inferences about it).
There are many alternatives to least squares for fitting linear relationships, and some might be more suitable in the case of some non-normal conditional distributions.
You should clarify your expectations about what it is that will be linearly related to the x's, and how you expect the variability about the line to behave on whatever that scale is (say, as a function of the mean: would it tend to spread more as the mean increased, or not?).