Usually you predict a dependent variable y and calculate a confidence interval: given x0, you calculate [y-, y+], the interval in which y will probably lie.
For the reverse problem, where you have a y0 and want to find [x-, x+], ordinary regression will not help directly.
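A quick sketch of why this is so: algebraically inverting the fitted y-on-x line is not the same as regressing x on y, so "reading the line backwards" gives a different answer for x than a direct x-on-y fit would. The data below is made up purely for illustration.

```python
# Compare inverting the y-on-x fit against a direct x-on-y fit.
# The data and the target value y0 are hypothetical.

def ols_slope_intercept(u, v):
    """Ordinary least squares fit v = a + b*u; returns (a, b)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    den = sum((ui - mu) ** 2 for ui in u)
    b = num / den
    return mv - b * mu, b

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1]  # roughly y = 2x plus noise

a_yx, b_yx = ols_slope_intercept(x, y)   # regress y on x
a_xy, b_xy = ols_slope_intercept(y, x)   # regress x on y

y0 = 5.0
x_inverted = (y0 - a_yx) / b_yx   # solve the y-on-x line for x
x_direct = a_xy + b_xy * y0       # predict x from the x-on-y line

print(x_inverted, x_direct)  # the two estimates disagree
```

The two slopes satisfy b_yx * b_xy = r^2, which equals 1 only for perfectly correlated data, so the two estimates of x coincide only in that degenerate case.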
An appropriate tool for this kind of analysis could be structural equation modeling [1].
[1] http://en.wikipedia.org/wiki/Structural_equation_modeling
Well, I think Mike McCoy's answer is "the right answer," but here's another way of thinking about it: linear regression looks for an approximation (up to the error $\epsilon$) of $y$ as a function of $x$. That is, we're given a non-noisy $x$ value, and from it we're computing a $y$ value, possibly with some noise. This situation is not symmetric in the variables -- in particular, flipping $x$ and $y$ means that the error is now in the independent variable, while our dependent variable is measured exactly.
One could, of course, find the equation of the line that minimizes the sum of the squares of the (perpendicular) distances from the data points. My guess is that the reason that this isn't done is related to my first paragraph and "physical" interpretations in which one of the variables is treated as dependent on the other.
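Minimizing the sum of squared perpendicular distances is usually called orthogonal regression (or total least squares). In two dimensions it has a closed form: the line passes through the centroid with direction given by the leading eigenvector of the scatter matrix. A minimal sketch, on made-up data, assuming the cross-moment $S_{xy}$ is nonzero:

```python
import math

# Total least squares (perpendicular-distance) line fit in 2-D.
# Example data is hypothetical; assumes sxy != 0.

def tls_line(x, y):
    """Return (intercept, slope) of the line minimizing the sum of
    squared perpendicular distances from the points."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Slope from the leading eigenvector of the 2x2 scatter matrix.
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - b * mx, b

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1]
a, b = tls_line(x, y)
print(a, b)
```

For positively correlated data, the orthogonal slope always lies between the y-on-x slope and the reciprocal of the x-on-y slope, which fits the observation above that neither ordinary regression treats the variables symmetrically.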
Incidentally, it's not hard to think up silly examples for which $B_x$ and $B_y$ don't satisfy anything remotely like $B_x \cdot B_y = 1$. The first one that pops to mind is to consider the least-squares line for the points {(0, 1), (1, 0), (-1, 0), (0, -1)}. (Or fudge the positions of those points slightly to make it a shade less artificial.)
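Checking that toy example directly: both means are zero, so each slope is just a ratio of moments, and both come out to zero.

```python
# Verify that B_x * B_y = 0 (not 1) for the four points above.
pts = [(0, 1), (1, 0), (-1, 0), (0, -1)]
xs = [p[0] for p in pts]
ys = [p[1] for p in pts]

sxy = sum(x * y for x, y in pts)       # both means are 0 here
b_y = sxy / sum(x * x for x in xs)     # slope of y regressed on x
b_x = sxy / sum(y * y for y in ys)     # slope of x regressed on y

print(b_x * b_y)  # → 0.0
```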
Another possible reason that the perpendicular-distances method is nonstandard is that it doesn't guarantee a unique solution -- see, for instance, the silly example in the preceding paragraph.
(N.B.: I don't actually know anything about statistics.)
Best Answer
I might suggest the LTS (Least Trimmed Squares) approach. There is code in Fortran and MATLAB, the latter called FAST-LTS, both produced, I believe, by Rousseeuw's group. The method essentially minimizes the fitting error over a proportion of the data points, ignoring the rest (the outliers). The outliers are identified by something like the Minimum Volume Ellipsoid method (roughly: find the ellipsoid of minimum volume containing half the points).
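As a rough illustration of the trimming idea (this is a toy sketch in the spirit of the concentration steps used by FAST-LTS, not the actual algorithm; the data, subset size h, and restart count are all made up):

```python
import random

# Toy Least Trimmed Squares line fit: repeatedly fit OLS to a subset,
# keep the h points with the smallest residuals, and refit (a
# "concentration step"), from several random starts.

def ols(pts):
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, y in pts)
    b = sxy / sxx
    return my - b * mx, b

def lts_fit(pts, h, restarts=50, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        subset = rng.sample(pts, h)
        for _ in range(20):  # concentration steps
            a, b = ols(subset)
            ranked = sorted(pts, key=lambda p: (p[1] - a - b * p[0]) ** 2)
            subset = ranked[:h]
        a, b = ols(subset)
        cost = sum((y - a - b * x) ** 2 for x, y in subset)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]

# Inliers near y = 2x, plus two gross outliers.
data = [(x, 2.0 * x + 0.05 * ((-1) ** x)) for x in range(8)]
data += [(2, 40.0), (5, -30.0)]
a, b = lts_fit(data, h=8)
print(a, b)  # close to intercept 0, slope 2, despite the outliers
```

An ordinary least-squares fit on the same ten points would be pulled badly by the two outliers; trimming to the best h = 8 points recovers the underlying line.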
hth,