Will the estimated slope coefficient $\beta_1$ always be the same for OLS and for QR for different quantiles?
No, of course not, because the empirical loss function being minimized differs across these cases (OLS vs. QR at different quantiles).
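For concreteness, OLS minimizes the sum of squared residuals $\sum_i (y_i - x_i^\top b)^2,$ while QR at quantile $\tau$ minimizes the check-loss criterion
$$\sum_i \rho_\tau\left(y_i - x_i^\top b\right), \qquad \rho_\tau(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right),$$
which at $\tau = 1/2$ is just half the sum of absolute residuals (median regression).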
I am well aware that in the presence of homoscedasticity all the slope parameters for different quantile regressions will be the same and that the QR models will differ only in the intercept.
No, not in finite samples. Here is an example taken from the help files of the quantreg
package in R:
library(quantreg)
data(stackloss)
rq(stack.loss ~ stack.x, tau = 0.50)  # median (L1) regression fit for the stackloss data
rq(stack.loss ~ stack.x, tau = 0.25)  # the first-quartile fit
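To see the difference directly, one can compare the coefficients with an OLS fit (a quick check; coef() extracts the estimates):

coef(lm(stack.loss ~ stack.x))              # OLS slopes for comparison
coef(rq(stack.loss ~ stack.x, tau = 0.50))  # median-regression slopes
coef(rq(stack.loss ~ stack.x, tau = 0.25))  # first-quartile slopes

In this data set all three sets of slope estimates differ.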
However, asymptotically (and under homoscedasticity, i.e. a pure location-shift model) they will all converge to the same true slope.
But in the case of homoscedasticity, shouldn't outliers cancel each other out, because positive errors are as likely as negative ones, rendering the OLS and median-QR slope coefficients equivalent?
No. First, perfect symmetry of the errors is not guaranteed in any finite sample. Second, minimizing a sum of squared residuals versus a sum of absolute residuals will in general lead to different estimates, even for symmetric errors.
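A quick simulation illustrates the second point (a minimal sketch; the seed, sample size, and coefficients are arbitrary):

library(quantreg)
set.seed(1)                      # arbitrary seed, for reproducibility
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)       # homoscedastic, symmetric (Gaussian) errors
coef(lm(y ~ x))[2]               # OLS slope estimate
coef(rq(y ~ x, tau = 0.5))[2]    # median-regression slope estimate

Both are consistent for the true slope of 2, yet in any given sample they differ.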
We need to take some care with the notation because the models differ.
Let the first (correct) model be
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon\tag{1}$$
where the $\varepsilon_i$ have a common variance and zero means, and write the second model (which governs the very same variable $Y,$ so there is no need to change its name) as
$$Y = \alpha_0 + \alpha_1 X_1 + \delta.\tag{2}$$
As an aside, we cannot impose any additional assumptions on $\delta,$ because these random variables are completely determined by equating the two right-hand sides (which, after all, both equal $Y$):
$$\delta = (\beta_0 - \alpha_0) + (\beta_1 - \alpha_1)X_1 + \beta_2 X_2 + \varepsilon.$$
(From now on I will drop the generic discussion of models to focus on a dataset with explanatory values $x_{1i}$ and $x_{2i},$ responses $y_i,$ and associated errors $\varepsilon_i$ and $\delta_i.$)
We can infer, however, that the $\delta_i$ all have the same variance as the $\varepsilon_i,$ and that their means are
$$E[\delta_i] = (\beta_0 - \alpha_0) + (\beta_1 - \alpha_1)x_{1i} + \beta_2 x_{2i},$$
which may vary among observations.
Let's return to the analysis.
Fitting the second model gives the slope estimate
$$\hat\alpha_1 = \frac{\sum_{i} (y_i - \bar y)(x_{1i} - \bar{x}_1)}{\sum_{i} (x_{1i} - \bar{x}_1)^2}.\tag{*}$$
This is a linear combination of the $y_i-\bar y,$ so use the zero-mean assumption about the $\varepsilon_i$ to compute
$$E[y_i - \bar y] = (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}) -(\beta_0 + \beta_1 \bar{x}_1 + \beta_2 \bar{x}_2) = \beta_1(x_{1i}-\bar{x}_1) + \beta_2(x_{2i} - \bar{x}_2)$$
and apply linearity of expectation in $(*)$ to compute
$$E[\hat\alpha_1] = \beta_1 + \beta_2\frac{\sum_{i} (x_{2i}-\bar{x}_2)(x_{1i} - \bar{x}_1)}{\sum_{i} (x_{1i} - \bar{x}_1)^2}.$$
Comparing this with $\beta_1$ to assess the bias in using $\hat\alpha_1$ to estimate $\beta_1,$ we find that $\hat\alpha_1$ is unbiased if and only if the second term is zero. This can happen in two ways:
1. If $\beta_2 = 0.$ (This just means the second model is correct.)
2. If $\sum_{i} (x_{2i}-\bar{x}_2)(x_{1i} - \bar{x}_1)=0.$ This means the empirical covariance of the $x_1$ data and the $x_2$ data is zero: that is, the centered design vectors are orthogonal.
If neither of these is the case, the bias is nonzero. That agrees exactly with your intuition.
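A small simulation confirms the formula (a sketch; the design, coefficients, and seed are arbitrary choices):

set.seed(2)                                   # arbitrary seed
n <- 100
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)                     # correlated with x1, so the bias term is nonzero
b0 <- 1; b1 <- 2; b2 <- 3
alpha1.hat <- replicate(2000, {
  y <- b0 + b1 * x1 + b2 * x2 + rnorm(n)      # generate data from the correct model (1)
  coef(lm(y ~ x1))[2]                         # fit the misspecified model (2)
})
mean(alpha1.hat)                              # simulated E[alpha1.hat]
b1 + b2 * sum((x2 - mean(x2)) * (x1 - mean(x1))) /
  sum((x1 - mean(x1))^2)                      # the theoretical value derived above

The two numbers agree up to simulation error.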
Best Answer
Let me try it as follows (really not sure if that is useful intuition):
Based on my comment above, the correlation between the intercept and slope estimates will roughly be $$-\frac{E(X)}{\sqrt{E(X^2)}}.$$ Thus, if $E(X)>0$ instead of $E(X)=0$, most of the data will be clustered to the right of zero, and the formula asserts that if the slope estimate gets larger, the intercept estimate needs to become smaller, which makes some sense.
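The approximation is easy to check numerically (a sketch; vcov() returns the estimated covariance matrix of the coefficient estimates):

set.seed(3)                              # arbitrary seed
x <- rnorm(200, mean = 2)                # E(X) > 0
y <- 1 + 2 * x + rnorm(200)
V <- vcov(lm(y ~ x))                     # covariance matrix of (intercept, slope) estimates
V[1, 2] / sqrt(V[1, 1] * V[2, 2])        # correlation of the two estimates
-mean(x) / sqrt(mean(x^2))               # the approximation above

The two values essentially coincide.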
I'm thinking of something like this:
In the blue sample, the estimated slope is flatter, which means the intercept estimate can be larger. The estimated slope for the golden sample is somewhat steeper, so the intercept can be somewhat smaller to compensate.
On the other hand, if $E(X)=0$, we can have any slope without any constraints on the intercept.
The denominator of the formula can also be interpreted along these lines: if, for a given mean, the variability as measured by $E(X^2)$ increases, the data get smeared out over the $x$-axis, so that they effectively "look" more mean-zero again, loosening the constraint on the intercept.
Here's the code, which I hope explains the figure fully:
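A minimal sketch along those lines (the seed, sample sizes, colours, and true coefficients are arbitrary illustrative choices): two samples are drawn from the same model with $E(X)>0$, and the two fitted lines trade slope off against intercept.

set.seed(4)                                       # arbitrary seed
x1 <- rnorm(40, mean = 2)                         # E(X) > 0
y1 <- 1 + 2 * x1 + rnorm(40)                      # "blue" sample
x2 <- rnorm(40, mean = 2)
y2 <- 1 + 2 * x2 + rnorm(40)                      # "golden" sample from the same model
plot(x1, y1, col = "blue", xlim = range(x1, x2),
     ylim = range(y1, y2), xlab = "x", ylab = "y")
points(x2, y2, col = "goldenrod")
abline(lm(y1 ~ x1), col = "blue")                 # whichever fitted line is flatter
abline(lm(y2 ~ x2), col = "goldenrod")            # tends to have the larger intercept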