This is a variation of the selection model in econometrics. The validity of the estimates
using only the selected sample here depends on the condition that
$\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)=\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right)$. Here $D_i$ is $i$'s disease status.
To give more details, define the following notation: $\pi_{1}=\Pr\left(D_{i}=1\right)$
and $\pi_{0}=\Pr\left(D_{i}=0\right)$; $S_{i}=1$ refers to the event
that $i$ is in the sample. Moreover, assume $D_{i}$ is independent
of $X_{i}$ for simplicity.
The probability of $Y_{i}=1$ for a unit $i$ in the sample is
\begin{eqnarray*}
\Pr\left(Y_{i}=1\mid X_{i},S_{i}=1\right) & = & \mathrm{{E}}\left(Y_{i}\mid X_{i},S_{i}=1\right)\\
& = & \mathrm{{E}}\left\{ \mathrm{{E}}\left(Y_{i}\mid X_{i},D_{i},S_{i}=1\right)\mid X_{i},S_{i}=1\right\} \\
& = & \Pr\left(D_{i}=1\mid S_{i}=1\right)\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1,S_{i}=1\right)+\\
& & \Pr\left(D_{i}=0\mid S_{i}=1\right)\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0,S_{i}=1\right),
\end{eqnarray*}
by the law of iterated expectations. Suppose that, conditional on the disease
status $D_{i}$ and other covariates $X_{i}$, the outcome $Y_{i}$
is independent of $S_{i}$. As a result, we have
\begin{eqnarray*}
\Pr\left(Y_{i}=1\mid X_{i},S_{i}=1\right) & = & \Pr\left(D_{i}=1\mid S_{i}=1\right)\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)+\\
& & \Pr\left(D_{i}=0\mid S_{i}=1\right)\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right).
\end{eqnarray*}
It is easy to see that
$$
\Pr\left(D_{i}=1\mid S_{i}=1\right)=\frac{\pi_{1}p_{i1}}{\pi_{1}p_{i1}+\pi_{0}p_{i0}}\mbox{ and }\Pr\left(D_{i}=0\mid S_{i}=1\right)=\frac{\pi_{0}p_{i0}}{\pi_{1}p_{i1}+\pi_{0}p_{i0}}.
$$
Here $p_{i1}$ and $p_{i0}$ are as defined in your sampling scheme.
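For completeness, the "easy to see" step is just Bayes' rule. The sketch below assumes that $p_{i1}$ and $p_{i0}$ denote the selection probabilities $\Pr\left(S_{i}=1\mid D_{i}=1\right)$ and $\Pr\left(S_{i}=1\mid D_{i}=0\right)$, which is the reading consistent with the expressions above:
$$
\Pr\left(D_{i}=1\mid S_{i}=1\right)=\frac{\Pr\left(S_{i}=1\mid D_{i}=1\right)\Pr\left(D_{i}=1\right)}{\Pr\left(S_{i}=1\mid D_{i}=1\right)\Pr\left(D_{i}=1\right)+\Pr\left(S_{i}=1\mid D_{i}=0\right)\Pr\left(D_{i}=0\right)}=\frac{\pi_{1}p_{i1}}{\pi_{1}p_{i1}+\pi_{0}p_{i0}},
$$
and the expression for $\Pr\left(D_{i}=0\mid S_{i}=1\right)$ follows by swapping the roles of $D_{i}=1$ and $D_{i}=0$.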
Thus,
$$
\Pr\left(Y_{i}=1\mid X_{i},S_{i}=1\right)=\frac{\pi_{1}p_{i1}}{\pi_{1}p_{i1}+\pi_{0}p_{i0}}\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)+\frac{\pi_{0}p_{i0}}{\pi_{1}p_{i1}+\pi_{0}p_{i0}}\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right).
$$
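The mixture formula above can be checked numerically. The following is a minimal Python sketch, not part of the original derivation: it assumes $p_{i1}$ and $p_{i0}$ are the selection probabilities given $D_{i}=1$ and $D_{i}=0$, and the function names and all numbers are hypothetical, chosen only for illustration.

```python
def selected_prob(pi1, p1, p0, q1, q0):
    """Pr(Y=1 | X, S=1) as a selection-weighted mixture at a fixed X.

    pi1    : Pr(D=1)
    p1, p0 : Pr(S=1 | D=1), Pr(S=1 | D=0)  (assumed meaning of p_i1, p_i0)
    q1, q0 : Pr(Y=1 | X, D=1), Pr(Y=1 | X, D=0)
    """
    pi0 = 1.0 - pi1
    w1 = pi1 * p1 / (pi1 * p1 + pi0 * p0)  # Pr(D=1 | S=1)
    return w1 * q1 + (1.0 - w1) * q0


def population_prob(pi1, q1, q0):
    """Pr(Y=1 | X) in the population, using D independent of X."""
    return pi1 * q1 + (1.0 - pi1) * q0


# Hypothetical numbers: a rare disease that is heavily oversampled.
in_sample = selected_prob(pi1=0.1, p1=0.9, p0=0.1, q1=0.8, q0=0.2)
in_population = population_prob(pi1=0.1, q1=0.8, q0=0.2)
print(round(in_sample, 4), round(in_population, 4))
```

With these made-up inputs the selected-sample probability is about 0.5 while the population probability is about 0.26. Setting `q1 == q0` makes the two functions return the same value, whatever the selection probabilities are.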
If $\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)=\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right)$,
we have
$$
\Pr\left(Y_{i}=1\mid X_{i},S_{i}=1\right)=\Pr\left(Y_{i}=1\mid X_{i}\right),
$$
and you can ignore the sample selection problem. On the other hand,
if $\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)\neq\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right)$,
$$
\Pr\left(Y_{i}=1\mid X_{i},S_{i}=1\right)\neq\Pr\left(Y_{i}=1\mid X_{i}\right)
$$
in general. As a particular case, consider the logit model,
$$
\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)=\frac{e^{X_{i}'\alpha}}{1+e^{X_{i}'\alpha}}\mbox{ and }\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right)=\frac{e^{X_{i}'\beta}}{1+e^{X_{i}'\beta}}.
$$
Even when $p_{i1}$ and $p_{i0}$ are constant across $i$, the resulting
distribution will not retain the logit form. More importantly,
the interpretations of the parameters would be totally different. Hopefully,
the above arguments help to clarify your problem a little bit.
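To see concretely why a mixture of two logit models is not itself a logit model, here is a small Python sketch. It assumes a scalar covariate without an intercept and a constant mixing weight `w1` standing in for $\Pr\left(D_{i}=1\mid S_{i}=1\right)$; the values of `alpha`, `beta`, and `w1` are hypothetical. Under a logit model the log-odds are linear in $x$, so the increments over equally spaced $x$ values would all be equal.

```python
import math


def expit(t):
    """Logistic function e^t / (1 + e^t)."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def mixture_log_odds(x, alpha, beta, w1):
    """Log-odds of Pr(Y=1 | x, S=1) when it is a w1-weighted mixture of
    a logit model with slope alpha (D=1) and one with slope beta (D=0)."""
    p = w1 * expit(alpha * x) + (1.0 - w1) * expit(beta * x)
    return logit(p)


# Hypothetical parameter values, for illustration only.
alpha, beta, w1 = 2.0, -1.0, 0.5
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
log_odds = [mixture_log_odds(x, alpha, beta, w1) for x in xs]

# Increments between neighbouring x values; a logit model would make
# these identical.
slopes = [b - a for a, b in zip(log_odds, log_odds[1:])]
print([round(s, 3) for s in slopes])
```

With these inputs the increments are far from constant and even change sign, so no single coefficient describes the selected-sample model. When `alpha == beta` the mixture collapses back to an ordinary logit.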
It is tempting to include $D_{i}$ as an additional explanatory variable,
and estimate the model based on $\Pr\left(Y_{i}\mid X_{i},D_{i}\right)$.
To justify the validity of using $\Pr\left(Y_{i}\mid X_{i},D_{i}\right)$,
we need to prove that $\Pr\left(Y_{i}\mid X_{i},D_{i},S_{i}=1\right)=\Pr\left(Y_{i}\mid X_{i},D_{i}\right)$,
which is equivalent to the condition that $D_{i}$ is a sufficient
statistic for $S_{i}$. Without further information about your sampling
process, I am not sure whether it holds. Let's use some abstract notation.
The observability variable $S_{i}$ can be viewed as a random function
of $D_{i}$ and some other random variables, say $\mathbf{Z}_{i}$.
Denote $S_{i}=S\left(D_{i},\mathbf{Z}_{i}\right)$. If $\mathbf{Z}_{i}$
is independent of $Y_{i}$ conditional on $X_{i}$ and $D_{i}$, we
have $\Pr\left(Y_{i}\mid X_{i},D_{i},S\left(D_{i},\mathbf{Z}_{i}\right)\right)=\Pr\left(Y_{i}\mid X_{i},D_{i}\right)$
by the definition of independence. However, if $\mathbf{Z}_{i}$ is
not independent of $Y_{i}$ after conditioning on $X_{i}$ and $D_{i}$,
$\mathbf{Z}_{i}$ intuitively contains some relevant information about
$Y_{i}$, and in general it is not expected that $\Pr\left(Y_{i}\mid X_{i},D_{i},S\left(D_{i},\mathbf{Z}_{i}\right)\right)=\Pr\left(Y_{i}\mid X_{i},D_{i}\right)$.
Thus, in the latter case, ignoring sample selection
could be misleading for inference. I am not very familiar with the
sample selection literature in econometrics. I would recommend Chapter
16 of 'Microeconometrics: Methods and Applications' by Cameron
and Trivedi (especially the Roy model in that chapter). Also, G. S.
Maddala's classic book 'Limited-Dependent and Qualitative Variables
in Econometrics' is a systematic treatment of the issues surrounding sample
selection and discrete outcomes.
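The role of $\mathbf{Z}_{i}$ in the argument above can be illustrated with a deliberately simple Python sketch. It assumes a binary $\mathbf{Z}_{i}$ and the special case in which a unit is selected exactly when $\mathbf{Z}_{i}=1$, at fixed $X_{i}$ and $D_{i}$; the function name and all numbers are hypothetical.

```python
def prob_y_given_selected(q, r1, r0):
    """Pr(Y=1 | X, D, S=1) when S=1 exactly when Z=1 (an assumed,
    simplified selection rule).

    q  : Pr(Y=1 | X, D)
    r1 : Pr(Z=1 | Y=1, X, D)
    r0 : Pr(Z=1 | Y=0, X, D)
    """
    # Bayes' rule over the two values of Y.
    return q * r1 / (q * r1 + (1.0 - q) * r0)


q = 0.3  # hypothetical Pr(Y=1 | X, D)

# Z independent of Y given (X, D): selection leaves the probability intact.
print(round(prob_y_given_selected(q, r1=0.5, r0=0.5), 4))

# Z related to Y given (X, D): the selected sample is distorted.
print(round(prob_y_given_selected(q, r1=0.8, r0=0.2), 4))
```

In the first call the result equals `q`, so conditioning on $D_{i}$ is enough. In the second call the selected-sample probability is roughly twice `q`, even though $D_{i}$ has been conditioned on.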
Best Answer
No estimate is biased per se; an estimate can only be a biased estimate of something, and specifying that something is crucial. In this case, the constant is a biased estimate of the log odds of y=1 in the population when all the explanatory variables are 0. In a case control study you have lost that information by the way you designed the study. Since the constant does not measure what you want it to measure, we call it biased.
Don't be alarmed, though. The word bias sounds bad, but the purpose of a case control study is not to estimate the odds of y=1 in the population, so this bias is irrelevant.
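A short Python sketch may make the nature of this bias concrete. It assumes a logit model with one covariate and a design in which cases and controls enter the sample with probabilities `s1` and `s0` that depend on y only; these names, and the values of `b0`, `b1`, `s1`, and `s0`, are hypothetical. Under those assumptions the log-odds in the sample equal the population log-odds plus the constant `log(s1 / s0)`.

```python
import math


def expit(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def sample_log_odds(x, b0, b1, s1, s0):
    """Log-odds of y=1 among sampled units with covariate x, when the
    population follows logit(p) = b0 + b1 * x and units are sampled with
    probability s1 if y=1 and s0 if y=0 (assumed design)."""
    p = expit(b0 + b1 * x)
    p_sample = s1 * p / (s1 * p + s0 * (1.0 - p))
    return logit(p_sample)


# Hypothetical values: a rare outcome, all cases kept, 2% of controls kept.
b0, b1, s1, s0 = -4.0, 0.7, 1.0, 0.02

shift = sample_log_odds(0.0, b0, b1, s1, s0) - b0
slope = sample_log_odds(1.0, b0, b1, s1, s0) - sample_log_odds(0.0, b0, b1, s1, s0)
print(round(shift, 4), round(math.log(s1 / s0), 4), round(slope, 4))
```

The constant moves by `log(s1 / s0)`, which cannot be undone without knowing the sampling fractions, while the slope stays at `b1`.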
Response to comment:
The rare events logit method proposed by Gary King won't help in your case, as it solves a different problem. With a case-control study you cannot estimate the probability of an event, as you did not collect that information in the first place. No method can extract information that is not present in the data.
Consider a simpler problem where you have no explanatory variables. What would you need to estimate the proportion of y=1? You draw a random sample from your population and compute the proportion of those observations with y=1. In a case control study you start with a number of observations with y=1 and find, for each, one or more matches with y=0. So the proportion of observations with y=1 in your data only tells you something about your design, but nothing about your population. If you collected one control per case, the proportion of cases in your data will be 0.5; if you collected two controls per case, the proportion will be 0.333, etc. This proportion in your data says nothing about how common cases are in the population. This information was never collected, and there is thus no way to recover it from a case control study.
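The point can be sketched in a few lines of Python. The function below is a hypothetical illustration, not a recommended sampling procedure: it builds a simulated population with a given prevalence, keeps a fixed number of cases, and adds a fixed number of controls per case.

```python
import random


def case_control_share(prevalence, controls_per_case,
                       n_cases=200, pop_size=100_000, seed=0):
    """Share of y=1 in a simulated case-control sample (hypothetical setup)."""
    rng = random.Random(seed)
    # Simulated population: y=1 with probability `prevalence`.
    population = [1 if rng.random() < prevalence else 0
                  for _ in range(pop_size)]
    # The population is i.i.d., so taking the first units of each type
    # is as good as drawing them at random.
    cases = [y for y in population if y == 1][:n_cases]
    controls = [y for y in population if y == 0][:controls_per_case * n_cases]
    sample = cases + controls
    return sum(sample) / len(sample)


# Very different prevalences, same design: the share of cases is identical.
print(case_control_share(prevalence=0.01, controls_per_case=1))
print(case_control_share(prevalence=0.20, controls_per_case=1))
print(round(case_control_share(prevalence=0.01, controls_per_case=2), 3))
```

The first two calls both give 0.5 and the third gives 0.333, regardless of the prevalence used to build the population. The share of cases is fixed by `controls_per_case`, that is, by the design.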