An answer to all four of your questions, preceded by a note:
It's not actually all that common for modern epidemiology studies to report an odds ratio from a logistic regression for a cohort study. It remains the regression technique of choice for case-control studies, but more sophisticated techniques are now the de facto standard for analysis in major epidemiology journals like Epidemiology, AJE or IJE. Odds ratios from logistic regression are more likely to show up in clinical journals reporting the results of observational studies. There are also going to be some complications, because Poisson regression can be used in two contexts: the one you're referring to, where it's a substitute for a binomial regression model, and a time-to-event context, which is extremely common for cohort studies. More details in the answers to the particular questions:
For a cohort study, not really, no. There are some extremely specific cases where, say, a piecewise logistic model may have been used, but these are outliers. The whole point of a cohort study is that you can directly measure the relative risk, or many related measures, and don't have to rely on an odds ratio. I will, however, make two notes. First, a Poisson regression is often estimating a rate, not a risk, and thus the effect estimate from it will often be reported as a rate ratio (mainly, in my mind, so you can still abbreviate it RR) or an incidence density ratio (IRR or IDR). So make sure in your search you're actually looking for the right terms. Second, there are many cohort studies using survival analysis methods. For these studies, Poisson regression makes some assumptions that are problematic, notably that the hazard is constant. As such, it is much more common to analyze a cohort study using Cox proportional hazards models, rather than Poisson models, and report the ensuing hazard ratio (HR). If pressed to name a "default" method with which to analyze a cohort, I'd say epidemiology is actually dominated by the Cox model. This has its own problems, and some very good epidemiologists would like to change it, but there it is.
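To make the rate-versus-risk distinction concrete, here is a minimal Python sketch. The counts and person-years are hypothetical numbers chosen purely for illustration, not data from any study. It only contrasts a risk ratio (events per person at risk at baseline) with a rate ratio (events per unit of person-time).

```python
# Hypothetical cohort: all numbers below are made up for illustration.
exposed = {"events": 30, "people": 100, "person_years": 250.0}
unexposed = {"events": 15, "people": 100, "person_years": 290.0}


def risk(group):
    """Cumulative incidence: events divided by people at risk at baseline."""
    return group["events"] / group["people"]


def rate(group):
    """Incidence rate: events divided by accumulated person-time."""
    return group["events"] / group["person_years"]


risk_ratio = risk(exposed) / risk(unexposed)
rate_ratio = rate(exposed) / rate(unexposed)

print(f"risk ratio: {risk_ratio:.2f}")
print(f"rate ratio: {rate_ratio:.2f}")
```

In this made-up example the two ratios differ because the exposed group accrues less person-time, which is one reason it matters which "RR" a paper is reporting.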
There are two things I might attribute the infrequency to - an infrequency I don't necessarily think exists to the extent you suggest. One is that yes - "epidemiology" as a field isn't exactly closed, and you get huge numbers of papers from clinicians, social scientists, etc. as well as epidemiologists of varying statistical backgrounds. The logistic model is commonly taught, and in my experience many researchers will turn to the familiar tool over the better tool.
The second is actually a question of what you mean by "cohort" study. Something like the Cox model, or a Poisson model, needs an actual estimate of person-time. It's possible to get a cohort study that follows a somewhat closed population for a particular period - especially in early "Intro to Epi" examples, where survival methods like Poisson or Cox models aren't so useful. The logistic model can be used to estimate an odds ratio that, with sufficiently low disease prevalence, approximates a relative risk. Other regression techniques that directly estimate it, like binomial regression, have convergence issues that can easily derail a new student. Keep in mind the Zou papers you cite are both using a Poisson regression technique to get around the convergence issues of binomial regression. But binomial-appropriate cohort studies are actually a small slice of the "cohort study pie".
Yes. Frankly, survival analysis methods should come up earlier than they often do. My pet theory is that the reason this isn't so is that methods like logistic regression are easier to code. Techniques that are easier to code, but come with much larger caveats about the validity of their effect estimates, are taught as the "basic" standard, which is a problem.
You should be encouraging students and colleagues to use the appropriate tool. Generally for the field, I think you'd probably be better off suggesting a consideration of the Cox model over a Poisson regression, as most reviewers would (and should) swiftly bring up concerns about the assumption of a constant hazard. But yes, the sooner you can get them away from "How do I shoehorn my question into a logistic regression model?" the better off we'll all be. And if you're looking at a study without time, students should be introduced to both binomial regression and alternative approaches, like Poisson regression, which can be used in case of convergence problems.
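As a rough illustration of the "study without time" case, the sketch below fits a log-link Poisson model to a binary outcome with a single binary exposure and checks that the exponentiated coefficient matches the risk ratio from the 2x2 table. Everything here is an assumption made for illustration: the data are hypothetical, `fit_log_poisson` is a hand-rolled Newton-Raphson fit written to avoid dependencies (in practice you would use a statistics package), and it returns point estimates only. It does not compute the robust standard errors that a Poisson-on-binary-data approach would need for valid inference.

```python
import math


def fit_log_poisson(records, max_iter=50, tol=1e-12):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson on the Poisson likelihood.

    records: list of (x, y) pairs; here x and y are both 0/1.
    Returns (b0, b1). Point estimates only; no standard errors.
    """
    mean_y = sum(y for _, y in records) / len(records)
    b0, b1 = math.log(mean_y), 0.0  # start at the intercept-only fit
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in records:
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu          # score for b0
            g1 += (y - mu) * x    # score for b1
            h00 += mu             # information matrix entries
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical closed cohort with no person-time: 30/100 exposed and
# 15/100 unexposed develop the outcome.
records = [(1, 1)] * 30 + [(1, 0)] * 70 + [(0, 1)] * 15 + [(0, 0)] * 85

b0, b1 = fit_log_poisson(records)
risk_ratio_from_model = math.exp(b1)
risk_ratio_from_table = (30 / 100) / (15 / 100)
print(risk_ratio_from_model, risk_ratio_from_table)
```

With one binary exposure the model is saturated, so the agreement is exact up to numerical tolerance; with more covariates the model-based ratio is an adjusted estimate rather than a table calculation.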
This is a variation of the selection model in econometrics. The validity of the estimates
using only the selected sample here depends on the condition that
$\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)=\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right)$. Here $D_i$ is $i$'s disease status.
To give more details, define the following notation: $\pi_{1}=\Pr\left(D_{i}=1\right)$
and $\pi_{0}=\Pr\left(D_{i}=0\right)$; $S_{i}=1$ refers to the event
that $i$ is in the sample. Moreover, assume $D_{i}$ is independent
of $X_{i}$ for simplicity.
The probability of $Y_{i}=1$ for a unit $i$ in the sample is
\begin{eqnarray*}
\Pr\left(Y_{i}=1\mid X_{i},S_{i}=1\right) & = & \mathrm{{E}}\left(Y_{i}\mid X_{i},S_{i}=1\right)\\
& = & \mathrm{{E}}\left\{ \mathrm{{E}}\left(Y_{i}\mid X_{i},D_{i},S_{i}=1\right)\mid X_{i},S_{i}=1\right\} \\
& = & \Pr\left(D_{i}=1\mid S_{i}=1\right)\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1,S_{i}=1\right)+\\
& & \Pr\left(D_{i}=0\mid S_{i}=1\right)\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0,S_{i}=1\right),
\end{eqnarray*}
by the law of iterated expectations. Suppose that, conditional on the disease
status $D_{i}$ and other covariates $X_{i}$, the outcome $Y_{i}$
is independent of $S_{i}$. As a result, we have
\begin{eqnarray*}
\Pr\left(Y_{i}=1\mid X_{i},S_{i}=1\right) & = & \Pr\left(D_{i}=1\mid S_{i}=1\right)\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)+\\
& & \Pr\left(D_{i}=0\mid S_{i}=1\right)\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right).
\end{eqnarray*}
It is easy to see that
$$
\Pr\left(D_{i}=1\mid S_{i}=1\right)=\frac{\pi_{1}p_{i1}}{\pi_{1}p_{i1}+\pi_{0}p_{i0}}\mbox{ and }\Pr\left(D_{i}=0\mid S_{i}=1\right)=\frac{\pi_{0}p_{i0}}{\pi_{1}p_{i1}+\pi_{0}p_{i0}}.
$$
Here $p_{i1}$ and $p_{i0}$ are as defined in your sampling scheme.
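For completeness, the step behind these two expressions is Bayes' rule. The sketch below assumes that $p_{i1}$ and $p_{i0}$ denote the selection probabilities $\Pr\left(S_{i}=1\mid D_{i}=1\right)$ and $\Pr\left(S_{i}=1\mid D_{i}=0\right)$; that reading is inferred from the formulas above rather than taken from the sampling scheme itself:
$$
\Pr\left(D_{i}=1\mid S_{i}=1\right)=\frac{\Pr\left(S_{i}=1\mid D_{i}=1\right)\Pr\left(D_{i}=1\right)}{\Pr\left(S_{i}=1\mid D_{i}=1\right)\Pr\left(D_{i}=1\right)+\Pr\left(S_{i}=1\mid D_{i}=0\right)\Pr\left(D_{i}=0\right)}=\frac{\pi_{1}p_{i1}}{\pi_{1}p_{i1}+\pi_{0}p_{i0}},
$$
and the expression for $\Pr\left(D_{i}=0\mid S_{i}=1\right)$ follows by taking the complement.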
Thus,
$$
\Pr\left(Y_{i}=1\mid X_{i},S_{i}=1\right)=\frac{\pi_{1}p_{i1}}{\pi_{1}p_{i1}+\pi_{0}p_{i0}}\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)+\frac{\pi_{0}p_{i0}}{\pi_{1}p_{i1}+\pi_{0}p_{i0}}\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right).
$$
If $\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)=\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right)$,
we have
$$
\Pr\left(Y_{i}=1\mid X_{i},S_{i}=1\right)=\Pr\left(Y_{i}=1\mid X_{i}\right),
$$
and you can ignore the sample selection problem. On the other hand,
if $\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)\neq\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right)$,
$$
\Pr\left(Y_{i}=1\mid X_{i},S_{i}=1\right)\neq\Pr\left(Y_{i}=1\mid X_{i}\right)
$$
in general. As a particular case, consider the logit model,
$$
\Pr\left(Y_{i}=1\mid X_{i},D_{i}=1\right)=\frac{e^{X_{i}'\alpha}}{1+e^{X_{i}'\alpha}}\mbox{ and }\Pr\left(Y_{i}=1\mid X_{i},D_{i}=0\right)=\frac{e^{X_{i}'\beta}}{1+e^{X_{i}'\beta}}.
$$
Even when $p_{i1}$ and $p_{i0}$ are constant across $i$, the resulting
distribution will not retain the logit form. More importantly,
the interpretations of the parameters would be totally different. Hopefully,
the above arguments help to clarify your problem a little bit.
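A quick numerical check of the claim that the mixture does not keep the logit form can be sketched in Python. The setup is deliberately simplified and entirely hypothetical: a scalar covariate with no intercept, constant selection probabilities `p1` and `p0`, and made-up values for `alpha`, `beta` and `pi1`. A genuine logit model in a scalar $x$ has log-odds that are linear in $x$, so the second difference of the log-odds over equally spaced points should be zero.

```python
import math


def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def selected_prob(x, alpha, beta, pi1, p1, p0):
    """Pr(Y=1 | X=x, S=1) as a weighted mix of the two logit models."""
    pi0 = 1.0 - pi1
    w1 = pi1 * p1 / (pi1 * p1 + pi0 * p0)  # Pr(D=1 | S=1)
    return w1 * expit(alpha * x) + (1.0 - w1) * expit(beta * x)


def logit_curvature(alpha, beta, pi1=0.1, p1=0.9, p0=0.1):
    """Second difference of the log-odds at x = 0, 1, 2 (zero if linear in x)."""
    l0, l1, l2 = (
        logit(selected_prob(x, alpha, beta, pi1, p1, p0)) for x in (0, 1, 2)
    )
    return l2 - 2.0 * l1 + l0


curv_different = logit_curvature(alpha=2.0, beta=0.5)  # alpha != beta
curv_equal = logit_curvature(alpha=2.0, beta=2.0)      # alpha == beta
print(curv_different, curv_equal)
```

When `alpha` and `beta` coincide the curvature vanishes, matching the case where selection can be ignored; when they differ, the log-odds in the selected sample are no longer linear in $x$.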
It is tempting to include $D_{i}$ as an additional explanatory variable,
and estimate the model based on $\Pr\left(Y_{i}\mid X_{i},D_{i}\right)$.
To justify the validity of using $\Pr\left(Y_{i}\mid X_{i},D_{i}\right)$,
we need to prove that $\Pr\left(Y_{i}\mid X_{i},D_{i},S_{i}=1\right)=\Pr\left(Y_{i}\mid X_{i},D_{i}\right)$,
which is equivalent to the condition that $D_{i}$ is a sufficient
statistic of $S_{i}$. Without further information about your sampling
process, I am not sure if it is true. Let's use an abstract notation.
The observability variable $S_{i}$ can be viewed as a random function
of $D_{i}$ and the other random variables, say $\mathbf{Z}_{i}$.
Denote $S_{i}=S\left(D_{i},\mathbf{Z}_{i}\right)$. If $\mathbf{Z}_{i}$
is independent of $Y_{i}$ conditional on $X_{i}$ and $D_{i}$, we
have $\Pr\left(Y_{i}\mid X_{i},D_{i},S\left(D_{i},\mathbf{Z}_{i}\right)\right)=\Pr\left(Y_{i}\mid X_{i},D_{i}\right)$
by the definition of independence. However, if $\mathbf{Z}_{i}$ is
not independent of $Y_{i}$ after conditioning on $X_{i}$ and $D_{i}$,
$\mathbf{Z}_{i}$ intuitively contains some relevant information about
$Y_{i}$, and in general it is not expected that $\Pr\left(Y_{i}\mid X_{i},D_{i},S\left(D_{i},\mathbf{Z}_{i}\right)\right)=\Pr\left(Y_{i}\mid X_{i},D_{i}\right)$.
Thus, in the 'however' case, ignoring sample selection
could be misleading for inference. I am not very familiar with the
sample selection literature in econometrics. I would recommend Chapter
16 of 'Microeconometrics: Methods and Applications' by Cameron
and Trivedi (especially the Roy model in that chapter). Also, G. S.
Maddala's classic book 'Limited-Dependent and Qualitative Variables
in Econometrics' is a systematic treatment of the issues about sample
selection and discrete outcomes.
First, the definitions, then a slight twist on the statement you posted, then hopefully an illuminating answer.
Cross-Sectional Study: A study where you take a "snapshot" of a population at a single point in time. You're not following anyone; it's simply "At this point, do you have or not have a disease?" - along with covariates of course. A cross-section - hence the name.
Case-Control Study: A study usually used when a cohort study or RCT is going to be difficult, if not impossible. You sample cases from some source, and then a number of controls, usually in some ratio to the number of cases (1:1, 2:1, etc.). Again, you're not following anyone; you're backtracking. Rather than asking "what exposures lead to disease?" you're asking "what exposures are more common in the group that got disease?".
What the statement means is that in either case, you're limited in what you can estimate. In order to calculate a risk (and thus a risk ratio) you need to know, out of a population n with no diseased people, how many would get the disease during your follow-up period (incidence). In a cross-sectional study, you technically only have prevalence, not incidence. This is the twist - the statement you posted is technically wrong. You can also - and often should - estimate a Prevalence Ratio from a cross-sectional study, as well as an Odds Ratio.
In a case-control study, you don't have the population - you just have the cases, and a basket of non-cases - you have no idea what happened in population n. So while you can calculate odds, it's literally impossible to calculate the risk; it requires information you do not have.
However, in cases where disease is rare (~<10% prevalence), the Odds Ratio should approximate the risk ratio for a similarly conducted cohort study.
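The two points above (risk is not recoverable from a case-control sample, but the odds ratio is, and it sits close to the risk ratio when disease is rare) can be checked with a small Python sketch. All counts are hypothetical, and the case-control "sample" uses expected counts under an assumed 5% sampling fraction of non-cases that does not depend on exposure.

```python
def odds_ratio(a, b, c, d):
    """a, b: exposed cases / non-cases; c, d: unexposed cases / non-cases."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Only meaningful when the table describes the whole cohort."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical source cohort with a rare outcome (2% vs 1% risk).
cohort = (20, 980, 10, 990)

# Case-control sample: keep every case, keep 5% of non-cases regardless of
# exposure. Expected counts are used, so they need not be integers.
f = 0.05
a, b, c, d = cohort
case_control = (a, b * f, c, d * f)

or_cohort = odds_ratio(*cohort)
or_case_control = odds_ratio(*case_control)
rr_cohort = risk_ratio(*cohort)
rr_naive_case_control = risk_ratio(*case_control)
print(or_cohort, or_case_control, rr_cohort, rr_naive_case_control)
```

The sampling fraction cancels out of the odds ratio, so it is the same in the cohort and in the case-control sample, whereas the "risk ratio" computed naively from the case-control table depends on how many controls were drawn.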
What this all means statistically is that these relatively simplistic (and thus fairly flexible) study designs are somewhat restrictive in what you can do - you're largely confined to logistic regression and the calculation of an odds ratio.