Estimating the baseline hazard (covariate=0) would be the same as estimating the Kaplan-Meier curve for the covariate = 0 group, would it not?
No, and I think that this is the source of your general confusion.
With a Cox proportional hazards model, you implicitly assume that there is a single baseline cumulative hazard $H_0(t)$ that applies to all cases, with the covariates $z$ and their regression coefficients $\beta$ leading to a covariate-specific hazard $H(t;z) = H_0(t)e^{\beta z}$. As explained for example on this page, the Breslow-Aalen estimate of the baseline hazard is made only after you have fit the Cox model and gotten the regression coefficients.
Thus a Cox model forces all cases to have hazards over time of the same general shape, with their steepness determined by the covariate values and the associated hazard ratios.
So even with a binary predictor in a Cox model, the baseline hazard will not coincide with the corresponding non-parametric estimator* for the covariate = 0 group. It will be some sort of compromise in the shapes of the curves, and it will show steps down at all event times, not just at the event times of the covariate = 0 group.
*Note that the Breslow and Kaplan-Meier non-parametric survival-curve estimates are determined differently. See a survival analysis text like Therneau and Grambsch, or this web page.
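To see this concretely, here is a small sketch with simulated data (all names and the simulation setup are my own, not from the original post): it maximises the Cox partial likelihood for a single binary covariate with scipy, then computes the Breslow-Aalen baseline cumulative hazard. Note that the baseline takes an increment at *every* event time, from both covariate groups.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n = 200
z = rng.integers(0, 2, n)                        # binary covariate
t_fail = rng.exponential(1.0 / np.exp(0.7 * z))  # true hazard ratio exp(0.7)
t_cens = rng.exponential(2.0, n)                 # random right-censoring
time = np.minimum(t_fail, t_cens)
event = (t_fail <= t_cens).astype(int)

def neg_partial_loglik(beta):
    """Cox negative log partial likelihood (no ties in continuous simulated data)."""
    ll = 0.0
    for ti, zi in zip(time[event == 1], z[event == 1]):
        at_risk = time >= ti                     # risk set at this event time
        ll += beta * zi - np.log(np.sum(np.exp(beta * z[at_risk])))
    return -ll

beta_hat = minimize_scalar(neg_partial_loglik, bounds=(-5, 5),
                           method="bounded").x

# Breslow-Aalen baseline cumulative hazard: one increment at EVERY event
# time, whichever covariate group the event came from.
event_times = np.unique(time[event == 1])
increments = np.array([1.0 / np.sum(np.exp(beta_hat * z[time >= ti]))
                       for ti in event_times])
H0 = np.cumsum(increments)

# By contrast, a Nelson-Aalen estimate restricted to the z = 0 subset would
# only step at the event times observed in that group.
```

Plotting `H0` against a non-parametric estimate computed from the z = 0 subset alone makes the "compromise in the shapes of the curves" visible.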
Censoring is built into survival models by incorporating it into the likelihood function underlying the analysis. The most common form of censoring occurs when we observe an item for a finite period of time $T$ and it does not fail in that time. Below I will show you how the censoring is built into the likelihood function and how this affects the Cox proportional hazards model.
Incorporating censored data into the likelihood function: As a common example, suppose we have items where the time-to-failure has a survival function $S$ and corresponding density function $f$, both of which are parameterised by some parameter $\theta$. If an item $i$ is observed to fail at time $0 \leqslant t_i \leqslant T$ then it is incorporated into the likelihood function using the density term:
$$f(t_i | \theta).$$
However, if an item $i$ is observed throughout the whole time $T$ and it does not fail then this is considered to be a "right-censored" data point (only known to fail at some time after $T$) and it is incorporated into the likelihood function using the survival term:
$$S(T|\theta).$$
Suppose we have a survival model based on observation for a fixed period of length $T$, where the times-to-failure for each observation are IID conditional on some underlying parameters. Without loss of generality, we will have $n$ observed failures at times $t_1,...,t_n$ (all within the interval $[0,T]$) and we will have $m$ right-censored values that did not fail in the observed time $T$. The overall likelihood function for this data is then given by:
$$L_\mathbf{t}(\theta) = \bigg( \prod_{i=1}^n f(t_i|\theta) \bigg) \times S(T|\theta)^m.$$
In this likelihood function you can see that the censoring of data is "built in" by the fact that right-censored values are incorporated through their survival function instead of the density function for the time-to-failure.
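As a quick illustration (my own toy example, not from the original post), take an exponential failure model with rate $\lambda$, so $f(t|\lambda) = \lambda e^{-\lambda t}$ and $S(T|\lambda) = e^{-\lambda T}$. The likelihood above then has the closed-form maximiser $\hat{\lambda} = n / (\sum_i t_i + mT)$, which we can check against a numerical optimum:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
T = 2.0                             # fixed observation window
latent = rng.exponential(1.5, 100)  # latent failure times, true rate ~0.667
t_obs = latent[latent <= T]         # the n observed failure times
m = int(np.sum(latent > T))         # m right-censored items

def neg_loglik(lam):
    # n density terms log f(t_i) = log(lam) - lam * t_i,
    # plus m identical survival terms log S(T) = -lam * T
    return -(len(t_obs) * np.log(lam) - lam * t_obs.sum() - m * lam * T)

lam_numeric = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0),
                              method="bounded").x
lam_closed = len(t_obs) / (t_obs.sum() + m * T)  # closed-form MLE
```

The closed form is just "events divided by total time at risk", which is what the survival terms contribute: censored items add exposure to the denominator without adding an event to the numerator.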
Extending to the Cox proportional hazards model: The Cox proportional hazards model still uses a likelihood function for the observed times-to-failure and survival times, but it now adds covariates to the data and uses an assumption of proportional hazards in how these manifest in the hazard function. This does not change the underlying method of how censored values are built into the likelihood function --- e.g., right-censored values still enter through their survival function instead of the density of the time-to-failure.
Extension to other kinds of censoring: The above shows the common case where we have right-censored observations with the same censoring time $T$. Of course, this is not the only kind of censoring that can occur. Another possibility is that we might observe items up to different end-times, in which case the right-censored values would occur with different observation periods $t_{n+1},...,t_{n+m}$. In this case the likelihood function would be generalised to:
$$L_\mathbf{t}(\theta) = \bigg( \prod_{i=1}^n f(t_i|\theta) \bigg) \times \bigg( \prod_{i=1}^m S(t_{n+i}|\theta) \bigg).$$
Another possibility (which is uncommon in survival analysis) is left-censoring, where we know that an item failed no later than some time $T_i$. Left-censored observations enter into the likelihood function through the cumulative distribution function $F$. If we extend our model to assume that we have $r$ left-censored observations with observation times $t_{n+m+1},...,t_{n+m+r}$ then the likelihood function would be further generalised to:
$$L_\mathbf{t}(\theta) = \bigg( \prod_{i=1}^n f(t_i|\theta) \bigg) \times \bigg( \prod_{i=1}^m S(t_{n+i}|\theta) \bigg) \times \bigg( \prod_{i=1}^r F(t_{n+m+i}|\theta) \bigg).$$
And of course, you can extend this even further to allow for more complicated kinds of censoring. In general, if a censored observation is known to fall in some set $\mathscr{A}$ then it should enter into the likelihood function through the probability term:
$$\mathbb{P}(t_i \in \mathscr{A}|\theta) = \int \limits_\mathscr{A} f(t|\theta) \ dt.$$
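A sketch of such a mixed likelihood (hypothetical data, names my own), again using an exponential model so that $f$, $S$ and $F$ are all available from scipy.stats; an interval-censored observation known to fall in $(a, b]$ contributes $F(b) - F(a)$:

```python
import numpy as np
from scipy.stats import expon
from scipy.optimize import minimize_scalar

# hypothetical data: exact failures plus right-, left-, and interval-censored items
exact = np.array([0.5, 1.2, 0.8])
right = np.array([2.0, 1.5])           # known only to fail after these times
left = np.array([0.3])                 # known only to fail before this time
intervals = [(0.4, 1.0), (1.1, 2.5)]   # known to fail somewhere inside (a, b]

def neg_loglik(lam):
    d = expon(scale=1.0 / lam)
    ll = np.sum(d.logpdf(exact))            # density terms f(t_i)
    ll += np.sum(d.logsf(right))            # survival terms S(t_i)
    ll += np.sum(d.logcdf(left))            # distribution terms F(t_i)
    ll += sum(np.log(d.cdf(b) - d.cdf(a))   # interval terms P(a < t <= b)
              for a, b in intervals)
    return -ll

lam_hat = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0),
                          method="bounded").x
```

Each observation type simply contributes the probability of what was actually observed, which is the general principle behind the formula above.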
Question 1. The "thing" you're trying to overcome is potential bias in the estimates. With your example of yearly tests probably only going over a few years, that could well be a problem. The review by Leung et al. (Annu. Rev. Public Health 1997, 18:83–104) has an example (Figure 5) of a similar situation in which a Kaplan-Meier estimate based on assuming that the event occurred at the time of the scheduled clinic visit showed substantial bias versus the Turnbull estimate.
In fairness, even with a survival time in days there is still some interval of uncertainty. The question is whether the interval is a large enough fraction of the times in question to lead to problems.
Question 2. Parametric methods work well, as there are defined contributions of interval-censored observations to the likelihood used for fitting the models. The question is whether you have chosen the correct parametric model. Turnbull's non-parametric maximum likelihood estimator is still used for non-parametric models.
For an informed introduction to issues in handling interval censoring, I'd recommend that you examine the vignette for the R icenReg package. Semi-parametric models with interval censoring require estimation of the baseline hazard as part of the modeling, unlike Cox models with only right censoring. Methods for semi-parametric proportional-hazard and proportional-odds models with interval-censored data are available, but they are more computationally intensive than fully parametric modeling.

Question 3. For continuous-time survival $S(t)$, $S(t)=\exp (-H(t))$, where $H(t)=\int_0^t h(\tau) \, d\tau$ is the cumulative hazard for the instantaneous hazard $h(t)$. There really is no problem interconverting. I don't think there's anything specific to interval censoring here. A parametric model gives a continuous function $S(t)$, but that won't be valid if you chose an incorrect parametric form. For semi-parametric models it's not clear that you can get confidence intervals for the baseline survival; the icenReg vignette says (page 7): "For the semi-parametric models...to our knowledge, even using the bootstrap error estimates for the baseline distribution is not valid." But that's a problem for all of the baselines (survival, hazard, cumulative hazard). If you can get one you can get them all.
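The interconversion is easy to verify numerically; here is a minimal sketch assuming a Weibull hazard (shape and scale chosen arbitrarily for illustration), integrating $h(t)$ to get $H(t)$ and recovering the closed-form survival curve:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Weibull hazard h(t) = (k/s) * (t/s)^(k-1); cumulative hazard H(t) = (t/s)^k
k, s = 1.5, 2.0
t = np.linspace(0.0, 5.0, 2001)
h = (k / s) * (t / s) ** (k - 1)

H = cumulative_trapezoid(h, t, initial=0.0)  # numerical H(t) = integral of h
S_numeric = np.exp(-H)                       # S(t) = exp(-H(t))
S_closed = np.exp(-(t / s) ** k)             # closed-form Weibull survival
```

The numerical and closed-form curves agree to within the quadrature error, illustrating that converting among $h$, $H$, and $S$ is purely mechanical.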
Question 4. Covariates are included in semi-parametric and parametric models in the same functional forms as they are for analysis without interval censoring. For non-parametric approaches, you can draw separate non-parametric Turnbull survival curves for each of several groups defined by covariate values, similar to Kaplan-Meier curves. The interval package provides ways to test differences between interval-censored survival curves, described in the package vignette.