Estimating the baseline hazard (covariate=0) would be the same as estimating the Kaplan-Meier curve for the covariate = 0 group, would it not?
No, and I think that this is the source of your general confusion.
With a Cox proportional hazards model, you implicitly assume that there is a single baseline cumulative hazard $H_0(t)$ that applies to all cases, with the covariates $z$ and their regression coefficients $\beta$ leading to a covariate-specific cumulative hazard $H(t;z) = H_0(t)e^{\beta z}$. As explained for example on this page, the Breslow-Aalen estimate of the baseline hazard is made only after you have fit the Cox model and gotten the regression coefficients.
Thus a Cox model forces all cases to have hazards over time of the same general shape, with their steepness determined by the covariate values and the associated hazard ratios.
So even with a binary predictor in a Cox model, the baseline hazard will not coincide with the corresponding non-parametric estimator* for the covariate = 0 group. It will be some sort of compromise between the shapes of the two groups' curves, and the corresponding baseline survival curve will show steps down at all event times, not just at the event times of the covariate = 0 group.
*Note that the Breslow and Kaplan-Meier non-parametric survival-curve estimates are determined differently. See a survival analysis text like Therneau and Grambsch, or this web page.
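To make the point about event times concrete, here is a minimal sketch in plain Python. The toy data set, the function names, and the choice of a hand-rolled Newton-Raphson fit are all my own assumptions for illustration, not anything from the question. For the covariate = 0 group I use the Nelson-Aalen estimator rather than Kaplan-Meier, so that both results are on the cumulative-hazard scale.

```python
import math

# Hypothetical toy data: (time, event indicator, binary covariate z); no tied times.
data = [
    (1, 1, 1), (2, 1, 0), (3, 1, 1), (4, 1, 1), (5, 1, 0),
    (7, 1, 1), (8, 1, 0), (9, 0, 1), (10, 0, 0),
]

def risk_sums(t, beta):
    """Sums of exp(beta*z) and z*exp(beta*z) over subjects still at risk at time t."""
    s0 = s1 = 0.0
    for time, _, z in data:
        if time >= t:
            w = math.exp(beta * z)
            s0 += w
            s1 += z * w
    return s0, s1

def fit_beta(n_iter=25):
    """Newton-Raphson on the Cox partial likelihood for a single binary covariate."""
    beta = 0.0
    for _ in range(n_iter):
        score = info = 0.0
        for t, d, z in data:
            if d == 1:
                s0, s1 = risk_sums(t, beta)
                mean_z = s1 / s0                # weighted mean of z in the risk set
                score += z - mean_z
                info += mean_z * (1 - mean_z)   # weighted variance of a binary z
        beta += score / info
    return beta

def breslow_baseline(beta):
    """Breslow estimate: H0 increases by 1 / sum(exp(beta*z)) over the risk set at each event."""
    h0, out = 0.0, []
    for t, d, _z in sorted(data):
        if d == 1:
            s0, _s1 = risk_sums(t, beta)
            h0 += 1.0 / s0
            out.append((t, h0))
    return out

def nelson_aalen_group0():
    """Nelson-Aalen cumulative hazard computed from the z = 0 group alone."""
    group0 = sorted((t, d) for t, d, z in data if z == 0)
    h, out = 0.0, []
    for t, d in group0:
        if d == 1:
            at_risk = sum(1 for s, _ in group0 if s >= t)
            h += 1.0 / at_risk
            out.append((t, h))
    return out

beta = fit_beta()
print("beta:", beta)
print("Breslow baseline (all cases):", breslow_baseline(beta))
print("Nelson-Aalen (z = 0 only):  ", nelson_aalen_group0())
```

In this sketch the Breslow baseline jumps at every event time in the data, including those contributed by the covariate = 1 group, while the group-only estimate jumps only at that group's own event times.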
Best Answer
These questions are perhaps best answered in reverse order.
Question 4. Life expectancy (LE) is typically defined as a mean survival over a particular cohort. What you propose is the median survival. That's a perfectly good measure for many purposes, but it's not LE.
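In terms of the survival function $S(t)$ of the survival time $T$ (my notation, not the question's), the two quantities can be written under the usual conventions as

$$\text{LE} = E[T] = \int_0^\infty S(t)\,dt, \qquad t_{\text{median}} = \inf\{t : S(t) \le 0.5\}.$$

The mean depends on the whole curve out to where $S(t)$ reaches 0, while the median only needs the curve down to 0.5.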
Question 3. Cox models often start at some time well after birth and evaluate survival subsequent to some defined time = 0 of study entry. You have no information about anyone who might have died prior to the opportunity to enter the study. Prior survival information is more or less irrelevant, except that anyone entering the study has necessarily survived up until that time and the age at study entry might be a useful covariate.
Question 2. Cox models don't provide survival estimates beyond the last observed event time. Unless the last observation time was a death (not censoring) in an individual before 11 years in your example, you won't have survival information that goes beyond that time. With LE defined as average age at death, you can't get that in such a situation. You can calculate a "restricted mean survival," the average survival up to some defined time end point covered by the data, but I have a lot of trouble thinking about what that really means. If 50% survival has been reached in a Cox model for all groups of interest you can calculate and compare median survivals. Standard statistical software provides ways to extract median survival, if it's been reached, from a Cox model. But that's not LE, as explained above.
Question 1. Life expectancy as an average survival time thus can be hard or impossible to evaluate from Cox models. Other measures of survival, like medians or other quantiles, are more generally applicable.
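As a rough illustration of the difference between these measures, here is a small sketch that takes a step-function survival curve (such as one predicted from a Cox model) and computes the median, if reached, and the restricted mean up to a chosen end point. The curves, the function names, and the 11-year cutoff are hypothetical values picked only to echo the example; the median uses the common convention of the first time the curve is at or below 0.5.

```python
def median_survival(times, surv):
    """First step time at which S(t) is at or below 0.5; None if 50% is never reached."""
    for t, s in zip(times, surv):
        if s <= 0.5:
            return t
    return None

def restricted_mean(times, surv, tau):
    """Area under the step curve S(t) from 0 to tau, taking S(t) = 1 before the first step."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)   # rectangle up to the next step
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)     # final rectangle out to tau
    return area

# Hypothetical curves: step times (years) and the survival just after each step.
curve_a = ([2, 5, 8], [0.8, 0.6, 0.55])   # never drops to 50%
curve_b = ([1, 3, 6], [0.7, 0.5, 0.3])    # reaches 50% at 3 years

for name, (times, surv) in [("A", curve_a), ("B", curve_b)]:
    print(name, "median:", median_survival(times, surv),
          "restricted mean to 11 years:", restricted_mean(times, surv, tau=11))
```

For curve A the median is undefined because survival never falls to 50%, yet the restricted mean can still be computed; neither number is a life expectancy, since the curve says nothing about what happens after the end point.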