Schoenfeld residuals for factors with cox.zph

cox-modelproportional-hazardsrsurvival

The cox.zph function of the survival package computes the Schoenfeld residuals for each variable and tests the porportionality assumption with a score test, but I don't understand how it deals with factors with more than 2 levels since when the option terms is set to TRUE (by default) it computes a set of residuals for each factor, not for each dummy variable corresponding to the factor. I was wondering how those residuals are computed in this case.

The same questions apply to continuous variables with polynomials or splines terms.

I didn't find an explanation of this in the package vignettes.

An example with splines terms:

data(pbc)
fit <- coxph(Surv(time,status==1)~ns(age,df=4),data=pbc)
prop <- cox.zph(fit,terms=F)
prop

With terms = FALSE, Cox.zph computes 4 sets of residuals, one for each spline term.

prop$y

    ns(age, df = 3)1 ns(age, df = 3)2 ns(age, df = 3)3
533         9.758356        80.023447       146.527736
617        -9.315727        16.220166        15.753548
732        -9.120447        16.132956        14.925629
737        -9.797900        15.518557        17.384126
837        -9.039351        16.062516        14.563635
877        -7.277408        -9.441608        -5.152168

However when terms is set to TRUE, it computes only one set of residuals as if there were only one coefficient.

prop <- cox.zph(fit,terms=TRUE)

head(prop$y)

    ns(age, df = 3)
533       -5.078828
617        1.817389
732        1.780094
737        1.928395
837        1.765340
877        1.949620

plot(prop)

enter image description here

In this case how is this unique set of residuals computed?

Best Answer

First, it's important to distinguish between the Schoenfeld residuals themselves and the score tests for trends of residuals over time that are used (starting with version 3 of the survival package) to evaluate deviations from proportional hazards (PH). Second, with respect to those score tests, it might be simpler to think about the tests on individual coefficients or on groups of coefficients (multiple-level categorical predictors, spline coefficients, etc) as special cases of the global test that's effectively done "under the hood" by cox.zph whether you ask for it or not. Third, the multiple residuals associated with a single predictor in the model can be combined with their associated regression-coefficient estimates to obtain the corresponding net residual in the linear predictor of the model.

Schoenfeld residuals

This page explains that

The Schoenfeld residuals are calculated for all covariates for each individual experiencing an event at a given time. Those are the differences between that individual's covariate values at the event time and the corresponding risk-weighted average of covariate values among all those then at risk.

If there are multiple coefficients associated with a predictor in a model, as in the situations you describe, there are multiple "covariates" in the above sense associated with that predictor, each with its own estimated regression coefficient.

The residuals are scaled inversely with respect to their covariances to get scaled Schoenfeld residuals $s^*_{k,j}$ for covariate $j$ at event time $t_k$. Grambsch and Therneau showed that the expected value of that residual is the difference between the time-fixed Cox-model coefficient estimate $\hat\beta_j$ and a potentially time-varying coefficient value at that event time:

$$E(s_{k,j}^*) + \hat \beta_j \approx \beta_j(t_k).$$

If PH holds, $\beta_j$ is constant over time. In that case, there should be no trend of those residuals over time.

Score tests for trends

Section 6.2 of the Therneau and Grambsch text explains how to test whether there is evidence of time trends in coefficients that would be inconsistent with PH. First, define some transformation of time $g(t)$; that's the transform argument to cox.zph(). Then, for covariate $j$, evaluate the following regression of the residual-based $\beta_j(t)$ against that function of time:

$$\beta_j(t) = \beta_j + \theta_j(g(t)-\bar g_j) $$

where $\bar g_j $ is the mean of the transformed event times. Then, for the set of $p$ covariates:

The null hypothesis of proportional hazards corresponds to $\theta_j \equiv 0, j = 1, ... ,p.$

As Therneau and Grambsch explain, this isn't done with a regression model per se but rather with a set of calculations that leads to a multi-parameter score test of that joint hypothesis. The test statistic, evaluated against $\chi^2_p$, is based on the $p \times 1$ score vector $U$ and the $p \times p$ Fisher information matrix $I$ evaluated at the above null hypothesis: the quadratic form $U^T I^{-1} U$. That's the global test.

Once you have $U$ and $I$ you can perform tests on subsets of covariates or on individual covariates, by restricting the calculation of the test statistic to the corresponding elements of $U$ and $I$ and evaluating against $\chi^2$ with degrees of freedom equal to the number of coefficients involved. In that sense, the test reported for a multi-coefficient predictor is a special case of the global test based on the subset of coefficients associated with it, and that for an individual coefficient for such a predictor is a special case of the test on the subset.

Combined scaled Schoenfeld residuals

The (unscaled) Schoenfeld residuals are first calculated for each individual event time and covariate. When terms=TRUE, the set of residuals for each event time for a multi-coefficient predictor is then combined as the inner product of the corresponding residuals and Cox model regression coefficients. That puts together all the residuals associated with that predictor, for that individual at that event time, into their net effect on the overall linear predictor of the Cox model. I've annotated below the critical code, which you can see by typing cox.zph at the command prompt:

## unscaled Schoenfeld residuals for all covariates from Cox fit
## event times in rows, covariates in columns
sresid <- resid$schoen 
## if any predictor, indexed by nterm, has >1 covariate   
if (terms && any(sapply(asgn, length) > 1)) { 
    temp <- matrix(0, ncol(sresid), nterm)
    for (i in 1:nterm) {
        j <- asgn[[i]]
        if (length(j) == 1)  ## single covariate
            temp[j, i] <- 1
        else temp[j, i] <- fit$coefficients[j] ## multiple covariates
    }
    sresid <- sresid %*% temp  ## the inner product
    ## some lines omitted
} 

That's done before additional calculations including scaling the residuals, so you can't readily reproduce that by a doing a similar calculation on the reported scaled residuals.