Solved – Two significant covariates in coxph analysis: exploring whether the two covariates interact significantly

assumptionscox-modelinteractionlocal-statistics

I am using R 2.15.2 GUI1.53 64-bit on MacOSX 10.5.8 to perform these analyses. I am a molecular biologist by training, not a statistician, so all this analysis represents my own bootstrapped learning and I may have lots of follow-up questions.

I am performing a cox proportional hazards survival analysis on a large dataset of binomial type 1 right censored data. These data come from controlled exposures fish to different doses of virus at time zero, followed by a set time of observation to monitor mortality. All experiments revealed a strong dose-response in mortality, so I am using virus dose as a stratification term in the coxph analysis. I am interested in determining if two covariates significantly contribute to the proportional hazard of mortality: fish size(age) at time of exposure and the virus type used at exposure. A priori, I have good reason to think that there IS a significant effect from virus type, based on field observations of epidemic disease. The question of whether fish age (proxy measure being size) is a significant predictor of mortality is much less clear from field observations, but my hypothesis is that younger fish are more susceptible than older fish.

when I analyse the effect of each covariate independently via coxph, I see that both have some level of significant effect:

for virus type, which is six different viruses compared to sham groups:

    coxph(formula = surv.ALL ~ survivals$virus.type + strata(survivals$virus.dose > 
2000))
                           coef exp(coef) se(coef)      z Pr(>|z|)    
    survivals$virus.type002  1.5724    4.8184   0.3415  4.605 4.13e-06 ***
    survivals$virus.type007  2.7217   15.2067   0.3016  9.024  < 2e-16 ***
    survivals$virus.type009  3.1833   24.1272   0.3046 10.453  < 2e-16 ***
    survivals$virus.type110  3.5779   35.7998   0.2913 12.281  < 2e-16 ***
    survivals$virus.type111  3.2039   24.6284   0.2981 10.747  < 2e-16 ***
    survivals$virus.type139  2.8502   17.2911   0.3013  9.460  < 2e-16 ***

AND for fish size, which is a factor with levels = sm, m, lg:

coxph(formula = surv.ALL ~ survivals$fish.size + strata(survivals$virus.dose < 
2000))

                         coef exp(coef) se(coef)     z Pr(>|z|)    
    survivals$fish.sizem  0.02582   1.02615  0.08565 0.301 0.763098    
    survivals$fish.sizesm 0.33225   1.39411  0.08914 3.727 0.000194 ***

The effects of the virus type fit my a priori assumptions in terms of which type is most virulent (e.g. 110>111>002). My a priori hypothesis about fish size was that the smallest would suffer the greatest proportional hazard, and that appears to be true. (please correct me if these interpretations are incorrect?)

Now I would like to determine if fish size and virus stock interact in their impact on the proportional hazard of mortality. The output in R has me a bit baffled in that each factor of fish size and virus type are listed singly (where their values differ from the analyses above) and then listed in combination (please note that not all size:type combinations were tested, hence the NA values).

coxph(formula = surv.ALL ~ survivals$virus.type * survivals$fish.size + 
strata(survivals$virus.dose < 2000))

                                                 coef exp(coef)
    survivals$virus.type002                        1.7585    5.8038
    survivals$virus.type007                        1.9316    6.9004
    survivals$virus.type009                        3.3127   27.4600
        survivals$virus.type110                        2.9438   18.9882
    survivals$virus.type111                        2.4122   11.1587
        survivals$virus.type139                        2.3408   10.3895
    survivals$fish.sizem                          -0.6003    0.5486
        survivals$fish.sizesm                          0.6900    1.9938
    survivals$virus.type002:survivals$fish.sizem       NA        NA
    survivals$virus.type007:survivals$fish.sizem   0.7983    2.2218
    survivals$virus.type009:survivals$fish.sizem       NA        NA
    survivals$virus.type110:survivals$fish.sizem   0.9628    2.6189
    survivals$virus.type111:survivals$fish.sizem   0.8156    2.2606
    survivals$virus.type139:survivals$fish.sizem   0.2687    1.3083
    survivals$virus.type002:survivals$fish.sizesm      NA        NA
    survivals$virus.type007:survivals$fish.sizesm      NA        NA
    survivals$virus.type009:survivals$fish.sizesm      NA        NA
    survivals$virus.type110:survivals$fish.sizesm -0.3135    0.7309
    survivals$virus.type111:survivals$fish.sizesm      NA        NA
    survivals$virus.type139:survivals$fish.sizesm      NA        NA
                                          se(coef)      z Pr(>|z|)
    survivals$virus.type002                         0.5363  3.279  0.00104
    survivals$virus.type007                         0.4288  4.505 6.65e-06
    survivals$virus.type009                         0.5137  6.448 1.13e-10
    survivals$virus.type110                         0.7158  4.112 3.91e-05
    survivals$virus.type111                         0.4241  5.687 1.29e-08
    survivals$virus.type139                         0.4244  5.515 3.49e-08
    survivals$fish.sizem                            0.8662 -0.693  0.48824
    survivals$fish.sizesm                           0.8165  0.845  0.39807
    survivals$virus.type002:survivals$fish.sizem    0.0000     NA       NA
survivals$virus.type007:survivals$fish.sizem    0.6661  1.199  0.23072
    survivals$virus.type009:survivals$fish.sizem    0.0000     NA       NA
survivals$virus.type110:survivals$fish.sizem    0.8706  1.106  0.26881
    survivals$virus.type111:survivals$fish.sizem    0.6598  1.236  0.21637
survivals$virus.type139:survivals$fish.sizem    0.6688  0.402  0.68783
    survivals$virus.type002:survivals$fish.sizesm   0.0000     NA       NA
survivals$virus.type007:survivals$fish.sizesm   0.0000     NA       NA
    survivals$virus.type009:survivals$fish.sizesm   0.0000     NA       NA
survivals$virus.type110:survivals$fish.sizesm   0.8221 -0.381  0.70296
    survivals$virus.type111:survivals$fish.sizesm   0.0000     NA       NA
survivals$virus.type139:survivals$fish.sizesm   0.0000     NA       NA

    survivals$virus.type002                       ** 
    survivals$virus.type007                       ***
    survivals$virus.type009                       ***
    survivals$virus.type110                       ***
    survivals$virus.type111                       ***
    survivals$virus.type139                       ***
    survivals$fish.sizem                             
    survivals$fish.sizesm                            
    survivals$virus.type002:survivals$fish.sizem     
survivals$virus.type007:survivals$fish.sizem     
    survivals$virus.type009:survivals$fish.sizem     
survivals$virus.type110:survivals$fish.sizem     
    survivals$virus.type111:survivals$fish.sizem     
survivals$virus.type139:survivals$fish.sizem     
    survivals$virus.type002:survivals$fish.sizesm    
survivals$virus.type007:survivals$fish.sizesm    
    survivals$virus.type009:survivals$fish.sizesm    
survivals$virus.type110:survivals$fish.sizesm    
    survivals$virus.type111:survivals$fish.sizesm    
survivals$virus.type139:survivals$fish.sizesm    

so why are the single covariate lines returning different values compared to the respective analyses above? how do I interpret the exp(coef) values in the interaction pairs- since none have significant p values, are there no significant interactions?

beyond this question about interaction, I would like to look for signs that the underlying assumption of coxph are not being violated. Is there a specific function in the survival package for this (I really struggle to understand the R documentation language, as it assumes a much greater statistical knowledge than I have)?

also, I came across source('http://www.stat.ucla.edu/~david/teac/surv/local-coxph-test.R') as a script to perform a local test for significant explanatory variables, but it is defunct as far as I can tell. is there a more up to date function for this?

thank you for your attention!

Best Answer

Q: "how do I interpret the exp(coef) values in the interaction pairs"

A: You don't. You use predict(). And you don't separately examine the p-values, either. The separate p-values are pretty much meaningless. You use anova()-type analyses to compare nested models, ideally with some method to penalize the process so you are not susceptible to multiple comparisons artifacts.

(Note: it is possible to compare some of the coefficients. The single level "main-effects" fishsize coefficients in an interaction model would be the expected log(hazard ratios) for the other fishsizes compared with the baseline levels but only for the reference value of virus. And similarly for the "main-effects" virus coefficients.)