Solved – Standardising input parameters in coxph models

cox-model, r, survival

I'm trying to standardize the input variables of my coxph analysis to make outputs more easily interpretable.

I've used the following function (suggested in answer to a rather different problem on Stack Overflow, which I can no longer find) to standardize the input variables, following Gelman's suggestion of scaling by 2 SD:

z.mean.sd <- function(data) {
  # Centre on the mean, then divide by two standard deviations
  (data - mean(data, na.rm = TRUE)) / (2 * sd(data, na.rm = TRUE))
}

This gives far too large an effect for inbreeding and sire age at conception:

               coef  exp(coef)   se(coef)      z Pr(>|z|)   
CDamAge  -0.0004851  0.9995150  0.3272857 -0.001  0.99882   
CSireAge  0.9301754  2.5349537  0.3227302  2.882  0.00395 **
CDPF      0.2311019  1.2599877  0.2845579  0.812  0.41671   
CSPF     -0.1390246  0.8702066  0.2578415 -0.539  0.58976   
CFI       0.8240312  2.2796712  0.3249533  2.536  0.01122 * 

I've also used scale(), which divides by 1 SD and gives more plausible output, but it's still not what I expect.

                 coef  exp(coef)   se(coef)      z Pr(>|z|)   
stzDamAge  -0.0002426  0.9997575  0.1636428 -0.001  0.99882   
stzSireAge  0.4650877  1.5921538  0.1613651  2.882  0.00395 **
stzDPF      0.1155510  1.1224917  0.1422789  0.812  0.41671   
stzSPF     -0.0695123  0.9328486  0.1289208 -0.539  0.58976   
stzFI       0.4120156  1.5098580  0.1624767  2.536  0.01122 *  

We would not expect inbreeding to have that much of an effect (we would expect an exp(coef) closer to 1.02).

My question is: could this output be a product of poor data, or am I missing some fundamental step here? Is there an issue with the function I've written? (This is maybe more of a Stack Overflow question.) Should I even be trying to standardise my input variables when I use coxph? I've found nothing that supports doing so.

I'm both a stats and site newbie, so apologies if this is in the wrong place; it seemed the more likely of the two out of this and Stack Overflow.

Thanks!

Best Answer

You expect a hazard ratio (HR) of about 1.02 per unit of the predictor, for example per year of sire age. Note that once you standardized the variables they were no longer expressed in their original units (years, for the ages), but in multiples of their SD. So it's not surprising that the coefficients from the standardized data don't agree with your expectation. Note also that the coefficients after scaling by 1 SD are simply half of those after scaling by 2 SD, and that the z statistics and p values are identical across your two analyses: dividing a predictor by a constant multiplies its coefficient and standard error by that constant, but leaves the fit itself unchanged.
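To make the scaling argument concrete, here is a small sketch using the coefficients printed in the question. The SD of sire age is not reported, so `sd_sire_age` below is a purely hypothetical value, and `per_unit` is a helper name invented for this illustration. The point is only that both scalings lead back to the same per-year hazard ratio.

```r
# Coefficients copied from the two outputs in the question.
coef_2sd <- c(DamAge = -0.0004851, SireAge = 0.9301754, DPF = 0.2311019,
              SPF = -0.1390246, FI = 0.8240312)
coef_1sd <- c(DamAge = -0.0002426, SireAge = 0.4650877, DPF = 0.1155510,
              SPF = -0.0695123, FI = 0.4120156)

# Dividing a predictor by 2 SD instead of 1 SD doubles its coefficient,
# so every ratio should be 2 up to the rounding of the printed output.
ratio <- coef_2sd / coef_1sd
print(round(ratio, 3))

# Convert a standardized coefficient back to the original units:
# beta_per_unit = beta_std / (k * SD), where k is 1 or 2.
per_unit <- function(beta_std, sd_x, k = 1) beta_std / (k * sd_x)

# HYPOTHETICAL SD of sire age in years; replace with sd() of the real data.
sd_sire_age <- 3

hr_per_year_from_1sd <- exp(per_unit(coef_1sd[["SireAge"]], sd_sire_age, k = 1))
hr_per_year_from_2sd <- exp(per_unit(coef_2sd[["SireAge"]], sd_sire_age, k = 2))

# Both routes give the same hazard ratio per year of sire age.
print(c(from_1sd = hr_per_year_from_1sd, from_2sd = hr_per_year_from_2sd))
```

With the real SD plugged in, the resulting per-year hazard ratio is the number to compare against the expected 1.02.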

For this type of Cox analysis there is no need to standardize the ages (it just leads to confusion, as you found). Standardization of predictor variables is important for some purposes (such as when you need to consider the relative importance of different predictors in ridge regression or lasso), but even then it is best to present final results for variables like age re-scaled to their usual units, such as years.
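As a rough check that standardizing changes only the units, the sketch below fits the same Cox model with a raw and a 2 SD-scaled predictor. It assumes the `survival` package is installed and uses its bundled `lung` data purely as a stand-in for your own data; the column name `age_2sd` is made up for this example.

```r
library(survival)

# lung is an example dataset shipped with survival, used here only as a stand-in.
d <- lung
sd_age <- sd(d$age)
d$age_2sd <- (d$age - mean(d$age)) / (2 * sd_age)

# Same model, once with age in years and once with age in units of 2 SD.
fit_raw <- coxph(Surv(time, status) ~ age, data = d)
fit_2sd <- coxph(Surv(time, status) ~ age_2sd, data = d)

# Back-transform the standardized coefficient to a per-year coefficient.
beta_back <- coef(fit_2sd)[["age_2sd"]] / (2 * sd_age)

# The two hazard ratios per year of age should agree up to numerical tolerance.
print(c(raw = exp(coef(fit_raw)[["age"]]), back_transformed = exp(beta_back)))

# The Wald z statistics (and hence the p values) are the same in both fits.
z_raw <- summary(fit_raw)$coefficients[, "z"]
z_2sd <- summary(fit_2sd)$coefficients[, "z"]
print(c(z_raw = unname(z_raw), z_2sd = unname(z_2sd)))
```

Fitting on the original scale and reporting `exp(coef)` directly avoids the back-transformation step altogether.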