Statistical Analysis – How to Calculate 95% CI Around Percentage Attenuation Using Bootstrap Method

bootstrapcox-modelrstata

I am trying to repeat the statistical methods of an article in which Cox regressions were used to compute hazard ratios (HRs) for the association between life-course SES and death. The author created a variable for percent attenuation in the b coefficient, which was for SES after inclusion of the risk factor in question to Model 1 (age and sex): 100*(b(Model 1) – b(Model 1 + risk factor))/(b (Model 1)).

Stringhini, S. and Zaninotto P. Socio-economic trajectories and cardiovascular disease mortality in older people: the English Longitudinal Study of Ageing. Int J Epidemiol. PMID: 29040623; PMCID: PMC5837467

Now I want to complete it using Stata (best if possible) or R. I searched the methods of calculating 95% CI using bootstrap. Here is an example:

The problem is that the comparative regression coefficients were in different models, unlike the example above. I wonder if the methods in the article could be applied using Stata or R.

Best Answer

First, you need to write a function that calculates the "percent attenuation" for a data set and a "Risk Factor." It would: (a) fit both "Model 1" and "Model 1 + Risk Factor" to the data set; (b) extract the $\beta_{SES}$ regression coefficient for SES from both models; (c) calculate the "percent attenuation" following your equation.

Then you take multiple bootstrap samples of the original data set, apply that function to each of them, and keep track of all the corresponding "percent attenuation" values.

Finally, for a simple estimate of the 95% CI, put the "percent attenuation" values in order, and select the 2.5th and 97.5th percentiles. For example, if you did 999 bootstrap samples you could take the 25th and 975th from the ordered list.

This is straightforward in R, using tools in the boot package. The boot.ci() function in that package provides for more refined CI estimates that might be helpful if there is bias or skew in your "percent attenuation" measure. I don't use Stata. (Software-specific programming details are off-topic on this site anyway unless there is associated statistical content.)

There is at least one detail with respect to the cited paper that you might need to consider. The paper used multiple imputation to deal with some types of missing variable values. If that's the case, then the $\beta$ values should not have been from a single regression but should have been averages among separate fits on each of the imputed data sets. See Stef van Buuren's multiple imputation book. If you want to repeat what was done in that paper (and the paper handled the multiple imputation correctly), then your function to calculate "percent attenuation" should do the same.

A couple of warnings

First, that measure of "percent attenuation" is not additive across multiple "Risk Factors." You can see that in Tables 3 and 4 of the paper. The attenuation when all risk factors are combined is less than the sum of the attenuations associated with each of the individual risk factors. That's to be expected when risk factors are correlated among each other and with SES. So be very careful when interpreting those values.

Second, it is hard to draw causal inferences from a study like this. If some of the "risk factors" are themselves caused by SES and then mediate the effect of SES on mortality, then the "attenuation" estimate derived by adding them to the model might be an "overadjustment for mediators" as Hernán and Robins explain in Section 18.2 of What If?. If SES is driving those risk factors, those "attenuation" values don't necessarily mean that SES is correspondingly less important. They would then be related to how much of the SES effect the risk factors mediate.

Related Question