Logistic Regression – How to Determine the Association Between Covariates and Treatment Group

association-measure, logistic

I want to determine whether there is an association between my covariates (age, gender, BMI) and whether a person is in the treatment group. I ran a logistic regression using each covariate in turn as the single explanatory variable, i.e. y = b0 + b1*age, and then used cross-validation to measure how accurately age predicted whether the person was in the treatment group. Should I include the covariates that predicted treatment-group membership correctly more than 50% of the time?

Best Answer

Without further detail, what you are doing is reinventing the problem of p-values in Table 1. Turkiewicz et al. say the following here:

Similarly, a P-value >0.05 can never be used to support a statement that the null hypothesis is true (often expressed as “there was no difference …”) because absence of evidence is not evidence of absence. It is important to recognize that the P-value is a measure for inferential purposes, not descriptive ones. Thus, P-values in ‘Table 1’ (which usually describes the study sample) are useless.

Predictiveness in a general sense is almost identical to statistical significance testing (a claim I won't fully demonstrate in this answer), and testing the statistical significance of a covariate in a bivariate logistic regression model amounts to running a categorical analysis such as a Pearson chi-square test of independence.
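One special case of this equivalence can be checked numerically. The sketch below assumes a binary covariate (say, sex) and uses the score test of the slope in the logistic model; the Wald and likelihood-ratio tests are asymptotically equivalent but differ slightly in finite samples. The 2x2 counts are made up for illustration, and the function names are hypothetical.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def logistic_score_test(x, y):
    """Score test of b1 = 0 in the model logit P(y = 1) = b0 + b1 * x."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    # Score for b1 evaluated at the null fit, where every fitted probability is y_bar.
    u = sum(xi * (yi - y_bar) for xi, yi in zip(x, y))
    # Null variance of the score, after profiling out the intercept.
    v = y_bar * (1 - y_bar) * sum((xi - x_bar) ** 2 for xi in x)
    return u * u / v


# Illustrative counts (assumed, not from any real study).
# Rows: covariate x = 0 / 1; columns: control (y = 0) / treatment (y = 1).
a, b, c, d = 30, 20, 22, 28

# Expand the table into one (x, y) pair per person.
x = [0] * (a + b) + [1] * (c + d)
y = [0] * a + [1] * b + [0] * c + [1] * d

chi2 = pearson_chi2_2x2(a, b, c, d)
score = logistic_score_test(x, y)
print(f"Pearson chi-square:  {chi2:.4f}")
print(f"Logistic score test: {score:.4f}")
```

The two statistics agree, so for a binary covariate the logistic regression is asking the same question as the chi-square test of the 2x2 table.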

The problem with using cross-validation to assess this is that you still have not controlled for multiple comparisons. Three covariates (age, sex, and BMI) "tested" against a randomization assignment for balance, each at the 5% level and assuming the tests are independent, will have a family-wise false positive error rate of $1-(1-0.05)^3 \approx 14\%$. So you would, at a minimum, need to apply a correction such as Bonferroni, which leaves the reader to ask how much power you actually have to detect imbalance with a method such as this.
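The arithmetic behind the family-wise error rate can be sketched in a few lines. This is a minimal illustration that assumes the three tests are independent and each is run at the 5% level; the function names are hypothetical.

```python
def family_wise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / m


alpha, m = 0.05, 3  # three covariates: age, sex, BMI

fwer = family_wise_error_rate(alpha, m)
per_test = bonferroni_threshold(alpha, m)
fwer_corrected = family_wise_error_rate(per_test, m)

print(f"Uncorrected family-wise error rate: {fwer:.4f}")
print(f"Bonferroni per-test threshold:      {per_test:.4f}")
print(f"Family-wise rate after correction:  {fwer_corrected:.4f}")
```

Under the correction each covariate is tested at roughly the 1.7% level, which is why the question of remaining power to detect imbalance arises.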

Dr. Stephen Senn nicely summarizes the issue of "obsessing with balance" here and here. To summarize: any attention given to inspecting the relation of a covariate to the randomization assignment is a lost cause. However, covariates that are known to be strong prognostic factors should be adjusted for regardless of whether they are balanced, provided the study has sufficient power to do so.
