Solved – What to do when ANOVA is significant, but regressors and VIF are not

correlation, regression, spss, variance-inflation-factor

My question is a bit long, with 2 major parts. Here are the variables:

Number of cells (C): main dependent variable
Disease severity 1 (D1): continuous
Disease severity 2 (D2): continuous but only quantifiable on diseased organ
Age, Sex
Organ side: L or R
Lateralization (L or R = 1, L and R = 2)
Location of disease in organ
Concurrent Disease 1, Concurrent Disease 2
N=208

We are trying to reproduce a previously published paper that found a significant association between C and D1. The disease can be present in L, R, or both. Age is a confounding factor because C normally decreases with age. If both organs are affected, the L and R organs are entered in the database as separate rows; if only L or only R is affected, there is a single row. Each row contains the data for both organs and includes the Lateralization variable. We followed the previously published statistical methods, found directly conflicting evidence, and now want to show that D1 and D2 are not related to C.

ANALYSIS A.

The analysis we reproduced is as follows:

In the entire cohort (Lateralization 1 and 2): with age as a covariate, partial correlations between C and D1, and between C and D2.
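With age as the only covariate, the first-order partial correlation can be computed either from the three pairwise correlations or by correlating the residuals of C and D1 after each has been regressed on age; the two routes give the same number. A minimal numpy sketch on simulated data; the variable names (`cells`, `d1`, `age`) and all numbers are hypothetical stand-ins, not the study's data.

```python
import numpy as np

def partial_corr_one_covariate(x, y, z):
    """Partial correlation of x and y controlling for a single covariate z,
    using the closed-form first-order formula."""
    r = np.corrcoef(np.vstack([x, y, z]))
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def partial_corr_residuals(x, y, z):
    """Same quantity via residualization: regress x and y on z (with an
    intercept) and correlate the two sets of residuals."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical simulated data: C falls with age, D1 is unrelated to C.
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=208)
d1 = rng.normal(3, 1, size=208)
cells = 2500 - 10 * age + rng.normal(0, 150, size=208)

r1 = partial_corr_one_covariate(cells, d1, age)
r2 = partial_corr_residuals(cells, d1, age)
print(round(r1, 4), round(r2, 4))  # the two methods agree
```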

In the cohort with Lateralization 1 only:
1. With age as a covariate, partial correlations between C and D1; between C and D2; and between the difference in C (diseased vs. non-diseased organ) and the difference in D1 (diseased vs. non-diseased organ)
2. Paired t-test comparing C in the diseased vs. the non-diseased organ
3. In patients with D1 < 2 (an arbitrary cutoff chosen by the previous authors): Pearson correlations among C, D1, D2, and age
4. In all Lateralization 1 patients, split into groups with D1 < 2 and D1 ≥ 2: Mann-Whitney U tests for age, D1, D2, and C
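The paired t-test listed above is a one-sample t-test on the per-patient differences in C, and the Mann-Whitney U statistic comes from ranking both groups together. A minimal numpy sketch of the two test statistics only (no p-values), using made-up toy numbers; in practice SPSS, or `scipy.stats.ttest_rel` and `scipy.stats.mannwhitneyu`, would also report the p-values.

```python
import numpy as np

def paired_t(a, b):
    """Paired t statistic: a one-sample t-test on within-patient differences."""
    d = np.asarray(a, float) - np.asarray(b, float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom

def average_ranks(values):
    """Ranks 1..n of the pooled values; tied values share their mean rank."""
    values = np.asarray(values, float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(values.size)
    i = 0
    while i < values.size:
        j = i
        while j + 1 < values.size and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U for group x: number of (x, y) pairs with x > y, ties counting 1/2."""
    ranks = average_ranks(np.concatenate([np.asarray(x, float), np.asarray(y, float)]))
    n1 = len(x)
    return ranks[:n1].sum() - n1 * (n1 + 1) / 2

# Hypothetical toy numbers, not the study's data.
c_diseased = [10, 12, 9, 11]
c_healthy = [12, 13, 12, 12]
t, df = paired_t(c_diseased, c_healthy)
print(f"paired t = {t:.3f} on {df} df")

u = mann_whitney_u([1, 2, 3], [2, 4, 5])  # e.g. a variable in the D1 < 2 vs D1 >= 2 groups
print(f"U = {u}")
```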

All of the above steps were repeated separately for patients without Concurrent Disease 1 and for patients without Concurrent Disease 2 (we did not look for an interaction between these diseases).

ANALYSIS B.

However, I thought I could also run two hierarchical multiple regressions, both with C as the dependent variable. The predictor blocks would be entered as follows:

1. Age, Sex
2. Lateralization, location of disease in organ
3. Concurrent Diseases 1 and 2
4. D1

and

1. Age, Sex
2. Lateralization, location of disease in organ
3. Concurrent Diseases 1 and 2
4. D2
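Block-wise entry amounts to a sequence of nested OLS fits, recording R² after each block and its change from the previous step (the R-squared change shown in SPSS's Model Summary when change statistics are requested). A minimal numpy sketch on simulated data; the 0/1 and 1/2 codings of sex, location, lateralization, and concurrent diseases are assumptions for illustration, and a location variable with more than two categories would need several dummy columns inside its block.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (an intercept column is added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    resid = y - X1 @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def hierarchical_r2(blocks, y):
    """Enter predictor blocks cumulatively; return (name, R^2, R^2 change) per step."""
    results, cols, prev = [], [], 0.0
    for name, block in blocks:
        cols.append(np.asarray(block, float).reshape(len(y), -1))
        r2 = r_squared(np.hstack(cols), y)
        results.append((name, r2, r2 - prev))
        prev = r2
    return results

rng = np.random.default_rng(1)
n = 208
# Hypothetical simulated predictors (codings are assumptions, not the study's).
age = rng.uniform(20, 80, n)
sex = rng.integers(0, 2, n)                 # 0/1
lateralization = rng.integers(1, 3, n)      # 1 = L or R, 2 = L and R
location = rng.integers(0, 2, n)            # hypothetical 0/1 location code
concurrent = rng.integers(0, 2, (n, 2))     # concurrent diseases 1 and 2
d1 = rng.normal(3, 1, n)
cells = 2500 - 10 * age + rng.normal(0, 150, n)

blocks = [
    ("Age, Sex", np.column_stack([age, sex])),
    ("Lateralization, location", np.column_stack([lateralization, location])),
    ("Concurrent diseases 1 and 2", concurrent),
    ("D1", d1),
]
steps = hierarchical_r2(blocks, cells)
for name, r2, delta in steps:
    print(f"+ {name:<28} R^2 = {r2:.3f}  change = {delta:.3f}")
```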

ANALYSIS C.

I also ran partial correlations of C (the dependent variable) with each of my IVs, D1 and D2, controlling for all variables from blocks 1-3 of the regressions. I read somewhere that a partial correlation is only good for up to 3 covariates; is that true?
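Mathematically, a partial correlation can control for any number of covariates: residualize both C and D1 on all covariates at once, then correlate the residuals. The practical limit is the residual degrees of freedom rather than a fixed count. The sketch also checks the standard identity r = t / sqrt(t² + df), where t is the statistic for D1 in the full regression with the same covariates, so this partial correlation and the regression coefficient test the same thing. Simulated data; the six covariate columns are hypothetical stand-ins for the block 1-3 variables.

```python
import numpy as np

def residualize(v, Z):
    """Residuals of v after OLS on all columns of Z plus an intercept."""
    Z1 = np.column_stack([np.ones(len(v)), Z])
    return v - Z1 @ np.linalg.lstsq(Z1, v, rcond=None)[0]

def partial_corr(x, y, Z):
    """Partial correlation of x and y controlling for every column of Z."""
    return np.corrcoef(residualize(x, Z), residualize(y, Z))[0, 1]

def t_last_column(X, y):
    """OLS t statistic for the last column of X (intercept added) and residual df."""
    X1 = np.column_stack([np.ones(len(y)), X])
    n, p = X1.shape
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    resid = y - X1 @ beta
    cov = (resid @ resid / (n - p)) * np.linalg.inv(X1.T @ X1)
    return beta[-1] / np.sqrt(cov[-1, -1]), n - p

rng = np.random.default_rng(2)
n = 208
# Hypothetical covariates standing in for blocks 1-3 (age, sex, lateralization,
# location, concurrent diseases 1 and 2); codings are assumptions.
Z = np.column_stack([
    rng.uniform(20, 80, n),
    rng.integers(0, 2, n),
    rng.integers(1, 3, n),
    rng.integers(0, 2, n),
    rng.integers(0, 2, (n, 2)),
])
d1 = rng.normal(3, 1, n)
cells = 2500 - 10 * Z[:, 0] + rng.normal(0, 150, n)

r = partial_corr(cells, d1, Z)
t, df = t_last_column(np.column_stack([Z, d1]), cells)
print(f"partial r = {r:.4f}, t = {t:.3f}, df = {df}")
print(f"t / sqrt(t^2 + df) = {t / np.sqrt(t**2 + df):.4f}")  # equals partial r
```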

Which is a better analysis to report?

If the multiple regressions are better to report, I have an issue with my results. My predictors of interest (D1 and D2) are nonsignificant, which is what we want to confirm.
But the ANOVA for each model is significant (p < 0.01). I checked the VIFs, and all of my variables have VIF < 1.5.
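VIF is computed from the predictors alone (the outcome C never enters the formula), so it says nothing about whether the predictors explain C; low VIFs and a significant overall model are not in conflict. A minimal numpy sketch of the standard definition VIF_j = 1 / (1 - R²_j), checked against the equivalent shortcut (the diagonal of the inverse correlation matrix of the predictors); the simulated data are hypothetical.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on all
    other predictors plus an intercept. The outcome never enters."""
    X = np.asarray(X, float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        resid = X[:, j] - others @ beta
        r2_j = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1 / (1 - r2_j)
    return out

rng = np.random.default_rng(3)
n = 208
age = rng.uniform(20, 80, n)
d1 = rng.normal(3, 1, n) + 0.01 * age   # hypothetical weak link to age
sex = rng.integers(0, 2, n)
X = np.column_stack([age, sex, d1])

print(np.round(vif(X), 3))
# Same numbers: diagonal of the inverse correlation matrix of the predictors.
print(np.round(np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False))), 3))
```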

EDIT:

Here is my output.

EDIT:

Changed the order of predictors.

Best Answer

From your output, it seems that you might be placing too much importance on a result that didn't pass an arbitrary statistical significance cutoff yet might still be consistent with the previously reported results.

Note the extremely wide 95% confidence limits for the D1 coefficient in your regression: from -9.7 to +68. Yes, the p-value of 0.14 could be interpreted to mean that D1 is not statistically significant in this data set. But is the coefficient reported previously by others within the confidence limits that you found? Is your point estimate of the coefficient within the confidence limits that were previously reported? If so, then your data do not really refute the prior result that D1 is related to C. Perhaps your sample was simply too small to document that (possibly weak) relationship reliably.
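Because an OLS confidence interval is symmetric (estimate ± critical value × SE), the quoted limits alone roughly recover the estimate, its standard error, and the p-value, which makes the comparison with the published coefficient concrete. A minimal standard-library sketch using a normal approximation; `prior_coefficient` is a hypothetical placeholder, since the published value is not given here.

```python
import math

def summarize_ci(lower, upper, crit=1.96):
    """Back out the point estimate, standard error, and an approximate two-sided
    p-value from a symmetric interval estimate +/- crit * SE.
    Normal approximation: an OLS interval uses a t quantile
    (about 1.97 with roughly 200 residual df), so results are approximate."""
    estimate = (lower + upper) / 2
    se = (upper - lower) / (2 * crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return estimate, se, p

def within(lower, upper, value):
    """True if value lies inside the closed interval [lower, upper]."""
    return lower <= value <= upper

# 95% limits for the D1 coefficient quoted above.
est, se, p = summarize_ci(-9.7, 68.0)
print(f"estimate = {est:.2f}, SE = {se:.2f}, approx. p = {p:.3f}")

# HYPOTHETICAL placeholder: replace with the coefficient published previously.
prior_coefficient = 25.0
print("prior estimate inside your CI:", within(-9.7, 68.0, prior_coefficient))
```

With these limits the recovered p-value is about 0.14, in line with the value reported for D1; the same `within` check can be run in the other direction, with your point estimate against the previously published limits.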

Edit after seeing the prior paper and additional results:

The prior paper seems not to have done a very thorough job of controlling for covariates, instead performing a set of individual correlations. There is little question that ANOVA or multiple regression, as you have performed, is a better approach. Note that you can't "show that D1 and D2 are not related to C" in this way, but the confidence limits on their coefficients in multiple regression will document the issue at hand.

You have to be careful in what you mean by saying your "ANOVA is significant," as your tables show two different types of results.

The Model Summary tables represent the differences between models as predictors are added sequentially. So the p-values indicate whether each model reduces the residual variance significantly relative to its predecessor. In each case where the model is augmented by adding the D1 (severity measure) predictor, the corresponding p-value is about 0.13 or 0.14, as it is for D1 in the multiple regression: not statistically significant with this data set.
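When a step adds a single predictor, the F-change test for that step is algebraically the square of that predictor's t statistic in the larger model, which is why the step adding D1 and the D1 coefficient show the same p-value. A minimal numpy sketch on simulated data (hypothetical, not the posted output).

```python
import numpy as np

def fit_rss(X, y):
    """Residual sum of squares and parameter count for OLS with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    resid = y - X1 @ beta
    return resid @ resid, X1.shape[1]

def f_change(X_reduced, X_full, y):
    """F test for the predictors added between two nested models."""
    rss_r, p_r = fit_rss(X_reduced, y)
    rss_f, p_f = fit_rss(X_full, y)
    q, df_resid = p_f - p_r, len(y) - p_f
    return ((rss_r - rss_f) / q) / (rss_f / df_resid), q, df_resid

def t_last_column(X, y):
    """OLS t statistic for the last column of X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    n, p = X1.shape
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    resid = y - X1 @ beta
    cov = (resid @ resid / (n - p)) * np.linalg.inv(X1.T @ X1)
    return beta[-1] / np.sqrt(cov[-1, -1])

rng = np.random.default_rng(4)
n = 208
age = rng.uniform(20, 80, n)
sex = rng.integers(0, 2, n)
d1 = rng.normal(3, 1, n)
cells = 2500 - 10 * age + rng.normal(0, 150, n)

base = np.column_stack([age, sex])
full = np.column_stack([age, sex, d1])
F, q, df_resid = f_change(base, full, cells)
t = t_last_column(full, cells)
print(f"F change = {F:.4f} on ({q}, {df_resid}) df; t^2 for D1 = {t**2:.4f}")
```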

The significant ANOVA results that trouble you seem to be those presented in the tables labeled "ANOVA." Yes, the models that contain D1 are significant. But these are tests of a model with all of the specified variables against a model with no variables, as the associated degrees of freedom indicate. That just tells you that the combination of included predictors is significantly better than nothing, not that any individual predictor is "significant." It seems that all models significant in the tables labeled "ANOVA" include age, a significant predictor in the multiple regression. Adding D1 wasn't able to reduce the model to insignificance, but that doesn't necessarily mean that D1 itself is "significant."
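The tables labeled "ANOVA" compare the whole model with an intercept-only model, on k and n - k - 1 degrees of freedom for k predictors, so one strong predictor such as age can make that F large whatever D1 contributes. A minimal numpy sketch on hypothetical simulated data in which D1 has no true effect at all; the overall F is still large because of age.

```python
import numpy as np

def overall_anova_f(X, y):
    """Overall regression F: full model vs. intercept-only model."""
    X1 = np.column_stack([np.ones(len(y)), X])
    n, p = X1.shape
    k = p - 1                                  # number of predictors
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)       # RSS of the intercept-only model
    f = ((ss_tot - ss_res) / k) / (ss_res / (n - k - 1))
    r2 = 1 - ss_res / ss_tot
    return f, (k, n - k - 1), r2

rng = np.random.default_rng(5)
n = 208
age = rng.uniform(20, 80, n)
d1 = rng.normal(3, 1, n)                       # no true effect on cells
cells = 2500 - 10 * age + rng.normal(0, 150, n)

f, df, r2 = overall_anova_f(np.column_stack([age, d1]), cells)
print(f"overall F = {f:.1f} on {df} df, R^2 = {r2:.3f}")
# The same F written in terms of R^2:
k, df_res = df
print(f"from R^2: {(r2 / k) / ((1 - r2) / df_res):.1f}")
```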

I still caution that your work does not rule out a possible weak contribution of D1 to cell count. You certainly have, however, shown that age is an important predictor.
