Statistical Significance – Disagreeing Significance Tests for an Ordinal Logistic Regression

confidence intervalordered-logitp-valuestatistical significance

I calculated both confidence interval and p-value on the same model using R, and the two tests of significance disagreed with one another.

I am comparing how 3 groups differ in terms of an ordinal variable, once a separate numeric predictor was taken into account. The model indicated that group was a significant predictor, and when I graphed the probabilities, I noticed that two of the groups were very similar, so I created a subset of the data only containing these groups to see if group was still a statistically significant predictor.

The p-value indicated that it was not (p=0.64), but the confidence interval indicated that it was a significant predictor (CI 2.5% = 0.68, CI 95% = 1.28).

I'm not sure where to go from here. I did a couple of other tests (which make a few extra assumptions and so aren't ideal), and all of them indicated non-significance. I've been using confidence intervals for this entire project, so suddenly switching to p-values is a bit jarring and I would prefer not to turn to tests & models that are less ideal.

I wish to know why this happened, what it means, and, if possible, whether this is due to a mistake in my code.

here's the code for the model & statistical tests I used:

library(MASS)
#model
DvS<-polr(ordinal~numeric+group, data=DvSata, Hess = TRUE)
summary(DvS)

#confidence intervals & odds ratios
ci<-confint(DvS)
exp(coef(DvS))
exp(cbind((OR = coef(DvS)), ci))

#p-value
st <- coef(summary(DvS))
pval <- pnorm(abs(st[, "t value"]),lower.tail = FALSE)* 2
st <- cbind(st, "p value" = round(pval,3))
st 
```

Best Answer

Recall that the null hypothesis for an odds ratio (OR) is that $\text{OR} = 1$, which occurs when $\text{log(OR)} = 0$. The confidence interval you produced is around the OR; it contains 1, and therefore is nonsignificant, consistent with the p-value. If you produce a confidence interval around the $\text{log(OR)}$ (i.e., the coefficient itself), you will see that it will contain 0.

Related Solutions

Power Analysis for Ordinal Logistic Regression

I prefer to do power analyses beyond the basics by simulation. With precanned packages, I am never quite sure what assumptions are being made.

Simulating for power is quite straight forward (and affordable) using R.

decide what you think your data should look like and how you will analyze it
write a function or set of expressions that will simulate the data for a given relationship and sample size and do the analysis (a function is preferable in that you can make the sample size and parameters into arguments to make it easier to try different values). The function or code should return the p-value or other test statistic.
use the replicate function to run the code from above a bunch of times (I usually start at about 100 times to get a feel for how long it takes and to get the right general area, then up it to 1,000 and sometimes 10,000 or 100,000 for the final values that I will use). The proportion of times that you rejected the null hypothesis is the power.
redo the above for another set of conditions.

Here is a simple example with ordinal regression:

library(rms)

tmpfun <- function(n, beta0, beta1, beta2) {
    x <- runif(n, 0, 10)
    eta1 <- beta0 + beta1*x
    eta2 <- eta1 + beta2
    p1 <- exp(eta1)/(1+exp(eta1))
    p2 <- exp(eta2)/(1+exp(eta2))
    tmp <- runif(n)
    y <- (tmp < p1) + (tmp < p2)
    fit <- lrm(y~x)
    fit$stats[5]
}

out <- replicate(1000, tmpfun(100, -1/2, 1/4, 1/4))
mean( out < 0.05 )

Confidence Interval – Calculating Interval from Significant p Value in Kruskal Wallis Test

My interpretation of this is as follows. If I am wrong, please give the kind of clarification suggested by @whuber, perhaps along with some sample data to illustrate what you are doing.

If the Kruskal-Wallis test rejects the null hypothesis that the three medians are all equal, then you will use two-sample Wilcoxon tests to do multiple comparisons A vs B, B vs C and A vs C. In order to control the overall error rate for the three comparisons you might use the Bonferroni significance level $.05/3 = .0167$ for the comparisons.

Recently, some psychology and sociology journals have blamed irreproducibility of certain results on abuse of P-values, and ask for confidence intervals (CIs) in addition to or instead of P-values. (I'm not saying they are correct to deprecate P-values or that asking for CIs always makes sense, just stating what I have observed and heard.)

You might give $(100 - 1.67)\% = 98.3\%$ CIs for the differences in medians. Presumably, these could be CIs produced by Wilcoxon test procedures. A difficulty may be that a 5-point ordinal scale might produce some ties, but perhaps the approximate CIs given in spite of that would be useful.

I doubt that the journal is asking for a CI for the overall Kruskal-Wallis test, but if so, perhaps use @jbowman's suggestion.

In the tentative exploration (in R) below I use fake simulated data for groups A, B, and C. There are $n = 50$ responses in each group, summarized as follows:

summary(A)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.190   2.212   2.935   2.817   3.280   4.700 
summary(B)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.350   2.695   3.450   3.331   4.107   4.690 
summary(C)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.620   3.328   3.800   3.672   4.228   4.690

Concatenating the data to the vector X and making a group variable gp, we have the following notched boxplot. Notches in the sides of the boxes are approximate nonparametric CIs for individual group medians, calibrated so that two non-overlapping CIs indicate a significant difference. Roughly, it seems that A and B may differ significantly, that B and C clearly do not, and that A and C are obviously significantly different.

 kruskal.test(X ~ gp)

             Kruskal-Wallis rank sum test

     data:  X by gp
     Kruskal-Wallis chi-squared = 17.887, df = 2, p-value = 0.0001306

So there is no doubt that the groups vary. Now we do three 2-sample Wilcoxon tests. Remember that we are looking for P-values below .0167 in order to declare significant differences.

wilcox.test(A, B, conf.int=T, conf.lev=.983)

    Wilcoxon rank sum test with continuity correction

data:  A and B
W = 825, p-value = 0.003428
alternative hypothesis: true location shift is not equal to 0
98.3 percent confidence interval:
 -1.0200705 -0.1199649
sample estimates:
difference in location 
            -0.5900226

wilcox.test(B, C, conf.int=T, conf.lev=.983)

        Wilcoxon rank sum test with continuity correction

data:  B and C
W = 984, p-value = 0.06719
alternative hypothesis: true location shift is not equal to 0
98.3 percent confidence interval:
 -0.72004953  0.09001254
sample estimates:
difference in location 
            -0.3000609

wilcox.test(A, C, conf.int=T, conf.lev=.983)

        Wilcoxon rank sum test with continuity correction

data:  A and C
W = 537.5, p-value = 9.175e-07
alternative hypothesis: true location shift is not equal to 0
98.3 percent confidence interval:
 -1.3099358 -0.5300047
sample estimates:
difference in location 
            -0.9199606

Summarizing, we see that A and B are significantly different according to the Bonferroni criterion [CI $(-1.02, -0.12)$]; B and C are not significantly different [CI includes 0]; A and C are highly significantly different [CI $(-1.31 -0.53)].$

Note: Data were simulated as follows:

set.seed(918); n = 50
A = 1+round(4*rbeta(n, 2, 2),2)
B = 1+round(4*rbeta(n, 3, 2),2)
C = 1+round(4*rbeta(n, 3.5, 2),2)
X = c(A,B,C);  gp=rep(1:3, each=n)

Best Answer

Related Solutions

Power Analysis for Ordinal Logistic Regression

Confidence Interval – Calculating Interval from Significant p Value in Kruskal Wallis Test

Related Question