How to calculate Prob > chi2 in R to test model fit of conditional logistic regression

Tags: chi-squared-test, logistic, p-value, r, survival

I used the clogit function (from the survival package) to run a conditional logistic regression in R on a large dataset of 1:M matched pairs (n = 300368964, number of events = 39995).

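library(survival)  # clogit() and strata() come from the survival package
model <- clogit(Alliance ~ OVB + CVC + BVB + EarlyStage + AvgVCSize + NumberVC + strata(Strata), data = df, method = "exact")  # df = the matched data set (name assumed); covariates as listed in the output below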

I received the following results:

                 coef  exp(coef)   se(coef)       z Pr(>|z|)    
OVB        -0.0498174  0.9514031  0.0166275  -2.996  0.00273 ** 
BVB         0.0277405  1.0281289  0.0304956   0.910  0.36300    
CVC         1.1709851  3.2251683  0.1089709  10.746  < 2e-16 ***
EarlyStage -1.3215824  0.2667129  0.0205851 -64.201  < 2e-16 ***
AvgVCSize   0.0087976  1.0088364  0.0002035  43.224  < 2e-16 ***
NumberVC    0.0643579  1.0664740  0.0034502  18.653  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Rsquare= 0   (max possible= 0.001 )
Likelihood ratio test= 6511  on 6 df,   p=0
Wald test            = 6471  on 6 df,   p=0
Score (logrank) test = 6801  on 6 df,   p=0

Since Rsquare equals 0 and the test statistics seem very high, I tried to plot the results to check whether the model fits, but I wasn't able to plot them properly.

I found many papers online that report Stata's Prob > chi2 = 0 as a test to prove the model fit.

How can I calculate this value in R? Are there any other ways I could check the model fit of my clogit results?

I would appreciate any help.

Thank you very much in advance.

Best Answer

In the diagram below, the parts on the right are from the document you linked, and the parts on the left are from the output in your question. I have marked corresponding parts with the same colors (the values won't be the same in this case because they're for different data sets):

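[Image: Stata output (right) and the R clogit output from the question (left), with corresponding parts marked in matching colors]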

Now, the thing in green is called the likelihood ratio test statistic. For a sufficiently large sample size it has, under the null hypothesis, approximately a chi-square distribution (here with 6 degrees of freedom, one per coefficient).
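Written out, assuming the usual setup used by survival's coxph/clogit (the fitted model is compared with a null model in which every coefficient is zero), the statistic and its large-sample null distribution are roughly as follows, where $\ell$ is the conditional log-likelihood and $p$ is the number of coefficients ($p = 6$ in the question):

```latex
\mathrm{LR} = 2\left[\ell(\hat{\beta}) - \ell(0)\right]
\;\overset{\cdot}{\sim}\; \chi^2_{p}
\quad \text{under } H_0:\ \beta_1 = \cdots = \beta_p = 0
```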

The thing in red is called the p-value. It is not correctly defined in the Stata information you linked. It is the probability of getting a chi-square value at least as large as the one you observed if the null hypothesis were true. It is correctly defined in the first sentence here.
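A minimal R sketch of that calculation, i.e. Stata's Prob > chi2, is below. It uses `pchisq()` with `lower.tail = FALSE` on the numbers printed in the question (6511 on 6 df), then repeats the calculation on a fitted model using the toy `infert` example from the `clogit` help page, not the question's data. The names `lr_stat`, `fit`, etc. are just illustrative. For a coxph/clogit object, `loglik` holds the null and final log-likelihoods, and `summary()` stores the same test in `logtest`.

```r
library(survival)  # provides clogit(); infert ships with base R's datasets

# 1) From the numbers printed in the question: LR = 6511 on 6 df
lr_stat <- 6511
lr_df   <- 6
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
p_value  # underflows to exactly 0 in double precision, hence "p=0"

# The log scale shows how small it really is (roughly 10^-1407)
log10_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE, log.p = TRUE) / log(10)
log10_p

# 2) From a fitted model object (toy data, not the question's model)
fit    <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
lr_fit <- 2 * diff(fit$loglik)    # 2 * (final log-lik - null log-lik)
df_fit <- length(coef(fit))       # one degree of freedom per coefficient
p_fit  <- pchisq(lr_fit, df = df_fit, lower.tail = FALSE)

# summary() already stores the same test as (test, df, pvalue)
summary(fit)$logtest
```

So "Prob > chi2 = 0" in Stata and "p=0" in R's output both mean the p-value is too small to display, not that it is literally zero.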

You decide significance by comparing the p-value with your significance level. (You haven't said what significance level you're using.)
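As a small illustration of that comparison, the sketch below assumes a significance level of 0.05 (the question does not state one) and uses the likelihood ratio statistic from the question's output:

```r
alpha   <- 0.05   # assumed significance level; not given in the question
p_value <- pchisq(6511, df = 6, lower.tail = FALSE)  # LR = 6511 on 6 df
reject_null <- p_value < alpha  # TRUE: reject H0 that all six coefficients are zero
reject_null
```

Note that rejecting this null only says the covariates jointly improve on a model with no covariates; by itself it is not a goodness-of-fit check.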