Solved – R: CoxPH model with a categorical variable that has too many levels

Tags: categorical-data, cox-model, many-categories, survival

I have a dataset, df_cox, consisting of 8000 observations with the following columns:

org_id property1 property2  property3 uptimeDay event

Here org_id is a categorical variable with 1199 different levels. The three property variables describe the organization and are numerical.

coxp_1 <- coxph(formula = Surv(uptimeDay, event, type = 'right') ~ (property1 + property3)^2 + property2 + I(as.factor(org_id)), data = df_cox)

I am trying to fit the Cox model above in R, but I keep getting the error message below. My guess is that it is caused by my categorical variable (org_id) having too many levels.

Error in fitter(X, Y, strats, offset, init, control, weights = weights,  : 
  NA/NaN/Inf in foreign function call (arg 6)

Does anybody know what could be a potential solution for this problem?

Best Answer

The Cox proportional hazards model needs each level of the categorical variable to contain at least one event and one non-event (event = 0). Otherwise that level is perfectly classified, its coefficient cannot be estimated as a finite value, and the fit can fail with an error like the one you are seeing. To check this, look at the output of: xtabs(~ event + org_id, data = df_cox)
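The check described above can be sketched as follows. The toy data frame is a hypothetical stand-in for df_cox (only the column names org_id and event come from the question), and the helper names no_event and no_nonevent are made up for illustration.

```r
# Hypothetical toy data standing in for df_cox; the values are invented.
df_cox <- data.frame(
  org_id = c("a", "a", "a", "b", "b", "c", "c", "d"),
  event  = c(1, 0, 1, 0, 0, 1, 1, 0)
)

# Cross-tabulate event status by organization:
# rows are "0" (non-event) and "1" (event), one column per org_id level.
tab <- xtabs(~ event + org_id, data = df_cox)

# Levels with a zero count in either row are the problematic ones.
no_event    <- colnames(tab)[tab["1", ] == 0]  # levels with no events
no_nonevent <- colnames(tab)[tab["0", ] == 0]  # levels with no non-events

no_event
no_nonevent
```

On the real data, any level that shows up in either vector is a candidate for being merged with other levels or dropped before fitting.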

My guess is that, with 8000 observations spread over 1199 levels (fewer than 7 per level on average), many levels fail this condition. A solution would be to collect more observations or to merge some levels together.
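One way to merge levels, shown only as a minimal sketch: pool every org_id level that lacks either an event or a non-event into a single catch-all level. The toy data, the column name org_grp and the label "other" are all assumptions made for illustration. Whether pooling unrelated organizations is sensible depends on the data.

```r
# Hypothetical toy data standing in for df_cox; the values are invented.
df_cox <- data.frame(
  org_id = c("a", "a", "a", "b", "b", "c", "c", "d"),
  event  = c(1, 0, 1, 0, 0, 1, 1, 0)
)

# Count events and non-events within each org_id level.
events_per_org    <- tapply(df_cox$event, df_cox$org_id, sum)
nonevents_per_org <- tapply(1 - df_cox$event, df_cox$org_id, sum)

# Levels missing either kind of observation.
bad <- names(events_per_org)[events_per_org == 0 | nonevents_per_org == 0]

# New grouping variable: problematic levels are pooled into "other".
df_cox$org_grp <- factor(
  ifelse(df_cox$org_id %in% bad, "other", as.character(df_cox$org_id))
)

# Re-check: every remaining level should now have both events and non-events.
tab_grp <- xtabs(~ event + org_grp, data = df_cox)
tab_grp
```

The pooled variable org_grp would then replace org_id in the model formula. The "other" level itself can still fail the check if the pooled rows are all events or all non-events, so the cross-tabulation is worth repeating after merging.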
