I would greatly appreciate it if you could tell me how to choose among different parametric distributions (gamma, Weibull, lognormal, log-logistic, etc.) for panel (time-series cross-sectional) survival analysis, or discrete-time survival analysis, in Stata 14.
I read these materials, but they are about continuous-time survival analysis:
http://spia.uga.edu/faculty_pages/rbakker/pols8501/OxfordTwoNotes.pdf
Then I tried to carry out the LR test explained on page 22 of the second note in order to obtain a p-value. However, I am not sure how to do it. These are the fit statistics I obtained:
Survival distribution                   AIC       BIC       Log-likelihood  df
Exponential-Proportional Hazard         433.663   471.1031  -209.83151      7
Exponential-Accelerated Failure Time    433.663   471.1031  -209.83151      7
Lognormal-Proportional Hazard           377.6502  420.4389  -180.82508      8
Loglogistic-Proportional Hazard         377.874   420.6627  -180.93701      8
Gamma-Proportional Hazard               cannot compute an improvement -- discontinuous region encountered
Weibull-Proportional Hazard             cannot compute an improvement -- discontinuous region encountered
Weibull-Accelerated Failure Time        205.8869  248.6756  -94.943472      8
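For reference, a likelihood-ratio test only applies to nested models. Among the fits above, the exponential model is the Weibull model with its shape parameter fixed at 1, so that pair can be compared directly; the lognormal and log-logistic fits are not nested in the Weibull and are usually compared through AIC or BIC instead. The following is a minimal Python sketch of the arithmetic, using only the log-likelihoods and df values from the table and assuming the Weibull fit in the last row converged properly:

```python
import math

# Log-likelihoods and parameter counts copied from the table above.
ll_exponential, df_exponential = -209.83151, 7
ll_weibull, df_weibull = -94.943472, 8

# Likelihood-ratio statistic: twice the gain in log-likelihood.
lr_stat = 2.0 * (ll_weibull - ll_exponential)
extra_params = df_weibull - df_exponential  # one extra parameter: the Weibull shape

# For 1 degree of freedom the chi-square upper tail has a closed form:
# P(X > x) = erfc(sqrt(x / 2)). Other df values would need a chi-square routine.
p_value = math.erfc(math.sqrt(lr_stat / 2.0))

# AIC = -2 * logL + 2 * k, which should reproduce the AIC column.
aic_weibull = -2.0 * ll_weibull + 2 * df_weibull

print(f"LR = {lr_stat:.3f}, df = {extra_params}, p = {p_value:.3g}")
print(f"AIC (Weibull) = {aic_weibull:.4f}")
```

A very small p-value here would favor the Weibull over the exponential, but it says nothing about whether either model describes the data well.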
Besides, I could only test the PH assumption for the Cox model, which is not a panel-data model.
What's more, I couldn't do what is instructed on pages 24 and 25 of the second Oxford note. When I use the "predict" command, it returns continuous values even though my dependent variable is discrete.
My data set is as follows: ID identifies the companies in my sample, EVENT indicates whether the company went bankrupt (1) or not (0), and x1 to x5 are my independent variables.
ID TIME EVENT x1 x2 x3 x4 x5
1 1 0 1.28 0.02 0.87 1.22 0.06
1 2 0 1.27 0.01 0.82 1.00 -0.01
1 3 0 1.05 -0.06 0.92 0.73 0.02
1 4 0 1.11 -0.02 0.86 0.81 0.08
1 5 1 1.22 -0.06 0.89 0.48 0.01
2 1 0 1.06 0.11 0.81 0.84 0.20
2 2 0 1.06 0.08 0.88 0.69 0.14
2 3 0 0.97 0.08 0.91 0.81 0.17
2 4 0 1.06 0.13 0.82 0.88 0.23
2 5 0 1.12 0.15 0.76 1.08 0.28
2 6 0 1.60 0.26 0.55 1.31 0.37
2 7 0 1.58 0.26 0.56 1.16 0.35
2 8 0 1.54 0.24 0.59 1.08 0.33
2 9 0 1.72 0.22 0.55 0.84 0.29
2 10 0 1.72 0.21 0.53 0.79 0.29
2 11 0 1.63 0.19 0.55 0.73 0.27
2 12 0 2.17 0.32 0.44 0.95 0.43
3 1 0 0.87 -0.03 0.79 0.61 0.00
3 2 1 0.83 -0.14 0.95 0.57 -0.02
Best regards,
Best Answer
First, with at most one bankruptcy event per company, you don't have data for a panel survival model as described on the page linked in your question. Quoting from that page:
For example, if you were studying how quickly different people caught a common cold, which can happen often, you might include as a random effect the different tendencies for individuals to catch a cold, given the covariate values. But with only one event per individual you can't do that. What you have are data arranged in a standard format for survival analysis with time-dependent covariates. From the structure of your data you don't seem to have any random effects.
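The "at most one event per company" condition is easy to check before choosing a model. Here is a minimal Python sketch that counts events per ID, using only the ID, TIME and EVENT columns of the excerpt shown in the question (the full data set would be read from file instead):

```python
from collections import Counter

# (ID, TIME, EVENT) rows taken from the excerpt in the question.
rows = (
    [(1, t, 0) for t in range(1, 5)] + [(1, 5, 1)]
    + [(2, t, 0) for t in range(1, 13)]
    + [(3, 1, 0), (3, 2, 1)]
)

events_per_company = Counter()
for company, _time, event in rows:
    events_per_company[company] += event

# With repeated events, some count would exceed 1 and a random effect
# (frailty) per company could be estimated; here none does.
max_events = max(events_per_company.values())
print(dict(events_per_company), "max events per company:", max_events)
```

If the maximum is 1, the multiple rows per company only carry time-varying covariates, not repeated events.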
Second, unless you have a really strong reason to suspect that your survival data take a particular parametric form, you will typically be better off trying the semi-parametric approach of Cox proportional hazards regression rather than a parametric model. With Cox regression you don't need to know how the underlying hazard changes over time, removing an important assumption that you would have to make with a parametric model. Yes, you only have data for one time point per year, but that's no more incompatible with a Cox model than it would be with parametric models.
Third, you do have to be careful in the organization of the data for your model. The covariate values for any time point should represent their status just before the event. So it would be wrong to use year-end covariate values on the same row as bankruptcy events that could have occurred earlier in the year. Depending on the nature of your data you might need to reorganize the data so that covariates best represent predictors of events at the noted times.
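One possible reorganization, sketched in Python, pairs each period's EVENT with the covariates recorded in the previous period. This assumes the covariates are period-end values and that a bankruptcy can occur at any point within the period; whether a one-period lag is the right choice depends on how your data were actually collected. The function name `lag_covariates` is hypothetical, and only x1 and x2 for companies 1 and 3 are used to keep the example short:

```python
from itertools import groupby

# (ID, TIME, EVENT, x1, x2) for companies 1 and 3 from the question's excerpt.
rows = [
    (1, 1, 0, 1.28, 0.02),
    (1, 2, 0, 1.27, 0.01),
    (1, 3, 0, 1.05, -0.06),
    (1, 4, 0, 1.11, -0.02),
    (1, 5, 1, 1.22, -0.06),
    (3, 1, 0, 0.87, -0.03),
    (3, 2, 1, 0.83, -0.14),
]

def lag_covariates(rows):
    """Pair each period's EVENT with the covariates of the previous period."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    lagged = []
    for _, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        # The first period of each company has no earlier covariates, so it is dropped.
        for previous, current in zip(group, group[1:]):
            company, time, event = current[:3]
            lagged.append((company, time, event) + previous[3:])
    return lagged

lagged_rows = lag_covariates(rows)
for row in lagged_rows:
    print(row)
```

After lagging, the bankruptcy of company 1 at TIME 5 is matched with the covariates observed at TIME 4, so the predictors precede the event. The cost is one lost row per company.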
Fourth, and perhaps most important, you might want to consider whether and how a survival model is appropriate for your study. For example, the Cox model assumes that all subjects share the same basic shape of the hazard as a function of time, starting from time 0. In clinical studies, time 0 might be the time a patient received a particular treatment. In your case, the implications of assuming a shared hazard-function shape could be quite different in two scenarios: (a) time 0 represents the formation of each company, or (b) time 0 represents, say, the calendar year 2000. You need to think carefully about how such assumptions correspond to what you know about the subject matter.