Probability – Survival Analysis Using Linear Probability Models and Panel Data

conditional probability, linear model, panel data, probability, survival

This question might have been answered somewhere else but I could not find it.

Hi all.
My research investigates whether a certain policy speeds up the construction of housing units or has no effect. Since market factors play a role in construction duration, I need to structure the data in panel format, and I am in desperate need of panel data tools (fixed effects, clustering, etc.).

I was thinking of using a linear probability model (LPM) for estimation, because its results are much easier to explain and it works with panel data settings. Also there are new methods to produce better estimates which you can find here.

However, I could not find a source that explains how to estimate a survival model using an LPM or least squares methods. All I could find was additive hazard models, which are a bit problematic because of time-varying covariates and the difficulty of reporting results and doing statistical inference.

Do you have any suggestions, or do you know of any sources that could help me with this setting?

In short, how can I estimate a survival model in panel format using linear regression?

Thank you all for your time!

Best Answer

A discrete-time survival model suitable for panel data with time-varying covariates is essentially a set of binomial regressions for the included time periods. See Willett and Singer, for example. So if you really want to use a linear probability model for each of those binomial regressions there's nothing to stop you, as @AndyW implies in a comment.
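To make the "set of binomial regressions" idea concrete, here is a minimal sketch of the person-period layout that discrete-time survival models work on. The project IDs, durations, and completion flags are made-up toy values, and the function names are hypothetical. Each unit contributes one row per period it is still at risk, with `event = 1` only in the period where construction finishes. The per-period event share among units at risk is exactly what a linear probability model with a full set of period dummies (and no intercept) would return as coefficients.

```python
from collections import defaultdict

# Hypothetical toy data: (project_id, periods_observed, completed)
# completed=False means the project was still unfinished when observation ended (censored).
spells = [
    ("A", 2, True),
    ("B", 3, True),
    ("C", 3, False),
    ("D", 1, True),
]

def to_person_period(spells):
    """Expand one row per unit into one row per unit and period at risk."""
    rows = []
    for unit, duration, completed in spells:
        for period in range(1, duration + 1):
            # The event indicator is 1 only in the final period of a completed spell.
            event = int(completed and period == duration)
            rows.append({"unit": unit, "period": period, "event": event})
    return rows

def period_hazards(rows):
    """Share of at-risk units that experience the event in each period."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for row in rows:
        at_risk[row["period"]] += 1
        events[row["period"]] += row["event"]
    return {p: events[p] / at_risk[p] for p in sorted(at_risk)}

rows = to_person_period(spells)
hazards = period_hazards(rows)
print(hazards)
```

Once the data are in this shape, time-varying covariates are just extra columns on each row, and the policy effect can be estimated by regressing `event` on the policy indicator plus period dummies, with whatever fixed effects and clustering your panel software offers.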

The reason why you aren't finding pre-built software to do linear probability model fitting in this context is that standard logistic regression or other binomial modeling that restricts probabilities to [0,1] (e.g., probit regression, complementary log-log link) is superior for the binomial modeling. It's hard to imagine a situation in which a linear probability model would be superior, particularly for "statistical inference." You claim that its "results are much easier to explain," but how do you explain predictions of negative probabilities from a linear probability model?
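The out-of-range problem is easy to reproduce. Below is a small sketch with made-up data (the covariate `x` and outcome `y` are purely illustrative): ordinary least squares is fit to a binary outcome using the closed-form slope and intercept, and the fitted "probabilities" at the ends of the covariate range fall below 0 and above 1.

```python
# Toy data, assumed for illustration only: a single covariate and a binary event indicator.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form OLS estimates for a single-regressor linear probability model.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Fitted values are meant to be probabilities but are not constrained to [0, 1].
fitted = [intercept + slope * xi for xi in x]
out_of_range = [p for p in fitted if p < 0 or p > 1]
print(fitted)
print(out_of_range)
```

A logistic, probit, or complementary log-log fit to the same person-period rows would keep every fitted hazard inside [0,1].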

Particularly if you are intending to publish your results, stick with established statistical approaches.