Survival Analysis – Understanding Time-Dependent Probabilities

engineering-statistics probability reliability survival

I am working on a data-driven collision risk model for cars. Using car trajectory data, the model computes the probability of collision at each data timestamp with 2 sub-models:

An overlap model: By extrapolating car trajectory data, the model outputs the probability $P_{\text{olap}}$ that the cars will overlap at some time, as well as the overlap time $t_{\text{olap}}$.
An intervention model: Given the time to overlap, this model outputs the intervention probability $P_{\text{intervention}}$. The probability of non-intervention is given by:
$$
P_{\text{non-intervention}}(t_{\text{olap}}) = 1 - P_{\text{intervention}}(t_{\text{olap}})
$$

The collision probability at $t_{\text{olap}}$ is the product of an overlap and a non-intervention probability. We assume both models to be independent:
$$
P_{\text{col}}(t_{\text{olap}}) = P_{\text{olap}} \times P_{\text{non-intervention}}(t_{\text{olap}})
$$
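
As a minimal sketch of this product (the function name and the input values are illustrative only, not part of the actual model), assuming the two sub-models are independent:

```python
def collision_probability(p_overlap: float, p_intervention: float) -> float:
    """Return P_col = P_olap * (1 - P_intervention) under the independence assumption."""
    for name, p in (("p_overlap", p_overlap), ("p_intervention", p_intervention)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    return p_overlap * (1.0 - p_intervention)


# Hypothetical inputs: an overlap judged certain and a 0.9 chance of intervention.
p_col = collision_probability(p_overlap=1.0, p_intervention=0.9)
print(round(p_col, 3))  # 0.1
```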

Here is a dummy discrete example, where 2 cars are on a converging path.

At t=0s, the extrapolation of the cars' paths suggests that the cars will collide in 60 seconds. The probability of intervention within 60 seconds is quite high (0.9), so the collision probability at t=60 seconds is 0.1.
At t=1s, assuming the cars continue with no prior intervention, the extrapolated paths suggest there is slightly less time for an intervention (59s remain), leading to a higher probability of collision of 0.3.
At t=2s, an intervention has probably started and the path extrapolation gives a lower overlap probability, leading to a probability of collision of 0.1.

The collision probability at the time $t_{\text{olap}}$ as a function of time looks as follows:

Retrieving the data from the whole trajectory shows that no collision happened. However, I am interested in determining the probability that a collision could have occurred.

Currently, I only consider the maximum probability (0.3 in the dummy example), but is there a better way of calculating the probability? I have seen people use reliability theory (survival analysis) with $P(\text{col}) = 1 - \exp \left(-\int_{0}^{t} \lambda(s) \, ds\right)$ (this would give 0.36 in the dummy scenario); however, this does not seem right to me, since the probability is not "static".
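
For concreteness, here is a minimal sketch of how that reliability formula can be evaluated on sampled data. It assumes the three dummy values are read as hazard rates per second and uses the trapezoidal rule for the integral. Both choices are assumptions made for illustration. The result depends on them, so it need not reproduce the 0.36 quoted above.

```python
import math


def cumulative_event_probability(times, hazards):
    """Evaluate 1 - exp(-integral of hazard), integrating with the trapezoidal rule."""
    if len(times) != len(hazards) or len(times) < 2:
        raise ValueError("need at least two (time, hazard) samples of equal length")
    integral = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        integral += 0.5 * (hazards[i] + hazards[i - 1]) * dt
    return 1.0 - math.exp(-integral)


# Assumption: the dummy collision probabilities are treated as hazard rates (per second).
times = [0.0, 1.0, 2.0]
hazards = [0.1, 0.3, 0.1]
p = cumulative_event_probability(times, hazards)
print(p)
```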

To be clear, I am not interested in the modelling of the scenario (extrapolation of the trajectories and intervention model) but just how to use the obtained probabilities.

Best Answer

This seems like a conceptual issue.

As you have correctly identified, the standard "reliability theory" (I'd call it survival analysis, but terminology in statistics is a mess...) formula doesn't work, because $\lambda(t)$ represents the instantaneous risk of the event happening at exactly time $t$, given that it has not happened before. Your probabilities are already forward-looking, i.e. $P_{t=0}(\text{collision})$ incorporates all possible courses of events that would lead to a collision and is equivalent to the overall event probability/cumulative incidence in a survival setting. There is, by definition, no better a priori probability.
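
For reference, these are the standard survival-analysis definitions, with $T$ the random event time (the symbols $T$, $S$ and $F$ are notation introduced here, not taken from the question):
$$
\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad
S(t) = \exp\left(-\int_{0}^{t} \lambda(s) \, ds\right), \qquad
F(t) = 1 - S(t)
$$
The formula therefore turns a rate $\lambda$ into a cumulative probability $F(t)$. A forward-looking probability such as $P_{t=0}(\text{collision})$ already plays the role of $F$, not of $\lambda$, so feeding it through the integral again has no clear interpretation.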

Also, $P_{t=1}(\text{collision})$ has all the information $P_{t=0}(\text{collision})$ had and more, meaning it is strictly better for the purpose of "predicting" whether there is going to be a collision. This means you can forget $P_{t=0}(\text{collision})$. More generally, you only need to look at the probabilities going forward from the latest point in time. At some point either there was a collision, or there is a $t^*$ such that $P_{t=t^*}(\text{collision}) = 0$ and there is never going to be one, which means the final probability is 0 or 1. (Theoretically you could always have a remaining risk of crash, but...)

In conclusion, I think you are looking for something that doesn't really exist. Situations like the one in your example have an overall 10% probability of leading to a collision, while the observed trajectory had a 0% probability of crashing, because it didn't. At its worst, the situation had a 30% probability of leading to a crash. There is no way to meaningfully summarize these into one probability.
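
To make the distinction concrete, here is a small sketch using the dummy numbers from the question. The variable names are mine and purely illustrative. It keeps the quantities separate instead of merging them into one figure:

```python
# Collision probabilities from the dummy example, keyed by the time (s) at which they were computed.
forecasts = {0: 0.1, 1: 0.3, 2: 0.1}
collision_observed = False  # the full trajectory showed no collision

a_priori = forecasts[min(forecasts)]   # earliest forecast: probability for situations like this one
worst_case = max(forecasts.values())   # highest forecast reached along the trajectory
latest = forecasts[max(forecasts)]     # most informed forecast; it supersedes the earlier ones
realised = 1.0 if collision_observed else 0.0  # what actually happened

print(a_priori, worst_case, latest, realised)  # 0.1 0.3 0.1 0.0
```

Each number answers a different question, which is why none of them can stand in for the others.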