Solved – How can machine learning models (GBM, NN etc.) be used for survival analysis

classification, cox-model, kaplan-meier, machine learning, survival

I know that traditional statistical models such as Cox Proportional Hazards regression and Kaplan-Meier models can be used to predict the number of days until the next occurrence of an event (a failure, say), i.e. survival analysis.

Questions

  1. How can the regression versions of machine learning models such as GBM, neural networks etc. be used to predict the number of days until an event occurs?
  2. I believe that simply using days until occurrence as the target variable and running a regression model will not work. Why won't it work, and how can it be fixed?
  3. Can we convert the survival analysis problem into a classification problem and then obtain survival probabilities? If so, how should the binary target variable be created?
  4. What are the pros and cons of the machine learning approach vs. Cox Proportional Hazards regression, Kaplan-Meier models etc.?

Imagine the sample input data is in the format below:

[image: sample input data table]

Note:

  • The sensor sends data at 10-minute intervals, but at times data can be missing due to network issues etc., as represented by the row with NA.
  • var1,var2,var3 are the predictors, explanatory variables.
  • failure_flag tells whether the machine failed or not.
  • We have the last 6 months of data at 10-minute intervals for each machine ID.

EDIT:

The expected output prediction should be in the format below:
[image: expected output table]

Note: I want to predict the probability of failure for each machine over the next 30 days, at a daily level.

Best Answer

For the case of neural networks, this is a promising approach: WTTE-RNN - Less hacky churn prediction.

The essence of this method is to use a Recurrent Neural Network to predict parameters of a Weibull distribution at each time-step and optimize the network using a loss function that takes censoring into account.
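To make the idea of a censoring-aware loss concrete, here is a minimal sketch of a right-censored Weibull log-likelihood for a single sample, plus the daily failure probabilities it implies. It is not the WTTE-RNN author's code: the function names, the choice of the continuous Weibull form, and the use of days as the time unit are assumptions made for illustration. In the actual method, `alpha` and `beta` would be the network's outputs at a given time step; here they are plain floats.

```python
import math


def weibull_censored_loglik(t, observed, alpha, beta):
    """Log-likelihood of one sample under a Weibull(alpha, beta) model.

    t        : time to the event, or time to the end of observation
               if the record is censored (must be > 0)
    observed : 1 if the failure was seen, 0 if right-censored
    alpha    : Weibull scale parameter (> 0)
    beta     : Weibull shape parameter (> 0)
    """
    # log S(t): log-probability of surviving past t
    log_survival = -((t / alpha) ** beta)
    if not observed:
        # Censored: we only know the machine survived at least until t.
        return log_survival
    # log h(t): log of the hazard (beta/alpha) * (t/alpha)^(beta - 1)
    log_hazard = math.log(beta / alpha) + (beta - 1.0) * math.log(t / alpha)
    # Observed failure: log f(t) = log h(t) + log S(t)
    return log_hazard + log_survival


def failure_prob_within(days, alpha, beta):
    """P(T <= days) = 1 - S(days) for a Weibull(alpha, beta) time to event."""
    return 1.0 - math.exp(-((days / alpha) ** beta))


def daily_failure_curve(alpha, beta, horizon=30):
    """Cumulative failure probability for each day 1..horizon."""
    return [failure_prob_within(d, alpha, beta) for d in range(1, horizon + 1)]


if __name__ == "__main__":
    # Hypothetical parameters, as if predicted for one machine today.
    alpha, beta = 20.0, 1.5
    for day, p in enumerate(daily_failure_curve(alpha, beta), start=1):
        print(f"day {day:2d}: P(failure by then) = {p:.3f}")
```

Training would maximise the sum of these log-likelihoods (equivalently, minimise its negative) over all machines and time steps. Censored records still contribute through the survival term, which is what a plain regression on days until failure cannot do.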

The author has also released his implementation on GitHub.
