Time series patterns for categorical variable

categorical-data time-series

I have a large database containing alarm data from a manufacturing cell. Each entry specifies the name of the alarm and a matching ID number. The alarm variable can only take one of a fixed set of predefined values, though that set is quite large.

I assume this means that the alarm name/alarm ID can be modeled as a categorical variable. Each entry also contains a timestamp.

I am somewhat new to statistical and machine learning modeling, but I have been trying to answer the following two questions:

  1. Whether there is any pattern in the alarms, that is, whether there is any correlation between one alarm and the next.
    As an extension of this, I would also like to know whether certain alarms tend to appear in sequential chains.

  2. Whether certain alarms are more likely to appear at certain times of the day or at certain points during the week.

Could someone here help me figure out how to proceed?

Best Answer

1) You can use a time window over your categorical variables: for each training sample, take not only the features of the sample at time t but also those of the samples at t-1, t-2, ..., t-n. For example, if each sample has 12 features and you want the model to look for patterns over the previous 30 samples, each training, validation, or test row at time t would have 12 × 30 = 360 lagged features, or 372 if you also keep the 12 features of the current sample. Categorical features tend to work well in this kind of windowed setup, so your data should be a good fit; keep in mind, though, that with a large number of alarm types, one-hot encoding every lag can make the feature matrix very wide.
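A minimal sketch of this windowing in plain Python (standard library only), assuming each sample is already a list of feature values. The function name `make_lagged_rows` is hypothetical. Samples that do not yet have a full history of n predecessors are simply dropped, which is one common choice (padding the missing lags is another).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_lagged_rows(samples: Sequence[Sequence[T]], n_lags: int) -> List[List[T]]:
    """Build one row per time step t from the features at t, t-1, ..., t-n_lags.

    Hypothetical helper: the first n_lags samples are skipped because
    they lack a complete history.
    """
    rows: List[List[T]] = []
    for t in range(n_lags, len(samples)):
        row: List[T] = []
        for lag in range(n_lags + 1):  # lag 0 is the current sample
            row.extend(samples[t - lag])
        rows.append(row)
    return rows


if __name__ == "__main__":
    alarms = [["A"], ["B"], ["C"], ["A"], ["B"]]  # one categorical feature per sample
    print(make_lagged_rows(alarms, n_lags=2))
    # [['C', 'B', 'A'], ['A', 'C', 'B'], ['B', 'A', 'C']]
```

With 12 features per sample and `n_lags=30`, every row has 12 × 31 = 372 entries, matching the count above when the current sample is included. The categorical values would still need encoding (for example one-hot) before most models can use them.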

2) From the timestamp you can extract useful features such as the hour, day, month, year, week of the month, day of the week, whether it is daytime, and whether it is a business day (as opposed to a weekend or holiday). Such categorical features usually work well with time-series machine learning models.
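A minimal sketch of these timestamp features using Python's standard `datetime` module. The thresholds are assumptions, not part of the answer: daytime is taken to be 06:00 to 17:59, a business day is Monday to Friday with holidays ignored, and week of the month counts days 1-7 as week 1. The helper `alarm_counts_by` and the alarm IDs in the usage lines are hypothetical; counting how often each alarm occurs per feature value gives a quick first look at question 2.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def calendar_features(ts: datetime) -> Dict[str, object]:
    """Derive categorical calendar features from a single timestamp."""
    return {
        "hour": ts.hour,
        "day": ts.day,
        "month": ts.month,
        "year": ts.year,
        # Assumed convention: days 1-7 are week 1, days 8-14 week 2, and so on.
        "week_of_month": (ts.day - 1) // 7 + 1,
        "day_of_week": ts.weekday(),  # Monday == 0 ... Sunday == 6
        # Assumed daytime window: 06:00 up to (not including) 18:00.
        "is_daytime": 6 <= ts.hour < 18,
        # Weekdays only; public holidays are not considered.
        "is_business_day": ts.weekday() < 5,
    }


def alarm_counts_by(records: Iterable[Tuple[datetime, str]], feature: str) -> Counter:
    """Count (alarm, feature value) pairs, e.g. how often each alarm fires per weekday."""
    return Counter((alarm, calendar_features(ts)[feature]) for ts, alarm in records)


if __name__ == "__main__":
    records = [
        (datetime(2024, 3, 4, 2, 0), "E101"),    # Monday 02:00
        (datetime(2024, 3, 11, 9, 0), "E101"),   # Monday 09:00
        (datetime(2024, 3, 9, 14, 30), "E205"),  # Saturday 14:30
    ]
    print(calendar_features(records[2][0]))
    print(alarm_counts_by(records, "day_of_week"))
```

The same feature dictionaries can be one-hot encoded and fed to a model alongside the lagged alarm features, or simply tabulated per alarm to see whether some alarms cluster at particular hours or weekdays.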

Hope I could help.