Use the behaviour of your ACF and PACF plots to help determine which model suits your data (e.g. slow decay in the ACF plot indicates that differencing might be needed to make the series stationary). Your ACF plot clearly shows that some sort of transformation is needed: with the right transformation (i.e. once the series is stationary), the autocorrelations should be less variable and fall within the blue significance bounds. Once you have made your series stationary, think about which model (AR, MA, ARMA, or ARIMA) is appropriate. In my project I did the following to help with model selection:
The ACF plot shows a relatively large value at lag 2 (find where this occurs in your plot) and is essentially zero at lags greater than two. This suggests that an MA(2) model may fit the data. Looking at the PACF plot, we immediately notice that the correlation is close to zero at almost all lags, which may suggest that the model has no AR part (adjust this to your plot). Therefore, one of our candidate models could be an ARIMA(p, d, q) with parameters p = 0, d = 1, and q = 1 or 2. I also tried some higher orders of MA and considered the possibility of an AR part in the model, so I could compare the results from AIC, AICc, and BIC and decide on the final model. As the next step, you will need to run some diagnostic tests to make sure you have chosen the correct model and that there is no pattern in your residuals (ACF and PACF of the residuals, the p-value of the Ljung-Box statistic, a histogram of the residuals, and a Q-Q plot).
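In practice statsmodels' `plot_acf`/`plot_pacf` and `acorr_ljungbox` handle these diagnostics for you; as a minimal sketch of what the residual check is actually computing, here is a plain numpy/scipy version (the function names `sample_acf` and `ljung_box` are mine):

```python
import numpy as np
from scipy.stats import chi2

def sample_acf(x, nlags):
    """Sample autocorrelations r_1..r_nlags of a series."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, nlags + 1)])

def ljung_box(residuals, nlags, fitted_params=0):
    """Ljung-Box Q statistic and p-value for 'no autocorrelation left in residuals'.

    fitted_params: number of estimated ARMA parameters, subtracted from the
    chi-square degrees of freedom when testing model residuals.
    """
    n = len(residuals)
    r = sample_acf(residuals, nlags)
    q = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, nlags + 1)))
    pval = chi2.sf(q, df=nlags - fitted_params)
    return q, pval
```

A large p-value means the residuals look like white noise (good); a small one means there is still structure your model has not captured.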
Hope it helps!
Before tackling correlation of measurement values, it may be worthwhile to explore the correlation of measurement presence. In other words, there may be information to extract from measurement co-occurrence. For instance, it is possible that you measure temperature more often when you eat sweets. If you discover that this is true, you can isolate the time periods where you ate sweets and analyze your temperature during those times.
Here is where I would start, given the nature of your dataset and the sort of exploration you are trying to do.
- Decide on an appropriate window size, say, 10 minutes. Create time intervals corresponding to these windows, starting from the timestamp of your first measurement and ending at the timestamp of your final measurement. Construct a pandas dataframe, where each row corresponds to a time interval, and each column to a measurement type ("feature").
- For each interval, compute the average (mean or median) value for each feature corresponding to that interval. If there was no measurement for a certain type during the interval, enter NaN for that column. You may also want to record other information. For instance, if you took 3 temperature measurements during a time interval, that count could become another feature for you to analyze. (see example in edit below)
- Make a scatter plot where time intervals are along the x axis and feature values (the averages from step #2) are along the y axis. Each feature should be a different color dot. If the feature is NaN for a time interval, don't plot it. Also, mind your y axis scaling, as it may be hard to visualize the data without doing some normalization first. This sort of plot will give you a first look at all of your features at once. There may be interesting trends that you can home in on. Or, it may still be too difficult to visualize. That's ok, we are exploring!
- Use a tool like missingno to analyze your data completeness. There is a lot of cool stuff in this package, and I am not too familiar with it, so I will leave you to explore its possibilities. Personally, I'd take a look at `missingno.matrix`, `missingno.heatmap`, and `missingno.dendrogram`.
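The windowing in the first two steps can be sketched with pandas; the timestamps, feature names, and values below are made up for illustration:

```python
import pandas as pd

# Hypothetical raw measurements in long format: (timestamp, feature, value)
raw = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:03", "2024-01-01 00:07", "2024-01-01 00:12",
        "2024-01-01 00:25", "2024-01-01 00:31",
    ]),
    "feature": ["temp", "temp", "temp", "sweets", "temp"],
    "value": [36.6, 36.8, 37.0, 1.0, 36.7],
})

# One row per 10-minute window, one column per feature; mean value per
# window, NaN where a feature was not measured in that window.
wide = (raw.pivot_table(index="timestamp", columns="feature", values="value")
           .resample("10min").mean())

# The number of measurements per window can be a feature in its own right.
counts = (raw.pivot_table(index="timestamp", columns="feature",
                          values="value", aggfunc="count")
             .resample("10min").sum())
```

`wide` is then ready for the scatter plot in step 3, and `counts` gives you the measurement-presence information for the co-occurrence analysis.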
By that stage, you may have already observed interesting trends. Keep exploring! You don't necessarily need to correlate the time series themselves to uncover interesting stuff.
If you are really interested in computing similarity between time series with different scales, look into dynamic time warping. If you computed the pairwise DTW similarity between all of your features, you may be able to infer which features tend to go together. I have seen this applied to financial data to analyze stocks that trend together. However, DTW doesn't solve the problem of missing data. For that, you will want to look into data imputation. But keep in mind, no amount of imputation can create data for you! It can only fill gaps based on what you think belongs there.
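As a minimal sketch of the classic DTW dynamic program (pure NumPy; for real use, optimized libraries such as fastdtw are much faster, and if your series have different scales you should z-score them first — the function name here is mine):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D series, O(len(a)*len(b)).

    D[i, j] holds the cheapest cumulative cost of aligning a[:i] with b[:j];
    each step may repeat a point in either series (the 'warping').
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # repeat b[j-1]
                                 D[i, j - 1],      # repeat a[i-1]
                                 D[i - 1, j - 1])  # advance both
    return D[n, m]
```

Note that a series and a time-stretched copy of itself have DTW distance zero, which is exactly the invariance that makes DTW useful for series that trend together at different speeds.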
EDIT
You asked for clarification on the averaging process in step 2. Here is an example with body temperature:
Assume you are interested in time intervals of width $w$.
From time $t_j$ to time $t_{j+1}=t_j+w$, let's say you have measured your temperature $T$ a total of $n_j$ times: $T_1, \ldots, T_{n_j}$.
Then, instead of plotting a point for each of the $n_j$ temperature measurements, plot a single point corresponding to the mean temperature $\bar{T}_j=\frac{1}{n_j}\sum_{i=1}^{n_j} T_i$.
The goal is simply to create a plot with less clutter.
For categorical variables, it would make more sense to plot the mode (or the median, if the categories are ordered) than the mean.
By reducing the fidelity of your x-axis to have fewer points, the plot may be easier to look at. But you are dealing with more features than I expected, so the utility of this approach is limited. I would play around with missingno some more --- understanding the feature co-occurrence may be the first step in understanding cause-and-effect relationships between features. Good luck!
It sounds like a good approach would be time-series Granger causality (as mentioned in the comments above). This uses a regression-based approach. Your end result isn't an ontological test of causality, but a test of whether one variable helps you predict the other, which is generally what you get from regression. Several time-series texts can help you with this. For example, a recent text by Steffensmeier et al. (Time Series Analysis for the Social Sciences, p. 112+) describes this in the context of vector autoregression modelling, as does Enders (Applied Economic Time Series, p. 305+, 4th ed.). You use a simple F-test to (quoting Steffensmeier) "determine the joint statistical significance of the coefficients on the lags of the variable hypothesized to Granger cause another variable. The null of no Granger causality is equivalent to the hypothesis that all these coefficients are jointly zero" (p. 112).
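That F-test can be sketched directly with numpy/scipy by comparing a restricted regression (lags of y only) against an unrestricted one (lags of y and x); the function name and layout are my own, and in practice statsmodels' `grangercausalitytests` automates this:

```python
import numpy as np
from scipy import stats

def granger_f_test(y, x, p):
    """F-test: do p lags of x improve prediction of y beyond p lags of y?"""
    y = np.asarray(y, float)
    x = np.asarray(x, float)
    n = len(y)
    Y = y[p:]
    # Column k holds the series lagged by k, aligned with Y.
    lags_y = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    const = np.ones((n - p, 1))
    X_r = np.hstack([const, lags_y])            # restricted: y lags only
    X_u = np.hstack([const, lags_y, lags_x])    # unrestricted: y and x lags

    def rss(X):
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        return np.sum((Y - X @ beta) ** 2)

    rss_r, rss_u = rss(X_r), rss(X_u)
    df_num = p
    df_den = (n - p) - X_u.shape[1]
    F = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    pval = stats.f.sf(F, df_num, df_den)
    return F, pval
```

A small p-value rejects the null that the coefficients on the x lags are jointly zero, i.e. it is evidence that x Granger-causes y.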
Enders has a detailed example (p. 310+), using his own research on terrorism. He walks you through the process of creating VAR equations. Much of this process presumes a familiarity with time-series techniques -- for example, testing for stationarity -- before you can proceed to the actual question you want to answer, i.e., causality. There is also a presumption of some background in linear algebra, since in the example you need to do a Choleski decomposition. In this case, causality is determined using an impulse response function.