Identifying the time lag between cause and effect

causality, panel data, r, regression, time series

What approaches exist to observe the time lag between two variables?

I need to analyze the relationship between blood pressure and some other factor, such as exercise. The data set I am drawing from has around 1800 individuals, with an average of 100 entries apiece. It is generally known that there is a strong relationship between exercise level and blood pressure. However, if a person increases their daily step count to 8000 or more, how long will it take for their blood pressure to drop? I am new to this type of analysis, and this is a challenge I have been thinking about for weeks.

I would welcome comments on possible approaches to this challenge or on any issues surrounding it.

Some issues I have been dealing with:

  1. Is it better to treat this as a time series analysis or a longitudinal data analysis?

    My understanding is that time series analysis usually focuses on one variable observed at consistent intervals with no missing data, whereas longitudinal analysis covers a longer period and has to handle inconsistent time intervals, dropouts, and missing data.

    The data I have seems to fit the longitudinal description better, but it also seems like time series methods could be used if I averaged the values by week so there would be no missing entries. I'm not sure about the pros and cons of each approach.

  2. Should I be fitting a causal model, or would some other method like regression be more helpful?

    I've been looking at various possible causal models, for example Marginal Structural Models (MSM) or Structural Nested Models (SNM), but there seems to be very little information on their application. I did find one R package that applied inverse probability weights and then fitted a Cox proportional hazards regression model to a survival object (MSM), but that seemed to focus on weighting for confounding and right censoring. Its result was a single summary coefficient, which I don't think helps me.

    So I'm not sure that fitting a causal model is what I want, because that seems to be more focused on making intellectually satisfying assumptions about relationships within the data and then determining the degree of causality, rather than on providing information about the time lag.

    If anyone knows about MSM, SNM, their use in R, or how they might relate to this problem, that would be awesome to hear.

  3. What about survival analysis or structural equation modeling (SEM)?

    I haven't explored these options in depth yet, but they sound potentially relevant.
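
For question 1, the weekly-averaging idea could look like the minimal sketch below (Python, standard library only). It assumes each individual's readings are available as (date, value) pairs; the function name `weekly_means` and the sample readings are hypothetical. Note that weeks containing no readings stay missing, so weekly averaging reduces the irregularity but does not by itself remove every gap.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def weekly_means(readings):
    """Average one individual's irregular (date, value) readings by ISO week.

    Returns a dict keyed by (iso_year, iso_week), in chronological order.
    Weeks without any reading are simply absent, so gaps are not filled in.
    """
    buckets = defaultdict(list)
    for day, value in readings:
        # ISO year can differ from day.year for dates near New Year
        iso_year, iso_week, _ = day.isocalendar()
        buckets[(iso_year, iso_week)].append(value)
    return {week: mean(values) for week, values in sorted(buckets.items())}

# Hypothetical blood-pressure readings for one person
readings = [(date(2024, 1, 1), 130.0), (date(2024, 1, 3), 126.0), (date(2024, 1, 9), 124.0)]
print(weekly_means(readings))  # {(2024, 1): 128.0, (2024, 2): 124.0}
```

Grouping by ISO week keeps weeks aligned across individuals, which matters if the weekly series are later compared or pooled.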

I've kind of stalled, so any hints about what direction I might want to go would be really appreciated.

Thanks in advance.

Best Answer

When comparing two time series for lead/lag relationships, we estimate the cross-correlation function (CCF). A cross-correlation that is statistically significantly different from zero at a particular lag may be indicative of a relationship like the one you are looking for.
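
A minimal sketch of the sample cross-correlation in Python (standard library only), assuming two equally spaced, equal-length series for one individual, such as weekly step counts `x` and weekly mean blood pressure `y`. The function names are hypothetical. Positive lag k pairs x at time t with y at time t + k, so a strongly negative r(k) would mean higher activity is followed by lower blood pressure k periods later. The ±1.96/√n band is the usual approximate 95% bound when at least one series is white noise; strongly autocorrelated series (weekly averages usually are) can cross it spuriously, which is why pre-whitening is often recommended before reading off lags.

```python
import math
from statistics import mean

def cross_correlation(x, y, max_lag):
    """Sample cross-correlation r(k) between x[t] and y[t + k], k = 0..max_lag.

    Uses the common divisor-n estimator; positive k means x leads y.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(x) - 1")
    mx, my = mean(x), mean(y)
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    ccf = {}
    for k in range(max_lag + 1):
        # Only the n - k overlapping pairs contribute, but the divisor stays n
        cov = sum((x[t] - mx) * (y[t + k] - my) for t in range(n - k)) / n
        ccf[k] = cov / (sx * sy)
    return ccf

def significant_lags(ccf, n, z=1.96):
    """Lags whose |r(k)| exceeds the approximate +/- z / sqrt(n) white-noise bound."""
    bound = z / math.sqrt(n)
    return [k for k, r in ccf.items() if abs(r) > bound]
```

For example, a unit pulse in `x` at t = 0 and the same pulse in `y` at t = 2 (n = 6) gives `cross_correlation(x, y, 3)` a peak of 14/15 ≈ 0.93 at lag 2, the only lag outside the ±1.96/√6 ≈ 0.80 band. With around 1800 individuals, this would be computed per person, and the lag estimates summarized across people.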