Is it appropriate to use “time” as a causal variable in a DAG?

Tags: causality, dag, philosophical

This question might be better suited for philosophy.SE, but I will post it here in the first instance, since it involves technical aspects that are best understood by users on this site. The title question asks, is it appropriate to use "time" as a causal variable in a DAG? More specifically, if we have data over different time periods, is it appropriate to use the time index as a variable in the DAG, with causal arrows emanating from that variable to other variables?

To my mind, this raises the philosophical question of whether "time" can be considered to have a causal impact on other variables or whether, on the contrary, time is an inbuilt component of the notion of causality to begin with (and so cannot be brought in as a causal variable). Beyond this philosophical question, it also raises practical statistical questions about the appropriate treatment of a time index in a DAG. In most statistical applications involving data measured at different times, there are confounding factors that also vary over time. In such cases, can one use "time" as a stand-in for an explicit specification of those confounding factors?

Best Answer

As a partial answer to this question, I am going to put forward an argument to the effect that time itself cannot be a proper causal variable, but it is legitimate to use a "time" variable that represents a particular state-of-nature occurring or existing over a specified period of time (which is actually a state variable). These issues are the impetus for the question itself, since my intuition tells me that "time" in a causal model must be a kind of proxy for some kind of state variable.


Time itself cannot be a causal variable

Time is already a component of the concept of causality: The first hurdle is that the concept of causality involves actions, and actions occur over time. Thus, "time" is already baked into the concept of causality. One might therefore regard time as a priori inadmissible as an argument variable within that concept. To assert that time is a cause of an effect requires time to be admitted both as the asserted causal variable and as a necessary ingredient of causality itself. (We will see more of the effects of this below.)

If time causes anything, it causes everything: The second hurdle is that causality is generally regarded as requiring a counterfactual condition, and this condition reduces to triviality when time is asserted as the causal variable. If we say that "precondition X causes action Y", the relevant counterfactual condition is that (1) the presence/occurrence of precondition X means that action Y will occur; and (2) in the absence of another cause, the absence of precondition X means that action Y will not occur. But since "will occur" means "will occur over time", the use of "time" as a causal variable adds nothing to the first requirement, and makes the second a tautology. If precondition X is "the movement of time" then (1) reduces to "the movement of time means that action Y will occur", which logically reduces to "action Y will occur"; and (2) reduces to "the absence of movement of time means that action Y will not occur" (which is a tautology, since action can only occur over time). Under this counterfactual interpretation of causality, an assertion of the time-causality of an action is logically equivalent to an assertion that this action will occur. Thus, we must conclude either that this condition is too weak to constitute causality (i.e., time is not a cause of anything) or that time is the cause of everything.

Pure time-causality is metaphysically equivalent to randomness: Another hurdle arises when "time" is the only asserted causal variable (i.e., in the case of pure time-causality). The problem is that a change in a variable over time, in the absence of causality from any non-time variable, has traditionally been regarded as the very definition of aleatory randomness (i.e., non-causality). Thus, to assert that time is the sole cause of an effect is to banish the notion of non-causality (randomness) entirely from metaphysics, and substitute for it a base "cause" that is always present if there is no other cause. Alternatively, one might reasonably assert that a claim of time-causality is equivalent to an assertion of randomness; that is, it is an assertion that there are no causes of the change other than the passage of time. If such is the case, then the presence of "time" as a causal variable in a DAG is equivalent to its absence (and thus parsimony counsels that it be excluded). Moreover, the history of the field counsels in favour of keeping the existing terminology of "randomness".

Problems with causal calculus with time as a causal variable: A final hurdle I will mention (there may be more) is that it is difficult to deal with "time" as a causal variable in the causal calculus. In standard causal calculus, we have a $\text{do}(\cdot)$ operator that acts on a causal variable to reflect an intervention in the system that sets that variable to a chosen value, which may differ from the value it would take under passive observation. It is not entirely clear that it is possible to impose an "intervention" on a time variable without running afoul of other philosophical or statistical principles. One could certainly argue that waiting is an intervention that changes time (forward only), but even if it were so interpreted, waiting cannot be differentiated from passivity, and so arguably it is not distinct from passive observation. One might instead argue that we could record a large amount of data over different times, and then the "intervention" would be to choose which time values are included in the data for the analysis. That would indeed involve a choice of time periods (over the available data), and so it would seem to constitute an intervention, but that is an epistemic intervention, not a metaphysical one. (It also gives rise to a secondary problem of failing to use all the available data.)
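To make the role of the $\text{do}(\cdot)$ operator concrete, here is a minimal sketch in Python of a toy structural causal model. Everything in it is an illustrative assumption rather than part of the argument above: the binary confounder `z`, the coefficients, and the function names are all hypothetical. It contrasts passive observation of `x` with an intervention that sets `x` by fiat.

```python
import random

def sample(rng, do_x=None):
    """Draw one unit from a toy SCM with edges z -> x, z -> y, x -> y.

    If do_x is given, the structural equation for x is replaced by the
    constant do_x (an intervention); otherwise x follows its usual mechanism.
    """
    z = 1 if rng.random() < 0.5 else 0            # hypothetical confounder
    if do_x is None:
        p_x = 0.9 if z else 0.1                   # x depends on z when observed
        x = 1 if rng.random() < p_x else 0
    else:
        x = do_x                                  # intervention severs z -> x
    y = 2.0 * x + 3.0 * z + rng.gauss(0.0, 0.1)   # structural effect of x is 2.0
    return z, x, y

def mean(values):
    return sum(values) / len(values)

def observational_contrast(n=20000, seed=0):
    """E[y | x=1] - E[y | x=0] under passive observation (confounded by z)."""
    rng = random.Random(seed)
    draws = [sample(rng) for _ in range(n)]
    y1 = [y for _, x, y in draws if x == 1]
    y0 = [y for _, x, y in draws if x == 0]
    return mean(y1) - mean(y0)

def interventional_contrast(n=20000, seed=0):
    """E[y | do(x=1)] - E[y | do(x=0)], which targets the structural effect."""
    rng = random.Random(seed)
    y1 = [sample(rng, do_x=1)[2] for _ in range(n)]
    y0 = [sample(rng, do_x=0)[2] for _ in range(n)]
    return mean(y1) - mean(y0)
```

Under these assumed parameters the observational contrast is about 4.4 in expectation, while the interventional contrast is about 2.0, the coefficient on `x`. The sketch works only because `sample` has a mechanism for `x` that can be overridden. The difficulty described above is that there is no obvious analogue for the passage of time itself: it is unclear what it would mean to replace "the mechanism that sets the time" with a chosen constant.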


A state variable accruing over time can be a causal variable

DAGs can include variables representing states-of-nature occurring over a prescribed time: There are a number of legitimate causal variables that represent the occurrence of some state or some event over a prescribed period of time. A simple example (hat tip to Carlos in the answer below) is investment of money over time, which yields interest. In this case, the accrual of interest is caused by the fact that money is invested over a period of time, and the longer the investment period, the higher the interest accrued. Here it is legitimate to have a "time" variable that represents the chosen period of time for the investment, and this variable would have a direct causal impact on the accrued interest. Similarly, the "age" variable for a person is a kind of "time" variable (hat tip to AdamO in the answer below), representing the fact that the person has been alive over a specified period of time. Each of these variables is a legitimate causal variable that can be included in a DAG. These variables do not represent the progression of time itself; they represent the fact that a certain state-of-nature was present over a specified period of time. In many cases, it is a useful shorthand to label a variable like this as "time", but it is important to bear in mind that it represents a specific state over a period of time, rather than the progression of time itself.
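The investment example can be written as a single structural equation. The following is a minimal sketch in Python, assuming annual compounding at a fixed rate; the function name `accrued_interest` and the figures used are illustrative assumptions, not anything specified above. The point is that `years_invested` is a quantity the investor chooses, so setting it to different values is a well-defined intervention on a state variable.

```python
def accrued_interest(principal, annual_rate, years_invested):
    """Interest accrued under annual compounding (an assumed mechanism).

    years_invested is the state variable: how long the money stayed invested,
    not the passage of time as such.
    """
    return principal * ((1.0 + annual_rate) ** years_invested - 1.0)

# Set years_invested to several chosen holding periods, other inputs held fixed.
principal, annual_rate = 1000.0, 0.05
interest_by_period = {
    t: accrued_interest(principal, annual_rate, t) for t in (0, 1, 2, 10)
}
```

With these assumed inputs, a holding period of zero years yields no interest, and longer holding periods yield strictly more. Calendar time passes regardless of what the investor does; what differs between the scenarios is the length of time for which the "money is invested" state obtained.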

In some sense, every variable is of this kind: Since every possible event or state-of-nature occurs either at a particular point in time, or over a period of time, every variable involves some (often implicit) time specification. Nevertheless, there are variables such as "age" or "time invested" that have a more direct connection to time, insofar as the variable represents the amount of accrual of time during which a particular state obtained.

Using "time" in a DAG is a shorthand for a state variable accruing over time: If the above argument is correct, it would appear that any use of a "time" variable in a DAG must be a shorthand for a variable representing the occurrence of a particular event or the existence of a particular state-of-nature over a specified period of time. The progression of time itself is not subject to control or intervention, and cannot be a causal variable for the reasons described above. However, the prevalence of a particular state-of-nature over a period of time certainly can be a legitimate causal variable that can be included in a DAG.


These points give some basic idea of why the use of "time" as a causal variable is problematic, and what it means to add "time" to a DAG. As you can see, my view is that time itself cannot be a causal variable, but that you can have a "time" variable that actually represents an event or state-of-nature occurring or existing over a period of time. I am open to being convinced to the contrary, but this seems to me to be a sensible resolution of the issue.
