You basically have to create a wide-format dataset with all the characteristics that are relevant for the matching procedure, perform the matching on this cross-sectional dataset, and then use the ID to identify the matched pairs in the panel dataset. Here are some more details:
Use reshape to create a wide-format dataset. Format the pre-treatment variables in the way you want to use them in the matching procedure. If you have multiple observations per individual, you can simply average your variables, but other approaches work too (e.g., keep multiple observations of the same variable, such as health1 and health2, and use all of them in the matching). The goal is a dataset with one observation per individual.
Using this dataset, perform the matching procedure with psmatch2.
Merge the information about the matched cases back into the original panel dataset, drop unmatched cases, and so on. I am not sure about the details here because I don't really know Stata and psmatch2, but I think you get the idea.
Using these steps, you can match cases based on all pre-treatment information and you only have one match per treatment unit.
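The steps above can be sketched in pandas (a Python stand-in for the Stata workflow; the example data, the `matched` table, and the `match_id` column are all made up for illustration — in practice psmatch2 would produce the match information):

```python
import pandas as pd

# Hypothetical long-format panel: one row per person-period.
panel = pd.DataFrame({
    "id":     [1, 1, 2, 2, 3, 3, 4, 4],
    "period": [1, 2, 1, 2, 1, 2, 1, 2],
    "treat":  [1, 1, 0, 0, 1, 1, 0, 0],
    "health": [3.0, 4.0, 2.0, 2.0, 5.0, 5.0, 1.0, 3.0],
})

# Step 1: collapse to one row per individual (here: pre-treatment averages).
wide = panel.groupby("id").agg(treat=("treat", "first"),
                               health_avg=("health", "mean")).reset_index()

# Step 2: run the matching on `wide` (psmatch2 in Stata; omitted here) and
# suppose it returns the matched control id for each treated id.
matched = pd.DataFrame({"id": [1, 3], "match_id": [2, 4]})

# Step 3: merge the match information back into the panel via the id,
# so unmatched individuals can be identified and dropped.
panel_matched = panel.merge(matched, on="id", how="left")
```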
All matching estimators for the treatment on the treated effect can be written in the form
$$ \frac{1}{n_T} \sum_{i \in \{d_i=1\}} \left[ y_{1i} - \sum_{j \in \{d_j = 0 \}} w_{ij} \cdot y_{0j} \right] ,$$
where $w_{ij}$ is the weight placed on the $j$th untreated observation as a counterfactual for the $i$th treated observation, and $n_T$ is the number of treated persons. The weights satisfy $\sum_j w_{ij}=1$ for all $i$.
Effectively, from each treated observation $i$, you subtract a weighted average of the control observations. Then you take the average of these differences. These weights are specific to observation $i$. Different matching estimators differ in how they construct the weights.
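In code, this general form is just a weighted difference. A minimal numpy sketch with made-up outcomes and weights:

```python
import numpy as np

# Generic ATT matching estimator: for each treated i, subtract a weighted
# average of control outcomes, then average these differences over i.
y_treated = np.array([5.0, 7.0])          # y_1i for the n_T treated units
y_control = np.array([4.0, 6.0, 2.0])     # y_0j for the control units

# w[i, j]: weight on control j as counterfactual for treated i.
# Each row sums to 1, as required.
w = np.array([[0.5, 0.5, 0.0],
              [0.0, 1.0, 0.0]])

att = np.mean(y_treated - w @ y_control)  # ((5 - 5) + (7 - 6)) / 2 = 0.5
```

Different matching estimators correspond to different recipes for filling in the rows of `w`.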
For example, nearest neighbor matching sets the weight to 1 for the single untreated observation closest to $i$ in terms of the propensity score and to 0 for all others. k-NN matching instead uses the $k$ closest neighbors, each with weight $1/k$.
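A sketch of how the nearest-neighbor weights are built from hypothetical propensity scores:

```python
import numpy as np

# Nearest-neighbor weights: 1 for the single closest control in PS distance,
# 0 otherwise (k-NN would instead spread weight 1/k over the k closest).
ps_treated = np.array([0.6, 0.3])
ps_control = np.array([0.2, 0.55, 0.8])

# Pairwise |PS_i - PS_j| distances, one row per treated unit.
dist = np.abs(ps_treated[:, None] - ps_control[None, :])

w = np.zeros_like(dist)
w[np.arange(len(ps_treated)), dist.argmin(axis=1)] = 1.0
```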
Interval matching consists of dividing the range of propensity scores into a fixed number of intervals (which need not be of equal length). An interval-specific estimate is obtained by taking the difference between the mean outcomes of the treated and untreated units in each interval.
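Interval matching can be sketched as follows, with two hand-picked intervals and made-up data:

```python
import numpy as np

# Interval (stratification) matching: split the PS range into fixed intervals
# and take the treated-minus-control mean outcome difference in each.
ps = np.array([0.2, 0.3, 0.4, 0.7, 0.8, 0.9])   # propensity scores
d  = np.array([0,   1,   0,   1,   0,   1  ])   # treatment indicator
y  = np.array([1.0, 2.0, 3.0, 6.0, 4.0, 8.0])   # outcomes
edges = [0.0, 0.5, 1.0]                         # two intervals of the PS range

diffs = []
for lo, hi in zip(edges[:-1], edges[1:]):
    in_band = (ps >= lo) & (ps < hi)
    diff = y[in_band & (d == 1)].mean() - y[in_band & (d == 0)].mean()
    diffs.append(diff)                          # interval-specific estimate
```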
Radius/caliper matching takes the mean of the outcomes for untreated units within a fixed radius of each treated unit as the estimated expected counterfactual. You pick the radius.
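A minimal sketch of radius matching, assuming a radius of 0.1 and made-up data:

```python
import numpy as np

# Radius (caliper) matching: the counterfactual for each treated unit is the
# mean outcome of all controls within a chosen PS radius of it.
ps_t, y_t = np.array([0.5]), np.array([6.0])
ps_c = np.array([0.3, 0.45, 0.55, 0.9])
y_c  = np.array([2.0, 4.0,  6.0,  9.0])
radius = 0.1                                    # picked by the analyst

in_radius = np.abs(ps_c - ps_t[0]) <= radius    # controls at 0.45 and 0.55
counterfactual = y_c[in_radius].mean()          # (4 + 6) / 2 = 5.0
effect = y_t[0] - counterfactual                # 6.0 - 5.0 = 1.0
```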
Kernel matching uses weights that decline with the distance in the propensity score. You can think of kernel matching as running, for each treated observation, a weighted regression on the comparison group data that includes only an intercept term. Here you have to pick the kernel and the bandwidth; a larger bandwidth gives more distant observations larger weights.
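For instance, Gaussian kernel weights (one of several possible kernel choices) could be built like this, with made-up scores and an assumed bandwidth:

```python
import numpy as np

# Kernel matching weights: a Gaussian kernel of the PS distance, normalized so
# each treated unit's weights sum to 1. `h` is the bandwidth; a larger h gives
# more distant controls relatively larger weights.
ps_treated = np.array([0.5, 0.7])
ps_control = np.array([0.2, 0.5, 0.9])
h = 0.2

k = np.exp(-((ps_treated[:, None] - ps_control[None, :]) / h) ** 2 / 2)
w = k / k.sum(axis=1, keepdims=True)   # normalize: rows sum to 1
```

The weighted mean `w @ y_control` for each row is exactly the fitted intercept of the weighted intercept-only regression described above.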
Local linear matching is very similar, but the weighted regression also includes a linear term in the propensity score. Some people also include higher-order polynomial terms.
Finally, you have inverse probability weighting. The basic idea is that you can figure out the expected untreated outcome (in either the treated population or the full population) by reweighting the observed values using the treatment probabilities.
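A sketch of the IPW version of the ATT, which reweights control outcomes by the treatment odds $p/(1-p)$ (the propensity scores below are made up rather than estimated):

```python
import numpy as np

# IPW for the ATT: reweight observed control outcomes by the odds of
# treatment p/(1-p) to recover the expected untreated outcome among
# the treated, then compare to the treated mean.
d  = np.array([1, 1, 0, 0, 0])                  # treatment indicator
y  = np.array([5.0, 7.0, 4.0, 6.0, 2.0])        # outcomes
ps = np.array([0.8, 0.6, 0.4, 0.6, 0.2])        # hypothetical fitted PS

odds = ps[d == 0] / (1 - ps[d == 0])            # weights for controls
y0_hat = np.sum(odds * y[d == 0]) / np.sum(odds)
att = y[d == 1].mean() - y0_hat
```

Controls with a high treatment probability resemble the treated and are weighted up; controls with a low probability are weighted down.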
There are some guidelines about how to pick a method here.
There is a list of software and packages that can do matching here. Stata also now has native PSM estimators. In my experience, replicating the output by hand is often very hard once you go past the simplest estimators. However, you can also find examples with output for all of these online, so even if you don't have the software, they will give you a useful benchmark since you can usually track down the data.
Best Answer
Maybe the following paper is relevant for your case: Lu B. Propensity Score Matching with Time-Dependent Covariates. Biometrics 2005; 61, 721–728.
In the situation considered in the paper, subjects may start treatment at any point during an observation period. An individual who becomes exposed at time $t$ is matched to several controls selected from the corresponding risk set, i.e., from all subjects who are still at risk of becoming exposed at time $t$.
Matching is with respect to a time-dependent propensity score, defined as the hazard of becoming exposed at time $t$ computed from a Cox proportional hazards model: $$h(t)=h_0(t)\exp(\beta'x(t))$$ where $x(t)$ is a vector of potentially time-varying predictors of treatment status. In each risk-set, matching is actually performed on the linear predictor scale according to the metric $$d(x_i(t),x_j(t))=\left(\hat\beta'x_i(t)-\hat\beta'x_j(t)\right)^2.$$
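A small numpy sketch of this metric (the coefficient vector and the covariates are made up; in practice $\hat\beta$ would come from the fitted Cox model):

```python
import numpy as np

# Risk-set matching metric at the time a subject becomes exposed: squared
# difference of Cox linear predictors between that subject and each control
# still at risk at that time. beta_hat and the covariates are hypothetical.
beta_hat = np.array([0.5, -1.0])
x_exposed = np.array([2.0, 1.0])       # covariates x_i(t) of the new exposure
x_risk_set = np.array([[1.5, 0.5],     # covariates x_j(t) of the controls
                       [3.0, 2.0]])    # still at risk at time t

lp_i = beta_hat @ x_exposed            # linear predictor of exposed subject
lp_j = x_risk_set @ beta_hat           # linear predictors of at-risk controls
dist = (lp_i - lp_j) ** 2              # d(x_i(t), x_j(t)) for each control
best_control = int(dist.argmin())      # closest control in this risk set
```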