I am performing distributed non-linear lag models in R.
I got the figure result of dlnm
as shown in the vignette (pdf) on page 13:
The X-axis is lag, which I can understand. However, I cannot understand what the label for the Y-axis means. For the figure on page 13, the Y-axis is called "RR" (relative risk) which is positive; however, I got a negative RR when I input my data. My $X$ is temperature, my $Y$ is pollutant concentration. Can I still use the model?
Could anybody explain it to me a little bit?
The R code that I am using is:
OrigData <- read.csv("Hourly data for use SHULE-2.csv", na.strings = ".",
header=T, sep=",")
CO2H6 = OrigData$CO2H6 # $
OutT = OrigData$OutT # $
Year = OrigData$Year # $
Month = OrigData$Month
argvar <- list(type="ns", df=11)
arglag <- list(type="ns", df=5)
suppressWarnings(cb1 <-crossbasis(OutT, lag=12, argvar=argvar, arglag=arglag))
model1 <- glm(CO2H6~ cb1 +as.factor(Month))
summary(model1)
cp <- crosspred(cb1,model1,-22.83:34.79,by=1, cumul=TRUE)
par(cex.axis=0.7,cex.lab=0.7)
d3 <- plot(cp, xlab="Delay Effects of OutT in IN2H-H6", zlab="CO2 con.", r=90,
d=0.3, col="red" , xlab="Out T")
plot(cp, "slices", var=10, ci="n", ylim= c(-1000,1000), ylab="CO2 con.", lwd=1.0,
xlim=c(0,10))
for(i in 4:6){
lines(cp, "slices", var=c(-20,-10,0,10,20,30)[i], col=i+1, lwd=1.5)
}
legend(7,1000, col=5:7, lwd=0.9, pch=1:1, legend=c("OutT =10℃", "OutT =20℃",
"OutT =30℃"), cex=0.7)
Since RR cannot be negative, could there be a way for me to transfer my x(Temperature) and y (Pollutant con.) to make it fitting the requirement of the model?
Thanks for all your help!!
Best Answer
Interpretation of the graph in your case
Note: The y-axis is not always the relative risk as in the example given in the vignette of the
dlnm
package. This is only the case in their example, because they used mortality data and Poisson regression models. In their framework, the exponentiated regression coefficient from the Poisson models $RR=\exp(\hat{\beta})$ is the relative risk. This is analogous to exponentiating the regression coefficients in logistic regression, which is the odds ratio.Yeah, you can still use such a model.
Let's summarize what you do:
arglag
with optiontype="ns"
instead oftype="poly"
)argvar
as splines. One important thing you have to know for the interpretation of the plots is that the functioncrossbasis
automatically centeres the values at the predictor mean (i.e. the mean temperature) if not specified otherwise. This is the reference value with which the predictions are later compared in the graphics.lag=12
incrossbasis
). (Btw: Why do you suppress the warnings?)lm
function instead.The plot that you have provided is interpreted as follows: The x-axis is the lag.
Interpretation of the values of the y-axis: The y-axis depicts the changes in $\mathrm{CO}_{2}$ concentration associated with an increase of 10, 20 or 30°C compared to the mean temperature. If predicted change is 0, this means that an increase in temperature is not associated with an increase in $\mathrm{CO}_{2}$ concentration compared to $\mathrm{CO}_{2}$ concentration at mean temperature: The predicted $\mathrm{CO}_{2}$ concentration is the same at $\bar{x}_{Temp}+z$ degrees (where $z$ is any amount, say 10 or 20 degrees) and at mean temperature $\bar{x}_{Temp}$.
This means that for an increase in temperature of 10°C, the temperature at lag 0 (on the same hour) increases the $\mathrm{CO}_{2}$ concentration compared to the mean temperature. Because you specified
cumul=TRUE
incrosspred
, the effects are cumulative. The cumulative effects of an increase of 10°C are quasi nonexistent after 4 hours compared to the mean temperature. This suggests that the non-cumulative effects are negative at lags 1-4 and null effects from that on.For temperature increases of 20 or 30°C, the cumulative effects on the $\mathrm{CO}_{2}$ concentration are lower in the first 1-4 hours compared to $\mathrm{CO}_{2}$ at the mean temperature. As with temperature increases of 10°C, the cumulative effects are practically nonexistent after 4 or 5 hours. Again: $\mathrm{CO}_{2}$ concentrations are the same at mean temperature and at an increase in temperature of 20 or 30°C after 4 or 5 hours.
I think a contour plot would be easier to interpret. Try the following code:
Interpretation of the example given in the vignette of the dlnm packge
First, a little something about distributed lag models. They have the form:
$$ Y_{i}=\alpha + \sum_{l=0}^{K}\beta_{j}x_{t-l} + \text{other predictors} +\epsilon_{i} $$ where $K$ is the maximum lag and $x$ is a predictor. This is just fitted using a multiple linear regression. So the coefficient $\beta_{1}$ would estimate the effect of $x_{t-1}$ of the day before on $Y_{t}$. In essence, multiple lags of the predictors are included in the model simultaneously. This obviously has the problem that the lagged predictors are highly correlated (autocorrelation).
A more advanced method are polynomial distributed lag models. It has the same basic formula as above, but the impulse-response function is forced to lie on a polynomial of degree $q$ (link to a paper for Stata):
$$ \beta_{i} = a_{0} + a_{1}i + a_{2}i^2 +\ldots+a_{q}i^q $$ where $q$ is the degree of the polynomial and $i$ the lag length. Another formulation is $$ \beta_{i} = a_{0} + \sum_{j=1}^{q}a_{j}f_{j}(i) $$
Where $f_{j}(i)$ is a polynomial of degree $j$ in the lag length $i$. A good introduction to the
dlnm
package and polynomial distributed lag models can be found here.These models are often used in studies about air-pollution and health because air-pollution has lagged effects on health outcomes.
Let's look at this graph from the vignette of the
dlnm
package (page 13):The degree of the polynomials was $q=4$ in this case so the green line is a polynomial of 4th degree. The y-axis is the relative risk (RR) estimated via Poisson regression and the x-axis the considered lag. The relative risk has the following interpretation: Persons who were exposed have a $(RR-1)\cdot100\%$ higher/lower chance of getting the outcome (e.g. death, lung cancer, etc.) compared to people who were not exposed. If $RR>1$ this means a positive association and if $RR<1$ means a protective association. A $RR=1$ means no association. We see that for every increase of $\textrm{PM}_{10}$ by 10 units ($\mu \mathrm{g}/m^{3}$), there is a $(1.001-1)\cdot100\%=0.1\%$ increase in the risk to die at lag 0 (i.e. on the same day as the exposure). Strangely, the exposure from about 9 days ago is protective: an increase of 10$~\mu \mathrm{g}/m^{3}$ is associated with a decreased risk to die compared to people with 10 units less exposure. We can also see that the exposure from 15 days before doesn't play a role (i.e. $RR\approx1$).
Let's look at the cumulative relative risk:
This is the same as before but the effects are cumulated over time (i.e. summing all contributions from the lags up to the maximum lag). The red line starts at the same point as the green line in the first graphic (i.e. $\approx1.001$). We can see that people who have been exposed for five days have an increased cumulative risk of about $(1.005-1)\cdot100\%=0.5\%$ to die compared to non-exposed people. Because the green line goes below the relative risk of $1$ after a lag of about 5 days, the cumulative association after 15 days is nearly $1$. This means that the protective effects of $\textrm{PM}_{10}$ from lag 5 on have compensated the harmful effects from earlier lags. Whether that is scientifically reasonable is another quesiton.