Solved – Question about logistic regression

ecology, logistic, regression, spatial, time series

I want to run a binary logistic regression to model the presence or absence of conflict (dependent variable) from a set of independent variables over a 10 year period (1997-2006), with each year having 107 observations. My independents are:

land degradation (categorical for 2 types of degradation);
population increase (0- no; 1-yes);
livelihood type (0 – type one; 1 – type two);
population density (three levels of density);
NDVI continuous (max. veg productivity);
NDVI$_{t-1}$ (decline in veg from the previous year – 0 – no; 1 – yes); and
NDVI$_{t-2}$ (decline in veg from two years past – 0 – no; 1 – yes).

I am fairly new to it all – this is a project my lecturer has given me – and so I would be grateful for some advice or guidance. I have tested for multicollinearity already.

Essentially my data are split into 107 units of observation (spatial regions) covering 10 years (1070 observations in total), and for every unit of observation they give me a 'snapshot' value of each independent variable at that time within that unit (region). How should I set up my logistic regression (or data table) so that it recognizes the 107 values of each year separately and the temporal NDVI changes between different unit-years can be assessed?

Best Answer

This is actually an extremely sophisticated problem and a tough ask from your lecturer!

In terms of how you organise your data, a 1070 x 10 rectangle is fine: one row per region-year and one column per variable. For example, in R:

> conflict.data <- data.frame(
+ confl = sample(0:1, 1070, replace=T),
+ country = factor(rep(1:107,10)),
+ period = factor(rep(1:10, rep(107,10))),
+ landdeg = sample(c("Type1", "Type2"), 1070, replace=T),
+ popincrease = sample(0:1, 1070, replace=T),
+ liveli =sample(0:1, 1070, replace=T),
+ popden = sample(c("Low", "Med", "High"), 1070, replace=T),
+ NDVI = rnorm(1070,100,10),
+ NDVIdecl1 = sample(0:1, 1070, replace=T),
+ NDVIdecl2 = sample(0:1, 1070, replace=T))
> head(conflict.data)
  confl country period landdeg popincrease liveli popden     NDVI NDVIdecl1 NDVIdecl2
1     1       1      1   Type1           1      0    Low 113.4744         0         1
2     1       2      1   Type2           1      1   High 103.2979         0         0
3     0       3      1   Type2           1      1    Med 109.1200         1         1
4     1       4      1   Type2           0      1    Low 112.1574         1         0
5     0       5      1   Type1           0      0   High 109.9875         0         1
6     1       6      1   Type1           1      0    Low 109.2785         0         0
> summary(conflict.data)
     confl           country         period     landdeg     popincrease         liveli        popden         NDVI          NDVIdecl1        NDVIdecl2     
 Min.   :0.0000   1      :  10   1      :107   Type1:535   Min.   :0.0000   Min.   :0.0000   High:361   Min.   : 68.71   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.0000   2      :  10   2      :107   Type2:535   1st Qu.:0.0000   1st Qu.:0.0000   Low :340   1st Qu.: 93.25   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :1.0000   3      :  10   3      :107               Median :1.0000   Median :1.0000   Med :369   Median : 99.65   Median :1.0000   Median :0.0000  
 Mean   :0.5009   4      :  10   4      :107               Mean   :0.5028   Mean   :0.5056              Mean   : 99.84   Mean   :0.5121   Mean   :0.4888  
 3rd Qu.:1.0000   5      :  10   5      :107               3rd Qu.:1.0000   3rd Qu.:1.0000              3rd Qu.:106.99   3rd Qu.:1.0000   3rd Qu.:1.0000  
 Max.   :1.0000   6      :  10   6      :107               Max.   :1.0000   Max.   :1.0000              Max.   :130.13   Max.   :1.0000   Max.   :1.0000  
                  (Other):1010   (Other):428                                                                                                              
> dim(conflict.data)
[1] 1070   10
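
If the two decline indicators are not already in the table, they can be derived from the continuous NDVI column within each region. Here is a minimal sketch in base R. It assumes that "decline" means NDVI is lower than its value one (or two) years earlier in the same region, which is only one reading of the question. The toy values and the helper name lag_k are made up for illustration.

```r
# Toy panel: 3 hypothetical regions observed over 4 years (invented values)
toy <- data.frame(
  country = factor(rep(1:3, times = 4)),
  period  = rep(1:4, each = 3),
  NDVI    = c(100, 90, 80,   # year 1, regions 1-3
               95, 92, 85,   # year 2
               97, 91, 70,   # year 3
               90, 93, 75)   # year 4
)

# Sort by region, then year, so that lags line up within each region
toy <- toy[order(toy$country, toy$period), ]

# Hypothetical helper: shift a vector back by k positions, padding with NA
lag_k <- function(x, k) c(rep(NA, k), head(x, -k))

# Lagged NDVI computed separately for each region
toy$NDVI_lag1 <- ave(toy$NDVI, toy$country, FUN = function(x) lag_k(x, 1))
toy$NDVI_lag2 <- ave(toy$NDVI, toy$country, FUN = function(x) lag_k(x, 2))

# 1 = NDVI fell relative to one / two years earlier, 0 = it did not
toy$NDVIdecl1 <- as.integer(toy$NDVI < toy$NDVI_lag1)
toy$NDVIdecl2 <- as.integer(toy$NDVI < toy$NDVI_lag2)

print(toy)
```

The first year of each region has no one-year lag and the first two years have no two-year lag, so those indicators are NA. With the default settings, glm() drops rows containing NA, so the model is effectively fitted on the later years only.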

For fitting a model, the glm() function as @gui11aume suggests will do the basics...

mod <- glm(confl~., family="binomial", data=conflict.data)
anova(mod)

... but this has the problem that it treats "country" (I'm assuming you have country as your 107 units) as a fixed effect, whereas a random effect is more appropriate. It also treats period as a plain factor, which makes no allowance for autocorrelation between successive years.
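
To see what the fixed-effect treatment costs, it helps to count the coefficients such a model estimates. This is a rough sketch on simulated data with a reduced set of predictors. The seed and the object names sim and fixed_mod are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

n_units <- 107
n_years <- 10
n <- n_units * n_years

# Simulated stand-in for the real data (random values, no real signal)
sim <- data.frame(
  confl   = sample(0:1, n, replace = TRUE),
  country = factor(rep(1:n_units, n_years)),
  period  = factor(rep(1:n_years, each = n_units)),
  NDVI    = rnorm(n, 100, 10)
)

# Country and period both enter as ordinary (fixed-effect) factors
fixed_mod <- glm(confl ~ country + period + NDVI,
                 family = binomial, data = sim)

# One dummy coefficient per country beyond the reference level
n_country_coef <- sum(grepl("^country", names(coef(fixed_mod))))
n_coef <- length(coef(fixed_mod))

cat("country coefficients:", n_country_coef, "\n")
cat("total coefficients:  ", n_coef, "\n")
```

With 107 regions the model spends 106 parameters on country alone, each estimated from only 10 observations. A random intercept replaces them with a single variance parameter.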

You can address the first problem with a generalized linear mixed effects model, as in e.g. Bates et al.'s lme4 package in R. There's a nice introduction to some aspects of this here. Something like

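library(lme4)
# glmer() fits generalized linear mixed models; calling lmer() with a
# family argument is deprecated in current versions of lme4
mod2 <- glmer(confl ~ landdeg + popincrease + liveli + popden + 
    NDVI + NDVIdecl1 + NDVIdecl2 + (1|country) + (1|period), family=binomial,
    data=conflict.data)
summary(mod2)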

would be a step forward.

Now your last remaining problem is autocorrelation across your 10 periods. Basically, your 10 data points on each country aren't worth as much as 10 randomly chosen, independent and identically distributed points would be. I'm not aware of a widely available software solution to autocorrelation in the residuals of a multilevel model with a non-Normal response. Certainly it isn't implemented in lme4. Others may know more than me.
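
For a rough feel of how much information autocorrelation costs, one heuristic is the "effective sample size" of the mean of a stationary AR(1) series with lag-1 correlation rho. It follows from Var(mean) = (sigma^2 / n) * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho^k]. This is only an analogy. A logistic mixed model does not compute this quantity, the rho values below are invented, and n_eff is a hypothetical helper name.

```r
# Effective number of independent observations in a stationary AR(1)
# series of length n with lag-1 autocorrelation rho (for estimating a mean)
n_eff <- function(n, rho) {
  k <- seq_len(n - 1)
  n / (1 + 2 * sum((1 - k / n) * rho^k))
}

# Ten yearly observations per region under a few assumed correlations
for (rho in c(0, 0.2, 0.5, 0.8)) {
  cat(sprintf("rho = %.1f -> effective n = %.2f\n", rho, n_eff(10, rho)))
}
```

With no autocorrelation the 10 years count as 10 observations. As rho grows, the same 10 years carry progressively less independent information, which is why standard errors that ignore the correlation tend to be too small.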
