I want to run a binary logistic regression to model the presence or absence of conflict (dependent variable) from a set of independent variables over a 10-year period (1997-2006), with each year having 107 observations. My independent variables are:
- land degradation (categorical, 2 types of degradation);
- population increase (0 = no; 1 = yes);
- livelihood type (0 = type one; 1 = type two);
- population density (three levels of density);
- NDVI, continuous (max. vegetation productivity);
- NDVI$_{t-1}$ (decline in vegetation from the previous year; 0 = no; 1 = yes); and
- NDVI$_{t-2}$ (decline in vegetation from two years past; 0 = no; 1 = yes).
I am fairly new to all of this (it is a project my lecturer has given me), so I would be grateful for some advice or guidance. I have already tested for multicollinearity.
Essentially, my data is split into 107 units of observation (spatial regions) covering 10 years (1070 observations in total). For every unit of observation it gives me a 'snapshot' value of each independent variable at that time within that unit (region). How do I set up my logistic regression (or data table) so that it recognizes the 107 values of each year separately, and the temporal NDVI changes between different unit-years can be assessed?
Best Answer
This is actually an extremely sophisticated problem and a tough ask from your lecturer!
In terms of how you organise your data, a 1070 x 10 rectangle is fine, with one row per unit-year.
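A minimal sketch of that layout in R, using simulated placeholder values rather than your real data. All the column names here (`country`, `period`, `ndvi_lag1` and so on) are my own assumptions, chosen only for illustration:

```r
# Simulated placeholder data: one row per unit-year (107 units x 10 years).
# Column names are hypothetical; substitute your own.
set.seed(1)
conflict_data <- expand.grid(country = factor(seq_len(107)),
                             period  = 1997:2006)
n <- nrow(conflict_data)

conflict_data$conflict     <- rbinom(n, 1, 0.3)   # response: 0 = absent, 1 = present
conflict_data$degradation  <- factor(sample(c("type_a", "type_b"), n, replace = TRUE))
conflict_data$pop_increase <- rbinom(n, 1, 0.5)   # 0 = no, 1 = yes
conflict_data$livelihood   <- rbinom(n, 1, 0.5)   # 0 = type one, 1 = type two
conflict_data$pop_density  <- factor(sample(c("low", "medium", "high"), n, replace = TRUE))
conflict_data$ndvi         <- runif(n)            # continuous NDVI
conflict_data$ndvi_lag1    <- rbinom(n, 1, 0.4)   # decline from previous year
conflict_data$ndvi_lag2    <- rbinom(n, 1, 0.4)   # decline from two years past

dim(conflict_data)   # rows x columns
head(conflict_data)
```

The point is only the shape: the unit and the year are ordinary columns, so any model can refer to them by name.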
For fitting a model, the glm() function as @gui11aume suggests will do the basics...
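A rough sketch of such a basic fit, again on simulated placeholder data with hypothetical column names. Only the NDVI terms are included to keep it short; the other predictors would be added to the formula in the same way:

```r
# Simulated placeholder data (not the real data): 107 units x 10 years.
set.seed(42)
d <- expand.grid(country = factor(seq_len(107)),
                 period  = factor(1997:2006))
n <- nrow(d)
d$ndvi      <- runif(n)
d$ndvi_lag1 <- rbinom(n, 1, 0.4)
d$ndvi_lag2 <- rbinom(n, 1, 0.4)
d$conflict  <- rbinom(n, 1, 0.5)

# Country and period both enter as ordinary (fixed) factors.
fit_glm <- glm(conflict ~ ndvi + ndvi_lag1 + ndvi_lag2 + country + period,
               family = binomial, data = d)

# Coefficient table restricted to the NDVI terms
coef(summary(fit_glm))[c("ndvi", "ndvi_lag1", "ndvi_lag2"), ]
```

With only 10 observations per unit, some units may have all-0 or all-1 responses, in which case glm() will warn about fitted probabilities of 0 or 1.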
... but this has the problem that it treats "country" (I'm assuming your 107 units are countries) as a fixed effect, whereas a random effect is more appropriate. It also treats period as a simple factor, with no allowance for autocorrelation.
You can address the first problem with a generalized linear mixed effects model, as in e.g. Bates et al.'s lme4 package in R. There's a nice introduction to some aspects of this here. Fitting the model with glmer(), for example with a random intercept for each country, would be a step forward.
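A hedged sketch of what a mixed model of this kind could look like with lme4's glmer(). The data are simulated and the column names are hypothetical; the random intercept `(1 | country)` is the simplest random-effects structure, and is my assumption rather than the only sensible choice:

```r
library(lme4)

# Simulated placeholder data (not the real data): 107 units x 10 years.
set.seed(42)
d <- expand.grid(country = factor(seq_len(107)),
                 period  = factor(1997:2006))
n <- nrow(d)
d$ndvi      <- runif(n)
d$ndvi_lag1 <- rbinom(n, 1, 0.4)
d$ndvi_lag2 <- rbinom(n, 1, 0.4)

# Simulate a country-level intercept so the random effect has something to estimate.
u <- rnorm(107, sd = 1)
d$conflict <- rbinom(n, 1, plogis(-1 + u[as.integer(d$country)] + 0.5 * d$ndvi_lag1))

# Period stays a fixed factor; country becomes a random intercept.
fit_glmm <- glmer(conflict ~ ndvi + ndvi_lag1 + ndvi_lag2 + period + (1 | country),
                  family = binomial, data = d)
summary(fit_glmm)
```

The remaining predictors (degradation, livelihood, density and so on) would go into the fixed part of the formula alongside the NDVI terms.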
Now your last remaining problem is autocorrelation across your 10 periods. Basically, your 10 data points on each country aren't worth as much as 10 randomly chosen independent and identically distributed points would be. I'm not aware of a widely available software solution to autocorrelation in the residuals of a multilevel model with a non-Normal response. Certainly it isn't implemented in lme4. Others may know more than me.
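To give a rough feel for how much information autocorrelation can cost, here is a standard large-sample approximation for the mean of a series with lag-1 autocorrelation $\rho$. It comes from the Normal AR(1) case, so treat it only as an illustration and not as a result for logistic models:

$$n_{\text{eff}} \approx n \, \frac{1 - \rho}{1 + \rho}$$

With $n = 10$ periods and an assumed $\rho = 0.5$, this gives $n_{\text{eff}} \approx 10 \times 0.5 / 1.5 \approx 3.3$, i.e. ten correlated years would carry about as much information as three or so independent ones.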