Solved – Checking for outliers in a glmer (lme4 package) with 3 random factors

cooks-distancelme4-nlmeoutliersr

I have a question relating to the checking for outliers and / or influential points in my dataset using a glmer model with 3 random variables. I'm investigating the detection rate (SumDetections) of receivers over increasing distance (sc.c.distance), and the effect of environmental influences on this (depth, temperature and wind) and how this differs between different transmitters used, controlling for random effects of receiver ID, replicate and area. I found that the influence.ME might be of help, so I checked it out.

In the manual I read that this package is only able to delete levels of 1 single grouping factor or 1 data point per time over the whole data set. Unless I read the package info incorrectly, this package cannot do what I'm looking for. I'm looking for a way to check for outliers nested within 4 grouping layers.

How my data is organized is as follows: First my data discerns between Areas. Within areas, multiple replicates were done. Each replicate consisted of 5 distances at which the detection rate was tested. For each distance, 20 receivers were tested.

My model looks like this:

m <- lmer(SumDetections ~ tm + sc.c.distance + tm:sc.c.distance + c.tm.depth + 
             c.receiver.depth + c.temp + c.wind + (1|replicate) + (1|SUR.ID) + (1|Area), 
             data = df3, family = poisson)

My questions are:

Is it possible to check for outliers of which the data is nested within 3 layers with use of influence.me?
If so, how should I specify the command to get what I'm looking for, and how should I interpret the returned data by the Cook's distance or dfbetas?
If not, is there another package that allows me to check for outliers?

Best Answer

try the romr.fnc in the LMERConvenienceFunctions to remove outliers

df3.trimmed = romr.fnc(m, df3, trim = 2.5)
df3.trimmed = df3.trimmed$data

update initial model on trimmed data

mB = update(m1)

Related Solutions

Solved – Dealing with missing data – glmer in lme4 package

Turning my comments into an answer as they seem to have answered your question...

Just exclude the actual missing data. If you format your data with columns ID, Day, environmental variables, response, everything should be fine to just omit the rows where an ID is missing a measurement on a certain day, still keeping the other measurements on those IDs.

For inference, you'll get the best accuracy using bootstrapped estimates, (lme4::confint() with method = "boot" works well -- you'll need to install the boot package for this to work). If you want more info on that, I'd recommend Faraway's Extending the Linear Model with R, section 8.2. The lme4 package has been considerably updated since Faraway's book's printing, you can see the accompanying transition guide. The principles, of course, remain the same.

Time Series – Detecting Outliers in Time Series (LS/AO/TC) Using tsoutliers Package in R. How to Represent Outliers in Equation Format?

The temporary change, TC, is a general type of outlier. The equation given in the documentation of the package and that you wrote is the equation that describes the dynamics of this type of outlier. You can generate it by means of the function filter as shown below. It is illuminating to display it for several values of delta. For $\delta=0$ the TC collapses in an additive outlier; on the other extreme, $\delta=1$, the TC is like a level shift.

tc <- rep(0, 50)
tc[20] <- 1
tc1 <- filter(tc, filter = 0, method = "recursive")
tc2 <- filter(tc, filter = 0.3, method = "recursive")
tc3 <- filter(tc, filter = 0.7, method = "recursive")
tc4 <- filter(tc, filter = 1, method = "recursive")
par(mfrow = c(2,2))
plot(tc1, main = "TC delta = 0")
plot(tc2, main = "TC delta = 0.3")
plot(tc3, main = "TC delta = 0.7")
plot(tc4, main = "TC delta = 1", type = "s")

tremporary change

In your example, you can use the function outliers.effects to represent the effects of the detected outliers on the observed series:

# unit impulse
m1 <- ts(outliers.effects(outlier.chicken$outliers, n = length(chicken), weights = FALSE))
tsp(m1) <- tsp(chicken)
# weighted by the estimated coefficients
m2 <- ts(outliers.effects(outlier.chicken$outliers, n = length(chicken), weights = TRUE))
tsp(m2) <- tsp(chicken)

The innovational outlier, IO, is more peculiar. Contrary to the other types of outliers considered in tsoutliers, the effect of the IO depends on the selected model and on the parameter estimates. This fact can be troublesome in series with many outliers. In the first iterations of the algorithm (where the effect of some of the outliers may not have been detected and adjusted) the quality of the estimates of the ARIMA model may not be good enough as to accurately define the IO. Moreover, as the algorithm makes progress a new ARIMA model may be selected. Thus, it is possible to detect an IO at a preliminary stage with an ARIMA model but eventually its dynamic is defined by another ARIMA model chosen in the last stage.

In this document (1) it is shown that, in some circumstances, the influence of an IO may increase as the date of its occurrence becomes more distant into the past, which is something hard to interpret or assume.

The IO has an interesting potential since it may capture seasonal outliers. The other types of outliers considered in tsoutlierscannot capture seasonal patterns. Nevertheless, in some cases it may be better to search for a possible seasonal level shifts, SLS, instead of IO (as shown in the document mentioned before).

The IO has an appealing interpretation. It is sometimes understood as an additive outlier that affects the disturbance term and then propagates in the series according to the dynamic of the ARIMA model. In this sense, the IO is like an additive outlier, both of them affect a single observation but the IO is an impulse in the disturbance term while the AO is an impulse added directly to the values generated by the ARIMA model or the data generating process. Whether outliers affect the innovations or are outside the disturbance term may be a matter of discussion.

In the previous reference you may find some examples of real data where IO are detected.

(1) Seasonal outliers in time series. Regina Kaiser and Agustín Maravall. Document 20.II.2001.

Best Answer

Related Solutions

Solved – Dealing with missing data – glmer in lme4 package

Time Series – Detecting Outliers in Time Series (LS/AO/TC) Using tsoutliers Package in R. How to Represent Outliers in Equation Format?

Related Question