Solved – regression with non-independent data

dyadic-data, mixed model, non-independent, random-effects-model, regression

I will be running a regression on subjects' total scores from two-player games (prisoner's dilemma). I am aware that including both players' scores from the same game causes problems due to non-independence. Is there a way to deal with this other than randomly picking one subject from each game for the analysis (and so losing half the data)? Can the dependence be built into the model instead, perhaps as a random effect?

Best Answer

I assume what you have in mind is score as the response and some player attributes as the predictors, e.g. to find out whether blonds score higher.

Why not perform the regression with the game as your sampling unit? A game of N points must distribute those points between players A and B, so you can take player A's score for each game as a binomial response and include both players' attributes as predictors.
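As a minimal sketch of the restructuring this implies, assuming the data start out with one row per player per game: the field names (`game`, `player`, `score`, `blond`) and the scores are made up for illustration, and "player A" is simply whichever player sorts first by id, which is an arbitrary rule.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per player per game.
# Field names and values are illustrative assumptions, not real data.
player_rows = [
    {"game": 1, "player": "p1", "score": 12, "blond": 1},
    {"game": 1, "player": "p2", "score": 8, "blond": 0},
    {"game": 2, "player": "p3", "score": 5, "blond": 0},
    {"game": 2, "player": "p4", "score": 15, "blond": 0},
]


def to_game_rows(rows):
    """Collapse two player rows per game into one row per game."""
    by_game = defaultdict(list)
    for row in rows:
        by_game[row["game"]].append(row)

    game_rows = []
    for game, pair in sorted(by_game.items()):
        if len(pair) != 2:
            raise ValueError(f"game {game} does not have exactly two players")
        # Arbitrary but fixed rule for who counts as "player A".
        a, b = sorted(pair, key=lambda r: r["player"])
        game_rows.append({
            "game": game,
            "successes": a["score"],            # points won by player A
            "trials": a["score"] + b["score"],  # N points in the game
            "blond_a": a["blond"],
            "blond_b": b["blond"],
        })
    return game_rows


game_rows = to_game_rows(player_rows)
for row in game_rows:
    print(row)
```

Each resulting row is one independent observation, so `successes` out of `trials` could then be passed to whatever binomial regression routine you use, with `blond_a` and `blond_b` as predictors.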

Related Solutions

Solved – Interpreting Reaction Time data with mixed-effects model

As a general rule, including a random subject effect in a repeated measures/mixed model strengthens the within-subject effects and dampens the between-subject effects. In other words, if you fail to include a person effect that should exist, you are likely to get spurious significance of between-subject effects and spurious non-significance of within-subject effects.

Let's look at what happens with Person and Condition.

The mixed model assumes that each person has a "person effect" which we don't see. It comes from a normal distribution with mean 0. Ignoring pictures, trials and everything else, the total for condition 1 is the total of the person effects for the persons receiving that condition + the estimated effect of condition 1 + the estimated intercept term, or something close to that. (Your design isn't balanced, so the totals may not be exact, but that's the gist of what's happening.)

In a mixed model, the random effects (which are estimated during model fitting) are not constrained to sum to 0 within each condition, even though their theoretical mean is 0. Suppose the sum of the random effects in condition 1 just happened to be larger than the sum of the random effects in condition 2, and suppose also that the total over condition 1 is greater than the total over condition 2. In that case, the random effect is stealing from the fixed effect. In other words, if you remove the random effect, the full difference between condition 1 and condition 2 is attributed to the condition effect. When you include random effects, some of the observed difference is attributed to the persons, at the expense of the condition. This can happen because Condition is a between-subjects effect.

Now in the case of a large sample (a large number of persons), and where the condition effect is meaningfully larger than the person effect, you wouldn't get this paradox. The person effects would likely cancel out within each condition, and the large condition effect would come shining through. However, your condition effect is smaller than the standard deviation of the person effect.

Furthermore, you don't say how many subjects you have, but I'm guessing it's not large. That means that the cumulative random person effects don't have the numbers they need to average out close to 0 within each condition, which means they will get in the way of estimating the condition effect.
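To see how much the person effects can fail to cancel out, here is a small simulation sketch. It assumes two conditions with no true condition effect at all, equal group sizes, and person effects drawn from a normal distribution with standard deviation 1; all of these settings are assumptions for illustration, not values from your data.

```python
import random
from statistics import fmean, pstdev


def condition_gap(n_per_condition, person_sd, rng):
    """Gap in mean person effect between two conditions (no true effect)."""
    c1 = [rng.gauss(0.0, person_sd) for _ in range(n_per_condition)]
    c2 = [rng.gauss(0.0, person_sd) for _ in range(n_per_condition)]
    return fmean(c1) - fmean(c2)


def gap_spread(n_per_condition, person_sd=1.0, n_sims=2000, seed=1):
    """Standard deviation of the gap across many simulated experiments."""
    rng = random.Random(seed)
    gaps = [condition_gap(n_per_condition, person_sd, rng) for _ in range(n_sims)]
    return pstdev(gaps)


small = gap_spread(5)    # few persons per condition
large = gap_spread(200)  # many persons per condition
print(f"spread of the gap with 5 per condition:   {small:.3f}")
print(f"spread of the gap with 200 per condition: {large:.3f}")
```

The spread of the gap shrinks roughly like `person_sd * sqrt(2 / n)`, so with a handful of persons per condition the chance imbalance in person effects can easily be as large as a small condition effect.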

You can ask R to produce the estimated random person effects and do a boxplot of these against Condition. If the mean (or median) person effects are pulling in the same direction as the condition difference, you are open to the paradox you mention.
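If you export the estimated person effects, the same comparison can be done numerically. This is only a sketch: the person ids, effect values and condition labels below are made-up placeholders, and `summarize_by_condition` is a hypothetical helper, not part of any package.

```python
from statistics import median, quantiles

# Made-up placeholder values standing in for exported person effects.
person_effect = {
    "s1": 0.3, "s2": 0.1, "s3": 0.5, "s4": -0.1,
    "s5": -0.2, "s6": -0.4, "s7": 0.0, "s8": -0.3,
}
# Condition is between subjects, so each person has exactly one label.
condition = {
    "s1": "c1", "s2": "c1", "s3": "c1", "s4": "c1",
    "s5": "c2", "s6": "c2", "s7": "c2", "s8": "c2",
}


def summarize_by_condition(effects, cond):
    """Boxplot-style summary (quartiles) of person effects per condition."""
    groups = {}
    for person, effect in effects.items():
        groups.setdefault(cond[person], []).append(effect)
    summary = {}
    for label, values in groups.items():
        q1, _, q3 = quantiles(values, n=4)
        summary[label] = {
            "n": len(values),
            "q1": q1,
            "median": median(values),
            "q3": q3,
        }
    return summary


summary = summarize_by_condition(person_effect, condition)
for label, stats in sorted(summary.items()):
    print(label, stats)
```

If the medians differ between conditions in the same direction as the estimated condition effect, the person effects are absorbing part of the condition difference.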

I'm not sure what you should do here. To me, the concern is that all your effects are small. The variance of the person effect is around the same size as the condition effect. The variance of the picture effect is very small, and that variable should probably be dropped. Your optinfo looks good, so at least the model converged. But the biggest effects you have are gender and residual variance. In other words, people differ from each other; men differ from women.

You can test for random effects using package RLRsim. You can't do a Wald t-test on them because if indeed the variance of a random effect is 0, then your parameter is on the boundary of the parameter space and maximum likelihood asymptotics break down. RLRsim brute forces the issue through simulation. This will indicate whether you should drop the picture effect. I don't like dropping the person effect, since I think you only want to infer the relevance of experimental effects that are stronger than random person to person variation. You have a repeated measures design and you should honour that in the analysis.
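The boundary problem is easy to see in a toy simulation. The sketch below is not what RLRsim does internally; it only simulates a balanced one-way layout in which the true between-group variance is zero, and counts how often a moment estimate of that variance, truncated at zero, lands exactly on the boundary. The group counts and sizes are arbitrary assumptions.

```python
import random


def truncated_variance_estimate(n_groups, n_per_group, rng):
    """Moment estimate of the between-group variance, truncated at zero.

    The data are simulated with a TRUE between-group variance of zero.
    """
    groups = [
        [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        for _ in range(n_groups)
    ]
    group_means = [sum(g) / n_per_group for g in groups]
    grand_mean = sum(group_means) / n_groups
    # Between-group and within-group mean squares from one-way ANOVA.
    msb = n_per_group * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    msw = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (n_groups * (n_per_group - 1))
    return max(0.0, (msb - msw) / n_per_group)


def share_at_boundary(n_groups=10, n_per_group=5, n_sims=2000, seed=1):
    """Fraction of simulated data sets whose estimate is exactly zero."""
    rng = random.Random(seed)
    estimates = [
        truncated_variance_estimate(n_groups, n_per_group, rng)
        for _ in range(n_sims)
    ]
    return sum(e == 0.0 for e in estimates) / n_sims


share = share_at_boundary()
print(f"share of estimates exactly at zero: {share:.2f}")
```

A large share of the estimates pile up at exactly zero, so the estimator cannot be approximately normal around the true value, which is why a reference distribution obtained by simulating under the null is used instead.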

I also have doubts about using the reciprocal of reaction time, unless you have strong theoretical grounds for holding that 1/RT is linear in the predictors. All of your parameter estimates seem to be fairly close to 0 compared to the residual variance (your estimated, non-significant intercept is even negative), which doesn't help interpretability. At least not for me.

As to your question about estimation, the random effects are not "conditional on having estimated the fixed effects". The likelihood is maximized over all parameters jointly, both the so-called unseen random effects and the fixed effects. Each parameter estimate is made in the presence of the others.
