It looks like the author made a mathematical error somewhere.
If you expand the sum-of-squares deviation
$$
S = ((b+m)-1)^2+ ((b+2m)-5)^2 + ((b+4m)-9)^2
$$
you get
$$
\begin{split}
S = & b^2+2 b m+ m^2 + 1 - 2 b - 2 m \\
+ & b^2+4 b m+ 4 m^2 + 25 - 10 b -20 m \\
+ & b^2+8 b m+16 m^2 + 81 - 18 b -72 m
\end{split}
$$
which reduces to
$$
3 b^2 + 14 b m + 21 m^2 + 107 - 30 b - 94 m
$$
which matches the author's expression except for the constant term (which doesn't affect the minimization anyway).
Now we minimize $S$ by setting its partial derivatives with respect to $b$ and $m$ to zero and solving the resulting system.
$$
\frac{\partial S}{\partial b} = 6b + 14m - 30 = 0 \;\Rightarrow\; 3b + 7m - 15 = 0
$$
$$
\frac{\partial S}{\partial m} = 14b + 42m - 94 = 0 \;\Rightarrow\; 7b + 21m - 47 = 0
$$
Solving the system:
$$
\begin{split}
b & = (15-7m)/3 \\
0 & = 7 (15-7m)/3 + 21 m-47 \\
47 - 35 & = (-49/3 + 21) m \\
m & = (47-35)/(21-49/3) = 18/7
\end{split}
$$
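Back-substituting $m = 18/7$ into $b = (15-7m)/3$ gives the intercept as well:
$$
b = \frac{15 - 7\cdot\frac{18}{7}}{3} = \frac{15-18}{3} = -1
$$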
R says this is indeed 2.571429 ...
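As a sanity check (my own addition, not part of the original derivation), a direct least-squares fit with `lm()` recovers the same slope, along with the intercept $-1$:

```r
g <- c(1, 2, 4)  # predictor values
s <- c(1, 5, 9)  # response values
coef(lm(s ~ g))  # intercept -1, slope 2.571429
```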
Based on this link this seems to be from a Coursera course ... ? Maybe there was a mis-transcription of the data somewhere?
The other, independent way to do this calculation is to know that the estimated regression slope is equal to the sum of cross products ($\sum (y-\bar y) (x-\bar x)$) divided by the sum of squares ($\sum (x-\bar x)^2$).
g <- c(1, 2, 4)           # predictor values
g0 <- g - mean(g)         # centered predictor
s <- c(1, 5, 9)           # response (shoe sizes)
s0 <- s - mean(s)         # centered response
sum(g0 * s0) / sum(g0^2)  # slope = cross-products / sum of squares
## [1] 2.571429
I think if the shoe sizes were $\{1,11/3,9\}$ instead of $\{1,5,9\}$ then the slope would come out to $8/3$ ...
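That claim is easy to verify with the same cross-product formula (my own check, not from the original):

```r
g <- c(1, 2, 4)           # predictor values
g0 <- g - mean(g)         # centered predictor
s <- c(1, 11/3, 9)        # hypothetical shoe sizes
s0 <- s - mean(s)         # centered response
sum(g0 * s0) / sum(g0^2)  # slope: 8/3 = 2.666667
```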
One thing you could try is to change the optimizer. See Ben Bolker's comment at this github issue. The nlopt implementation of bobyqa is usually much faster than the default (at least whenever I try it).
library(nloptr)
defaultControl <- list(algorithm="NLOPT_LN_BOBYQA",xtol_rel=1e-6,maxeval=1e5)
nloptwrap2 <- function(fn, par, lower, upper, control = list(), ...) {
    ## fill in any unspecified control parameters with the defaults
    for (n in names(defaultControl))
        if (is.null(control[[n]])) control[[n]] <- defaultControl[[n]]
    res <- nloptr(x0 = par, eval_f = fn, lb = lower, ub = upper,
                  opts = control, ...)
    ## translate nloptr's result into the structure lme4 expects
    with(res, list(par = solution,
                   fval = objective,
                   feval = iterations,
                   conv = if (status > 0) 0 else status,
                   message = message))
}
library(lme4)
system.time(test.model <- glmer(test.formula, data = test.case,
                                family = "binomial", verbose = TRUE))
system.time(test.model2 <- update(test.model,
                                  control = glmerControl(optimizer = "nloptwrap2")))
Also, see this answer for more options and this thread from R-sig-mixed-models (which looks more relevant to your issue).
Edit:
I gave you some out-of-date info related to `nloptr`. In `lme4` 1.1-7 and up, `nloptr` is automatically imported (see `?nloptwrap`). All you have to do is add
control = [g]lmerControl(optimizer = "nloptwrap") # +g if fitting with glmer
to your call.
Best Answer
NA as a coefficient in a regression indicates that the variable in question is an exact linear combination of the other variables (perfect collinearity). In your case, this means that $Q3 = a \times Q1 + b \times Q2 + c$ for some $a, b, c$. If this is the case, then there's no unique solution to the regression without dropping one of the variables. Adding $Q4$ is only going to make matters worse.
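A minimal illustration (made-up data; variable names chosen to echo the question): when one predictor is an exact linear combination of the others, `lm()` reports its coefficient as NA:

```r
set.seed(1)
Q1 <- rnorm(10)
Q2 <- rnorm(10)
Q3 <- Q1 + Q2               # exactly collinear with Q1 and Q2
y  <- rnorm(10)
coef(lm(y ~ Q1 + Q2 + Q3))  # coefficient for Q3 is NA
```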