The difference between the coefficients comes from the roles of $x$ and $y$ being reversed between the two fits. Note that
- in your R case the coefficient relates to `suva`, and
- in your Excel case the coefficient relates to `heather`.

The following R session shows how to fit both directions:
```r
> lm(suva ~ heather, data = as.data.frame(data))

Call:
lm(formula = suva ~ heather, data = as.data.frame(data))

Coefficients:
(Intercept)      heather
      14.65       -13.60

> lm(heather ~ suva, data = as.data.frame(data))

Call:
lm(formula = heather ~ suva, data = as.data.frame(data))

Coefficients:
(Intercept)         suva
    0.32524     -0.01276
```
The rest of the code, constructing the data matrix:

```r
data <- c(
  12.880545,   0.061156645, 0.15,   0.525,  0,
  7.098873327, 0.026878039, 0.2275, 0,      0,
  8.660688381, 0.04037841,  0.425,  0.25,   0,
  7.734546932, 0.021618446, 0.225,  0.3875, 0,
  16.70696048, 0.103626684, 0.15,   0.075,  0,
  9.763315183, 0.013387158, 0.25,   0.075,  0,
  12.91735434, 0.008076468, 0.22,   0.22,   0,
  19.94153851, 0.150798057, 0.0375, 0.35,   0.225,
  17.25115559, 0.052229596, 0.0625, 0.2625, 0.225,
  15.38596941, 0.05429447,  0.1125, 0.45,   0.025,
  15.53714185, 0.05933884,  0.1625, 0.525,  0.0625,
  14.11551229, 0.064579437, 0.1875, 0.35,   0.1375,
  14.88575569, 0.0189853,   0.3375, 0.3,    0,
  12.32229733, 0.043085602, 0.0875, 0.1375, 0,
  17.23861185, 0.071705699, 0.15,   0.1375, 0,
  11.50832463, 0.1125,      0.0875, 0.075,  0,
  14.4810484,  0.078476821, 0.0375, 0.125,  0.0625,
  9.110262652, 0.077306938, 0.145,  0.35,   0.0125,
  10.8571733,  0.02681341,  0.0375, 0.525,  0,
  9.589339421, 0.01892435,  0.2275, 0,      0,
  7.260373588, 0.014538237, 0.425,  0.25,   0,
  11.11099161, 0.022802578, 0.225,  0.3875, 0,
  10.81488848, 0.047587818, 0.15,   0.075,  0,
  8.224131957, 0.031126904, 0.25,   0.075,  0,
  8.818607863, 0.002855409, 0.22,   0.22,   0,
  11.53999863, 0.031465613, 0.0375, 0.35,   0.225,
  14.92784964, 0.069998663, 0.0625, 0.2625, 0.225,
  9.666480932, 0.02387741,  0.1125, 0.45,   0.025,
  12.51000758, 0.016960259, 0.1625, 0.525,  0.0625,
  13.32611463, 0.033670382, 0.1875, 0.35,   0.1375,
  16.76535191, 0.029613698, 0.3375, 0.3,    0,
  11.24615281, 0.008440059, 0.0875, 0.1375, 0,
  10.60564875, 0.003930792, 0.15,   0.1375, 0,
  11.82909125, 0.036017582, 0.1125, 0.0875, 0.075,
  18.2337185,  0.143451512, 0.0375, 0.125,  0.0625,
  10.6226222,  0.020561242, 0.145,  0.35,   0.0125
)
data <- matrix(data, nrow = 36, byrow = TRUE)
colnames(data) <- c("suva", "Std dev", "heather", "sedge", "sphagnum")
```
Why, then, is $R^2$ still the same?

There is a certain symmetry in the situation: in simple linear regression, the slope coefficient is the correlation coefficient scaled by the ratio of the standard deviations of the $y$ and $x$ data:
$$\hat\beta_{y \sim x} = r_{xy} \frac{s_y}{s_x}$$
The standard deviation of the model's fitted values is then:
$$s_{mod} = \hat\beta_{y \sim x} s_x = r_{xy} s_y$$
and $R^2$ is the squared ratio of the model standard deviation to the standard deviation of the data:
$$R^2 = \left( \frac{s_{mod}}{s_y} \right)^2= r_{xy}^2$$
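This symmetry is easy to check numerically. A minimal sketch with made-up data (`x` and `y` below are illustrative vectors, not the suva/heather values): the two slopes differ, but their product is $r_{xy}^2$, which is the $R^2$ shared by both fits.

```r
# Illustrative data (not the suva/heather data from the question)
set.seed(1)
x <- rnorm(20)
y <- 2 - 0.5 * x + rnorm(20, sd = 0.3)

b_yx <- coef(lm(y ~ x))["x"]   # slope of y regressed on x
b_xy <- coef(lm(x ~ y))["y"]   # slope of x regressed on y

# The slopes differ, but their product equals r^2 = R^2:
r2 <- cor(x, y)^2
all.equal(unname(b_yx * b_xy), r2)           # TRUE
all.equal(summary(lm(y ~ x))$r.squared, r2)  # TRUE
```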
Best Answer
To expand on the advice of @kjetilbhalvorsen, here is an example of robust regression with the `robustbase` package. Note that the summary includes p-values for the effects and an r-squared value.

Source and load packages

Toy data

Fit robust model and view summary

A p-value for the effects can be determined using the `anova.lmrob` function. The documentation for `car::Anova` doesn't mention lmrob objects, but at least for this example it appears to match the results of `anova.lmrob`. Likewise, the documentation for the multcomp package doesn't mention lmrob objects, but at least for this example the results seem reasonable.
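The workflow described above can be sketched as follows. This assumes the `robustbase` package is installed; the toy data and the outliers added to them are invented for illustration only.

```r
# Robust regression sketch with robustbase::lmrob (assumed installed).
library(robustbase)

# Invented toy data with two gross outliers
set.seed(2)
toy <- data.frame(x = 1:30)
toy$y <- 1 + 0.8 * toy$x + rnorm(30)
toy$y[c(5, 25)] <- toy$y[c(5, 25)] + 15

# Fit robust model and view summary
# (summary reports robust coefficients, p-values, and an R-squared)
fit <- lmrob(y ~ x, data = toy)
summary(fit)

# p-value for the effect of x via anova.lmrob: Wald test of the
# full fit against the intercept-only model
anova(fit, y ~ 1)
```

Because `lmrob` downweights the two outlying points, the estimated slope stays close to the true value of 0.8, where ordinary `lm` would be pulled toward them.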