Solved – Using lmrob for repeated measures; warning message: S-estimated scale == 0: Probably exact fit; check your data

regressionrepeated measuresrobust

I'm trying to run the lmrob() function in the R package robustbase, using data from a mixed factorial design study. My dependent variable is repeated (pre+ post) and I have two independent variables. I'm entering the post-measurement as the dependent variable, and controlling for the pre-measurement.

The function I'm trying to run is basically:

lmrob(Post ~ pre +IV1 + IV2, setting = "KS2014", data=mydataframe)

When I run this, I get an error message saying:

S-estimated scale == 0:  Probably exact fit; check your data.

When I don't control for the pre-measurement, I don't get this warning message. So I assume it has something to do with the repeated measures being correlated? Does anyone have any idea what might be causing the issue and how to resolve it?

This is the first time I'm posting here so please bear with me if I've not included enough detail or my question isn't clear enough, I will provide clarification if needed! I've copied in the output from the console, the session info, and the relevant extract from the dataset below.

Call:
lmrob(formula = Post_Intention ~ TPB_vs_no_TPB + Tailored_vs_Untailored + 
    Pre_Intention, data = dfa, setting = "KS2014")
 \--> method = "S"
Residuals:
   Min     1Q Median     3Q    Max 
   -80      0      0     10     90 

Exact fit detected

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                             0          0      NA       NA
TPB_vs_no_TPBNo TPB                     0          0      NA       NA
Tailored_vs_UntailoredUntailored        0          0      NA       NA
Pre_Intention                           1          0      NA       NA

Robustness weights: 
 88 observations c(1,5,10,13,16,19,21,25,27,28,29,31,32,34,36,40,42,50,53,56,57,61,66,68,75,79,83,86,87,93,98,104,108,112,116,119,124,125,128,134,135,136,138,139,144,145,148,153,156,161,164,165,171,173,175,176,177,179,182,184,190,193,197,198,199,203,213,215,216,217,218,219,220,226,232,233,235,236,239,241,242,243,245,246,247,250,251,252)
     are outliers with |weight| = 0 ( < 0.0004); 
 165 weights are ~= 1.
Algorithmic parameters: 
      tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4                bb       tuning.psi1       tuning.psi2 
       -5.000e-01         1.500e+00                NA         5.000e-01         5.000e-01        -5.000e-01         1.500e+00 
      tuning.psi3       tuning.psi4        refine.tol           rel.tol         solve.tol       eps.outlier             eps.x 
        9.500e-01                NA         1.000e-07         1.000e-07         1.000e-07         3.953e-04         1.819e-10 
warn.limit.reject warn.limit.meanrw 
        5.000e-01         5.000e-01 
     nResample         max.it       best.r.s       k.fast.s          k.max    maxit.scale      trace.lev            mts 
          1000            500             20              2           2000            200              0           1000 
    compute.rd fast.s.large.n 
             0           2000 
              setting                   psi           subsampling                   cov compute.outlier.stats 
             "KS2014"                 "lqq"         "nonsingular"             ".vcov.w"                "SMDM" 
seed : int(0) 
> source('~/R scripts/lmrob.R')
re-encoding from UTF-8
Warning message:
In lmrob.S(x, y, control = control, mf = mf) :
  S-estimated scale == 0:  Probably exact fit; check your data
> sessionInfo
function (package = NULL) 
{
    z <- list()
    z$R.version <- R.Version()
    z$platform <- z$R.version$platform
    if (nzchar(.Platform$r_arch)) 
        z$platform <- paste(z$platform, .Platform$r_arch, sep = "/")
    z$platform <- paste0(z$platform, " (", 8 * .Machine$sizeof.pointer, 
        "-bit)")
    z$locale <- Sys.getlocale()
    if (.Platform$OS.type == "windows") {
        z$running <- win.version()
    }
    else if (nzchar(Sys.which("uname"))) {
        uname <- system("uname -a", intern = TRUE)
        os <- sub(" .*", "", uname)
        z$running <- switch(os, Linux = if (file.exists("/etc/os-release")) {
            tmp <- readLines("/etc/os-release")
            t2 <- if (any(startsWith(tmp, "PRETTY_NAME="))) sub("^PRETTY_NAME=", 
                "", grep("^PRETTY_NAME=", tmp, value = TRUE)[1L]) else if (any(startsWith(tmp, 
                "NAME"))) sub("^NAME=", "", grep("^NAME=", tmp, 
                value = TRUE)[1L]) else "Linux (unknown distro)"
            sub("\\"(.*)\\"", "\\\\1", t2)
        } else if (file.exists("/etc/system-release")) {
            readLines("/etc/system-release")
        }, Darwin = {
            ver <- readLines("/System/Library/CoreServices/SystemVersion.plist")
            ind <- grep("ProductUserVisibleVersion", ver)
            ver <- ver[ind + 1L]
            ver <- sub(".*<string>", "", ver)
            ver <- sub("</string>$", "", ver)
            ver1 <- strsplit(ver, ".", fixed = TRUE)[[1L]][2L]
            sprintf("%s %s %s", ifelse(as.numeric(ver1) < 12, 
                "OS X", "macOS"), switch(ver1, `6` = "Snow Leopard", 
                `7` = "Lion", `8` = "Mountain Lion", `9` = "Mavericks", 
                `10` = "Yosemite", `11` = "El Capitan", `12` = "Sierra", 
                `13` = "High Sierra", ""), ver)
        }, SunOS = {
            ver <- system("uname -r", intern = TRUE)
            paste("Solaris", strsplit(ver, ".", fixed = TRUE)[[1L]][2L])
        }, uname)
    }
    if (is.null(package)) {
        package <- grep("^package:", search(), value = TRUE)
        keep <- vapply(package, function(x) x == "package:base" || 
            !is.null(attr(as.environment(x), "path")), NA)
        package <- .rmpkg(package[keep])
    }
    pkgDesc <- lapply(package, packageDescription, encoding = NA)
    if (length(package) == 0) 
        stop("no valid packages were specified")
    basePkgs <- sapply(pkgDesc, function(x) !is.null(x$Priority) && 
        x$Priority == "base")
    z$basePkgs <- package[basePkgs]
    if (any(!basePkgs)) {
        z$otherPkgs <- pkgDesc[!basePkgs]
        names(z$otherPkgs) <- package[!basePkgs]
    }
    loadedOnly <- loadedNamespaces()
    loadedOnly <- loadedOnly[!(loadedOnly %in% package)]
    if (length(loadedOnly)) {
        names(loadedOnly) <- loadedOnly
        pkgDesc <- c(pkgDesc, lapply(loadedOnly, packageDescription))
        z$loadedOnly <- pkgDesc[loadedOnly]
    }
    z$matprod <- as.character(options("matprod"))
    es <- extSoftVersion()
    z$BLAS <- as.character(es["BLAS"])
    z$LAPACK <- La_library()
    class(z) <- "sessionInfo"
    z
}
<bytecode: 0x0000000017538350>
<environment: namespace:utils>

Dataset

dput(dfa)
structure(list(Pre_Intention = c(50, 10, 50, 100, 80, 100, 10,
0, 100, 50, 90, 100, 50, 100, 100, 50, 100, 100, 30, 0, 10, 0,
60, 10, 0, 0, 0, 0, 40, 10, 50, 0, 70, 50, 50, 20, 0, 50, 0,
70, 50, 60, 0, 100, 50, 100, 100, 100, 0, 50, 0, 10, 0, 60, 0,
50, 50, 20, 100, 100, 50, 100, 100, 0, 60, 20, 100, 10, 50, 100,
100, 100, 0, 50, 10, 0, 80, 50, 30, 100, 100, 80, 40, 100, 0,
10, 20, 100, 100, 100, 100, 0, 50, 100, 70, 100, 40, 20, 100,
100, 100, 100, 0, 50, 100, 100, 0, 50, 50, 90, 20, 50, 10, 40,
50, 20, 50, 0, 0, 100, 100, 10, 10, 80, 10, 100, 80, 50, 100,
10, 100, 100, 90, 50, 50, 50, 100, 50, 40, 100, 100, 40, 20,
10, 10, 20, 100, 20, 100, 0, 100, 0, 30, 0, 10, 10, 40, 0, 40,
0, 50, 0, 0, 30, 0, 0, 50, 0, 0, 0, 0, 10, 0, 0, 10, 10, 20,
50, 20, 100, 50, 20, 90, 20, 100, 70, 20, 90, 10, 20, 80, 100,
90, 100, 100, 100, 10, 10, 20, 0, 60, 0, 10, 100, 100, 100, 10,
10, 0, 20, 0, 0, 40, 0, 30, 0, 40, 70, 0, 0, 30, 40, 0, 10, 20,
20, 50, 10, 10, 0, 0, 10, 0, 50, 0, 40, 0, 20, 0, 0, 20, 100,
0, 100, 30, 0, 50, 70, 10, 0, 0, 30, 10), Post_Intention = c(70,
10, 50, 100, 0, 100, 10, 0, 100, 90, 90, 100, 100, 100, 100,
80, 100, 100, 70, 0, 30, 0, 60, 10, 10, 0, 40, 20, 70, 10, 80,
40, 70, 40, 50, 30, 0, 50, 0, 100, 50, 100, 0, 100, 50, 100,
100, 100, 0, 60, 0, 10, 40, 60, 0, 100, 80, 20, 100, 100, 90,
100, 100, 0, 60, 30, 100, 0, 50, 100, 100, 100, 0, 50, 20, 0,
80, 50, 100, 100, 100, 80, 50, 100, 0, 0, 30, 100, 100, 100,
100, 0, 60, 100, 70, 100, 40, 30, 100, 100, 100, 100, 0, 100,
100, 100, 0, 100, 50, 90, 20, 70, 10, 40, 50, 50, 50, 0, 10,
100, 100, 10, 10, 20, 40, 100, 80, 80, 100, 10, 100, 100, 90,
80, 70, 70, 100, 80, 50, 100, 100, 40, 20, 100, 50, 20, 100,
30, 100, 0, 100, 0, 60, 0, 10, 20, 40, 0, 40, 0, 60, 0, 0, 10,
10, 0, 50, 0, 0, 0, 30, 10, 30, 0, 0, 30, 10, 50, 80, 100, 50,
50, 90, 0, 100, 70, 20, 90, 10, 30, 80, 100, 100, 100, 100, 100,
20, 20, 40, 0, 60, 0, 90, 100, 100, 100, 10, 10, 0, 20, 0, 0,
60, 0, 50, 20, 70, 80, 20, 20, 30, 40, 0, 10, 20, 40, 50, 10,
10, 0, 0, 20, 60, 50, 10, 60, 0, 20, 50, 0, 60, 70, 10, 100,
40, 30, 100, 70, 10, 30, 20, 40, 10), TPB_vs_no_TPB = structure(c(1L,
2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L,
1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("TPB",
"No TPB"), class = "factor"), Tailored_vs_Untailored = structure(c(2L,
2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L,
1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L,
1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L,
1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L), .Label = c("Tailored",
"Untailored"), class = "factor")), .Names = c("Pre_Intention",
"Post_Intention", "TPB_vs_no_TPB", "Tailored_vs_Untailored"), class = "data.frame", row.names = c(NA,
-253L))

Best Answer

The initial S estimate is based on random sampling. It has itself quite a few tuning parameters, see help(lmrob.control) and probably ?lmrob.S . As @mdewey was also thinking, it could be that too many subsamples (of the random sample) gave perfect fits. I (as maintainer of robustbase) would be happy to investigate in more detail (when back at work), but for that I (or others helping) need to be able to get your data (or subset of your data which gives the same error). If it's small enough, you could use to dput(<dataframe>) and paste the result here.

Even before that you could try to use the currently most recommended setting="KS2014" and see if it helps (it already tries to use "better" S estimate tuning). Last but not least: Are you using the most current version of robustbase?

Related Solutions

Solved – How to fit a nonlinear mixed effects model for repeated measures data using nlmer()

You can use normal likelihood ratio tests. Here’s a simple example. First, let’s create observations from 10 individuals based on your parameters:

Asym = .6
xmid = 23
scal = 5

n = 10
time = seq(1,60,5)

d = data.frame(time=rep(time,10),
               Asym, xmid, scal, group=0)
d$subj = factor(rep(1:n, each=length(time)))

Now let half of them have different asymptotes and midpoint parameters:

ind = (nrow(d)/2):nrow(d)
d$Asym[ind] = d$Asym[ind] + .1
d$xmid[ind] = d$xmid[ind] + 10
d$group[ind] = 1
d$group=factor(d$group)

We can simulate response values for all the individuals, based on the model:

set.seed(1)
d = transform(d, y = Asym/(1+exp((xmid-time)/scal)) +
                     rnorm(nrow(d), sd=.04))
library(lattice)
xyplot(y~time | group, group=subj,
       data=d, type=c("g","l"), col="black")

Spaghetti plots of the data

We can see clear differences between the two groups, differences that the models should be able to pick up. Now let’s first try to fit a simple model, ignoring groups:

> fm1 = nls(y ~ SSlogis(time, Asym, xmid, scal), data=d)
> coef(fm1)
      Asym       xmid       scal 
 0.6633042 28.5219166  5.8286082

Perhaps as expected, the estimates for Asym and xmid are somewhere between the real parameter values for the two groups. (That this would be the case isn’t obvious, though, since the scale parameter is also changed, to adjust for the model misspecification.) Now let’s fit a full model, with different parameters for the two groups:

> fm2 = nls(y ~ SSlogis(time, Asym[group], xmid[group], scal[group]),
          data=d,
          start=list(Asym=rep(.6,2), xmid=rep(23,2), scal=rep(5,2)))
> coef(fm2)
    Asym1     Asym2     xmid1     xmid2     scal1     scal2 
 0.602768  0.714199 22.769315 33.331976  4.629332  4.749555

Since the two models are nested, we can do a likelihood ratio test:

> anova(fm1, fm2)
Analysis of Variance Table

Model 1: y ~ SSlogis(time, Asym, xmid, scal)
Model 2: y ~ SSlogis(time, Asym[group], xmid[group], scal[group])
  Res.Df Res.Sum Sq Df  Sum Sq F value    Pr(>F)    
1    117    0.70968                                 
2    114    0.13934  3 0.57034  155.54 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The extremely small p-value clearly shows that the simple model was too simple; the two groups do differ in their parameters.

However, the two scale parameters estimates are almost identical, with a difference of just .1. Perhaps we need only need one scale parameter? (Of course we know the answer is yes, since we have simulated data.)

(The difference between the two asymptote parameters is also just .1, but that’s a large difference when we take the standard errors into account – see summary(fm2).)

So we fit a new model, with a common scale parameter for the two groups, but different Asym and xmid parameters, as before:

> fm3 = nls(y ~ SSlogis(time, Asym[group], xmid[group], scal),
          data=d,
          start=list(Asym=rep(.6,2), xmid=rep(23,2), scal=5))
> coef(fm3)
     Asym1      Asym2      xmid1      xmid2       scal 
 0.6035251  0.7129002 22.7821155 33.3080264  4.6928316

And since the reduced model is nested in the full model, we can again do a likelihood ratio test:

> anova(fm3, fm2)
Analysis of Variance Table

Model 1: y ~ SSlogis(time, Asym[group], xmid[group], scal)
Model 2: y ~ SSlogis(time, Asym[group], xmid[group], scal[group])
  Res.Df Res.Sum Sq Df     Sum Sq F value Pr(>F)
1    115    0.13945                             
2    114    0.13934  1 0.00010637   0.087 0.7685

The large p-value indicates that the reduced model fits as well as the full model, as expected.

We can of course do similar tests to check if different parameter values are needed for just Asym, just xmid or both. That said, I would not recommend doing stepwise regression like this to eliminate parameters. Instead, just test the full model (fm2) against the simple model (fm1), and be happy with the results. To quantify any differences, plots will be helpful.

ANOVA – How to Collapse Data for Repeated Measures ANOVA

Yes you threw away information but you had to because that's how you run ANOVA. Your multiple measures of a condition within a subject only serve to better estimate the mean of that condition for the subject under ANOVA.

You have, from your description, at least a 2 between x 3 within ANOVA but I'm unclear on a point. Are you're six scores "secondwise" 6 additional conditions? What are they? Regardless, you need to add the between factor. It seems apparent from your graph you have a between x within interaction of some sort.

(and why the heck are you running the analyses in SPSS and graphing them in R?)

Dataset

Best Answer

Related Solutions

Solved – How to fit a nonlinear mixed effects model for repeated measures data using nlmer()

ANOVA – How to Collapse Data for Repeated Measures ANOVA

Related Question