Solved – R survey package: finite population correction affects point estimate in addition to the variance estimate

Tags: finite-population, r, survey

Forgive me if this is an idiot question, but I believed that including a finite population correction (fpc) in the R survey package should affect only the variance estimates for a stratified sample (simple random sampling within each stratum). Yet adding fpc also slightly changes my point estimates. Is this just an unimportant artifact of the calculation method?

I realize this example is very short on details, but I first wanted to confirm my intuition that the change I've observed is suspicious. Perhaps the warning message explains the difference in mean Weight (187.751 changes to 186.83)?

dstrat <- svydesign(ids = ~1, strata = ~stratum, fpc = ~pop, data = demo)
svymean(~Height + Weight, dstrat, na.rm = TRUE)
          mean     SE
Height  65.614 0.3091
Weight 187.751 2.4551

dstrat <- svydesign(ids = ~1, strata = ~stratum, data = demo)
Warning message:
In svydesign.default(ids = ~1, strata = ~stratum, data = demo) :
  No weights or probabilities supplied, assuming equal probability
svymean(~Height + Weight, dstrat, na.rm = TRUE)
         mean     SE
Height  65.62 0.3019
Weight 186.83 2.6220

Final note: I also used bootstrap estimates for the SE (perhaps incorrectly), and arrived at the same point estimates as those above without fpc:

demodrep <- svrepdesign(data = demo, type = "bootstrap", repweights = W,
                        scale = bootresults$scale, rscale = bootresults$rscales,
                        combined.weights = TRUE)

svymean(~Height + Weight, demodrep, na.rm = TRUE)
         mean     SE
Height  65.62 0.4617
Weight 186.83 3.2325

R 3.2.0
Survey package version 3.29-5

Thank you in advance for any insight/pointers

Best Answer

Yes, the two designs will give different point estimates. ?svydesign says: "If population sizes are specified but not sampling probabilities or weights, the sampling probabilities will be computed from the population sizes assuming simple random sampling within strata."

Looking inside survey:::svydesign.default:

if (is.null(probs) && is.null(weights)) {
    if (is.null(fpc$popsize)) {
        if (missing(probs) && missing(weights)) 
            warning("No weights or probabilities supplied, assuming equal probability")
        probs <- rep(1, nrow(ids))
    }
    else {
        probs <- 1/weights(fpc, final = FALSE)
    }
}

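So if the user supplies the fpc but no weights or probabilities, the stratum-level fpc is used to compute the sampling weights, and that affects the point estimates as well as the variance calculations. If neither is supplied, every observation gets equal probability and the estimate is the unweighted mean.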
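To see why the point estimate moves, it may help to write the two means out. The notation here is an assumption introduced for illustration and does not come from the package documentation: $N_h$ is the population size of stratum $h$ (the fpc value), $n_h$ is the number sampled from it, $\bar{y}_h$ is the stratum sample mean, $N = \sum_h N_h$ and $n = \sum_h n_h$. Under simple random sampling within strata, each sampled unit in stratum $h$ gets weight $N_h / n_h$.

```latex
% With fpc only: weights w_{hi} = N_h / n_h are derived from the population sizes
\hat{\bar{y}}_{\mathrm{fpc}}
  = \frac{\sum_h \sum_{i \in s_h} \frac{N_h}{n_h}\, y_{hi}}
         {\sum_h \sum_{i \in s_h} \frac{N_h}{n_h}}
  = \sum_h \frac{N_h}{N}\, \bar{y}_h

% With neither fpc nor weights: every unit gets weight 1
\hat{\bar{y}}_{\mathrm{equal}}
  = \frac{\sum_h \sum_{i \in s_h} y_{hi}}{n}
  = \sum_h \frac{n_h}{n}\, \bar{y}_h
```

The two expressions agree in general only under proportional allocation, that is, when $n_h / n = N_h / N$ in every stratum. Otherwise the strata are weighted differently and the means differ, which is consistent with the small shift seen in the question.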

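library(survey)
data(api)

# fpc supplied, no weights: weights are derived from the stratum population sizes
dstrat1 <- svydesign(id = ~1, strata = ~stype, data = apistrat, fpc = ~fpc)
# neither fpc nor weights: equal probability is assumed (with a warning)
dstrat2 <- svydesign(id = ~1, strata = ~stype, data = apistrat)

# the point estimates differ, not just the standard errors
svymean(~api00, dstrat1)
svymean(~api00, dstrat2)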
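As a rough check of this explanation, the sketch below rebuilds the weights implied by the fpc by hand and passes them as explicit weights without any fpc. The names `n_h`, `w_implied`, `d_fpc` and `d_wt` are made up for this illustration. It assumes the `fpc` column of `apistrat` holds the stratum population size, as in the package's own examples.

```r
library(survey)
data(api)

# Hypothetical helper columns: per-row stratum sample size and implied weight N_h / n_h
apistrat$n_h <- ave(apistrat$fpc, apistrat$stype, FUN = length)
apistrat$w_implied <- apistrat$fpc / apistrat$n_h

# Design 1: fpc only, so svydesign derives the weights itself
d_fpc <- svydesign(ids = ~1, strata = ~stype, fpc = ~fpc, data = apistrat)

# Design 2: the same weights supplied explicitly, no fpc
d_wt <- svydesign(ids = ~1, strata = ~stype, weights = ~w_implied, data = apistrat)

est_fpc <- as.numeric(coef(svymean(~api00, d_fpc)))
est_wt  <- as.numeric(coef(svymean(~api00, d_wt)))
est_raw <- weighted.mean(apistrat$api00, apistrat$w_implied)

# All three point estimates should agree
print(c(fpc = est_fpc, weights = est_wt, by_hand = est_raw))

# The unweighted mean is what the no-fpc, no-weights design reports
print(mean(apistrat$api00))
```

If the point estimates from `d_fpc` and `d_wt` match, the change seen in the question comes from the implied weights rather than from the finite population correction itself. The standard errors of the two designs should still differ, since only `d_fpc` applies the correction to the variance. To have the fpc affect only the variance, one option is to supply both `weights` and `fpc` to svydesign.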