I am trying to get r-squared, or explained variation, from a linear regression (OLS) on complex survey data.
In Stata, this can be done with svy: regress. In R, however, the 'survey' package has no dedicated function for OLS linear regression. There is svyglm, which fits a generalized linear model (GLM), but it does not report a value for explained variation (r-squared), presumably because it isn't OLS. Is there a way to get r-squared for complex survey data in R?
library(survey)
# Survey design: PSUs nested within strata, with sampling weights w_mec
design <- svydesign(id = ~psu, strata = ~strata, weights = ~w_mec, nest = TRUE, data = sample)
# Gaussian GLM with identity link; svyglm takes its data from the design object
model1 <- svyglm(bmi ~ 1 + age + black + hispanics + others + female + edu2 + edu3 + edu4 + near_poor + middle + high, design = design, family = gaussian(link = "identity"))
summary(model1)
Above is an example of what I did in R. This doesn't give r-squared because it's a GLM. You don't really need to reproduce anything; this isn't a code issue. I just want to know if there is a way to get r-squared for complex survey data in R.
Best Answer
For a Gaussian glm (where the population parameter is the OLS parameter) you can just divide the dispersion parameter (the estimated residual variance) by the estimated population variance of the outcome and subtract the result from 1.
This can be tried with one of the examples from the svyglm help page. You could also get the null-model variance using svyvar.
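The code that originally accompanied this step is not preserved here, so the following is only a minimal sketch of the calculation described above. It assumes the `api` example data shipped with the survey package and a stratified design of the kind used in the svyglm help page; the variable names (`fit`, `resid_var`, `total_var`, `null_var`, `rsq`) and the choice of predictors are illustrative, not the answer's originals.

```r
library(survey)

data(api)  # example data shipped with survey; includes apistrat and apipop

# Stratified sample design, as in the package's help-page examples
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Gaussian svyglm (the default family), i.e. design-based linear regression
fit <- svyglm(api00 ~ ell + meals + mobility, design = dstrat)

# Residual variance: the dispersion parameter reported by summary()
resid_var <- as.numeric(summary(fit)$dispersion)

# Total (null-model) variance of the outcome, estimated with svyvar
total_var <- as.numeric(svyvar(~api00, dstrat))

# Alternative: dispersion of an intercept-only model
null_var <- as.numeric(summary(svyglm(api00 ~ 1, design = dstrat))$dispersion)

# R-squared = 1 - residual variance / total variance
rsq <- 1 - resid_var / total_var
rsq
```

With the objects from the question, the analogous calculation would presumably be `1 - as.numeric(summary(model1)$dispersion) / as.numeric(svyvar(~bmi, design))`, adding `na.rm = TRUE` to `svyvar` if `bmi` has missing values.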
And in this case we have the whole population, so we can run lm on the population and compare the survey estimate of r-squared with the population value.
As an added bonus answer: if you want the Nagelkerke or Cox-Snell r-squared for binary or count data, there's a function psrsq.
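As a rough sketch of the population comparison and of psrsq, again assuming the `api` example data from the survey package (where `apipop` holds the full population) and illustrative model formulas that are not taken from the original answer:

```r
library(survey)

data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Survey estimate of r-squared from the stratified sample
fit <- svyglm(api00 ~ ell + meals + mobility, design = dstrat)
rsq_survey <- 1 - as.numeric(summary(fit)$dispersion) /
  as.numeric(svyvar(~api00, dstrat))

# Population value: ordinary lm on the full population data
pop_fit <- lm(api00 ~ ell + meals + mobility, data = apipop)
rsq_pop <- summary(pop_fit)$r.squared

c(survey = rsq_survey, population = rsq_pop)

# Pseudo r-squared for a binary outcome (assumes a survey version that provides psrsq)
fit_bin <- svyglm(I(sch.wide == "Yes") ~ ell + meals + mobility,
                  design = dstrat, family = quasibinomial())
psrsq(fit_bin)                         # Cox-Snell is the default method
psrsq(fit_bin, method = "Nagelkerke")
```

The two r-squared values should be close but not identical, since the survey figure is an estimate from a sample of the population.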