Solved – Correct use of the sample weights in a complex survey design for association analysis (Logit OR)

Tags: association-measures, survey, survey-weights, weighted-sampling

I have doubts about the correct use of sample weights in the NHANES survey, which uses a complex, multistage probability sampling design (1).
I am aware of the importance of using the sample weights when the primary outcome of a study is the estimation of a prevalence (of a disease or condition) (2, 3).
However, I am not as confident about the appropriateness of using sample weights for other purposes, e.g. association analyses (odds ratios from multiply adjusted logistic regression).
I have read many papers published in top-tier journals using NHANES data in which the association between two or more conditions is explored without taking the sample weights into account. From what I understand, when these data are used for cross-sectional association analysis, there is no need to refer to the overall (weighted) population. Indeed, using sample weights would artificially inflate the number of observations, potentially biasing the association estimates.
Could someone explain whether my conclusion is correct, and when and why sample weights should be used?
Meanwhile, I found this answer from Dr James H. Watt to the same question on ResearchGate (4):

The answer to this question depends on an understanding of what a
weighted sample is. Weighted data correct for nonproportional
sampling of subgroups that have a known probability of occurring in the
population. Weights are computed to equalize the effects of over- or
undersampled subgroups; this might be deliberate oversampling of an
interesting subgroup or the result of sampling error. As long as you
know the proportion of the subgroup in the population, you can correct
the sample estimates by weighting.

As an example, suppose you are interested in analyzing variables
related to a disease that appears only 1% of the time in the general
population, but you would also like to study the subgroup that has the
disease, both alone and in the context of the whole population.
Suppose a power analysis indicates you need at least 500 members of
the general population for adequate statistical power to estimate
relationships in the whole population, but you need 100 members of
this subgroup for an independent analysis of only this subgroup to
have acceptable statistical power. To get the needed number of
subgroup members with a simple random sample, you would have to draw a
sample of around 10,000 from the whole population. This is extremely
wasteful, as the additional observations beyond the minimum 500 add
diminishing ability to detect relationships in the general population
(at your desired Type II error rate). Instead you could sample 500
from the general population (which would include an expected 5 from
the target subgroup), then sample an additional 95 from the sampling
frame of the subgroup. Using these 95 and the 5 from the general
population, you have your N=100 for the subgroup analysis.

But what if you combine the two samples to get a representative
picture of the whole population? In your combined sample of 595, the
subgroup sample is 16% of the observations; in the population, that
group represents only 1%. So if you just combine the 95 oversampled
observations with the 500 general-population observations, you will
give each oversampled subgroup observation 16 times the influence that
it should have. This extreme bias produces what is technically termed
"the wrong answer".

Instead, in the general-population analysis, you would weight each
observation by the ratio of the actual population proportion to the
sample proportion. The weight corrects the influence of each
observation from each subsample so that they represent equivalent
observations in the population. There are lots of online resources
that show how to compute the weights. In this example, each
observation in the N=500 sample would be weighted by a factor of 1.178
and each observation in the N=95 subsample by 0.0626, so that
500*1.178 + 95*0.0626 ≈ 595, the N of the combined sample. Now the
oversampling bias has been removed in the combined sample.

Using covariates will not remove the bias introduced by
nonproportional sampling, as the bias of nonproportional sampling is
just as extreme in a covariate as in any other observed variable. If
you have sample weights based on known, accurate population
proportions, using them will ALWAYS reduce your estimation error by
removing sample bias.
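The weighting arithmetic in the quoted example can be reproduced in a few lines. This is only a sketch of the quote's own calculation (500 general-population draws, 95 oversampled subgroup members, 1% population prevalence); nothing here is NHANES-specific.

```python
# Weighting arithmetic from the quoted example: weights are the ratio of
# population proportion to sample proportion, scaled so the weighted
# counts sum back to the combined sample size.
n_general, n_oversample = 500, 95
n_total = n_general + n_oversample           # 595 combined observations
p_subgroup = 0.01                            # subgroup prevalence in population

w_general = (1 - p_subgroup) * n_total / n_general     # weight for general sample
w_oversample = p_subgroup * n_total / n_oversample     # weight for oversample

weighted_n = n_general * w_general + n_oversample * w_oversample
unweighted_share = n_oversample / n_total    # ~16% of observations, vs 1% in population
print(round(w_general, 3), round(w_oversample, 4), round(weighted_n))
# → 1.178 0.0626 595
```

Without the weights, the oversampled subgroup would carry roughly 16 times its population share of the influence, which is the bias the quote describes.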

Best Answer

I think the question you should focus on is the population you want to make inference about. The sample weights are for making estimates of population totals $\hat{Y}=\sum_i w_i y_i$. The weights are for getting from the sample to inference about a specific population. What you should think about is whether that is your population of interest.
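As a minimal illustration of $\hat{Y}=\sum_i w_i y_i$, with entirely made-up data (each weight says how many population members a sampled unit "stands for"):

```python
import numpy as np

# Estimating a population total (and a prevalence) from a weighted sample.
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])   # outcome for 5 sampled units
w = np.array([200, 150, 150, 250, 250])   # survey weights (made up)

y_hat_total = np.sum(w * y)               # estimated population total, Y_hat
n_hat = np.sum(w)                         # estimated population size
prevalence = y_hat_total / n_hat          # weighted prevalence
print(y_hat_total, n_hat, round(prevalence, 2))  # → 600.0 1000 0.6
```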

So, if you wanted to fit, say, a linear regression model, you would need the population sums, sums of squares, and sums of cross-products over the population. Using the survey weights gives you estimates of these quantities.
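To make that concrete, here is a sketch with simulated (made-up) data: weighted least squares solved directly from the weighted cross-product sums $(X'WX)\hat\beta = X'Wy$, which is exactly "estimating the population sums and solving".

```python
import numpy as np

# Weighted linear regression from weighted sums of squares and cross-products.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)      # true model: intercept 2, slope 3
w = rng.uniform(0.5, 2.0, size=n)           # stand-in survey weights

X = np.column_stack([np.ones(n), x])        # design matrix with intercept
XtWX = X.T @ (w[:, None] * X)               # weighted sums of cross-products
XtWy = X.T @ (w * y)
beta = np.linalg.solve(XtWX, XtWy)          # weighted estimates, close to (2, 3)
print(beta)
```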

You could even say that the "population log-likelihood" is itself a population total, $$L_P(\theta) = \sum_i \log f(y_i \mid \theta),$$

where the sum runs over the whole population. Using the sample weights provides an estimate of this quantity: substitute $\log f(y_i \mid \theta)$ for $y_i$ in the previous expression for $\hat{Y}$, giving $\hat{L}_P(\theta) = \sum_i w_i \log f(y_i \mid \theta)$.
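A sketch of maximizing this weight-estimated log-likelihood for logistic regression, using plain Newton-Raphson on simulated (made-up) data; the weights simply multiply each observation's score and information contribution:

```python
import numpy as np

# Weighted logistic regression: maximize sum_i w_i * log f(y_i | beta).
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))     # true model: logit = 0.5 + 1.0*x
y = rng.binomial(1, p)
w = rng.uniform(0.5, 2.0, size=n)          # stand-in survey weights

X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):                        # Newton-Raphson iterations
    mu = 1 / (1 + np.exp(-X @ beta))
    score = X.T @ (w * (y - mu))           # gradient of weighted log-likelihood
    H = X.T @ ((w * mu * (1 - mu))[:, None] * X)  # weighted information matrix
    beta = beta + np.linalg.solve(H, score)
print(beta)                                # close to (0.5, 1.0)
```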

However, you are likely to run into problems with standard variance estimation in most modelling programs: the standard errors will be far too small. Conceptually this makes sense if we think of them as the standard errors we would get from fitting the model to census data; that is, we expect $L_P(\theta)$ to be quite sharply peaked. The problem is that we are working with the estimate $\hat{L}_P(\theta)$, and this estimate has sampling error that needs to be taken into account. Usually jackknife/bootstrap replicate weights are provided with these kinds of files, and using the variation across the replicate estimates gives you a more reasonable estimate of uncertainty.
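The replicate-weight idea can be sketched as follows: compute the estimate with the main weights, recompute it with each replicate weight set, and take the variance from the spread of the replicate estimates. The data and replicate weights below are made up, and the scaling constant in the variance formula depends on the replication scheme (jackknife, BRR, bootstrap); a simple bootstrap-style divisor is used here purely for illustration.

```python
import numpy as np

# Replicate-weight variance estimation for a weighted prevalence.
rng = np.random.default_rng(2)
n, R = 300, 50
y = rng.binomial(1, 0.3, size=n).astype(float)
w = rng.uniform(0.5, 2.0, size=n)               # main weights (made up)
w_rep = w * rng.uniform(0.5, 1.5, size=(R, n))  # fake replicate weight sets

theta_full = np.average(y, weights=w)           # full-sample estimate
theta_rep = (w_rep @ y) / w_rep.sum(axis=1)     # one estimate per replicate
var_hat = np.sum((theta_rep - theta_full) ** 2) / (R - 1)
se = np.sqrt(var_hat)                           # design-based standard error
print(theta_full, se)
```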

This is also a place where your "Bayesian, model-based" versus "frequentist, design-based" philosophy matters somewhat, because your variance estimates depend on what you are conditioning on as fixed: is the error from "the sample selected" or from "the predictive model"?

It is also not a bad idea simply to check whether using the weights makes a difference to your analysis, noting that you should expect a difference in the accuracy measures (standard errors) but not necessarily in the parameter estimates themselves (such as regression coefficients).