Can the utilities or choice probabilities be weighted by population weights, or must the weighting occur at the observation (individual) level? I assume it must occur at the observation level, but I've seen people weight utilities. Additionally, if the weighting occurs at the observation level, will one still get accurate market share estimates when subsetting the sample on different attributes?
Solved – Exogenous weighting: multinomial logit models
econometrics, logit, multinomial-distribution, survey-weights, weighted-data
Related Solutions
Whether negative weights can be used depends on the distribution of $W(\boldsymbol{\theta}, \boldsymbol{\alpha})$. For example, consider a linear regression model with independent Gaussian noise, so that for the $i$-th observation we have $$ \theta^0_i = \alpha \theta_i^1 + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2). $$
The log-likelihood has the form $$ l(\boldsymbol{\theta}, \boldsymbol{\alpha}) = -\frac{1}{2 \sigma^2} \sum_{i = 1}^n (\theta^0_i - \alpha \theta_i^1)^2 + c, $$ where $c$ does not depend on $\boldsymbol{\alpha}$. This function is quadratic in $\boldsymbol{\alpha}$, so an exact analytic solution is available for maximum likelihood estimation.
Suppose that we weight each term of the log-likelihood by $w_i, i = \overline{1, n}$. If some weights are negative, the supremum of the weighted log-likelihood can be $+\infty$ (this happens if $\sum_{i = 1}^n w_i (\theta_i^1)^2 < 0$), which doesn't make any sense.
So whether negative weights are admissible depends on the statistical model used, and additional care is needed to ensure that the resulting solution makes physical sense.
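As a sketch of why this happens, under the same Gaussian model and notation, expand the weighted log-likelihood in $\alpha$: $$ l_w(\alpha) = -\frac{1}{2 \sigma^2} \sum_{i = 1}^n w_i (\theta^0_i - \alpha \theta_i^1)^2 + c = -\frac{1}{2 \sigma^2} \left[ \alpha^2 \sum_{i = 1}^n w_i (\theta_i^1)^2 - 2 \alpha \sum_{i = 1}^n w_i \theta^0_i \theta_i^1 + \sum_{i = 1}^n w_i (\theta^0_i)^2 \right] + c. $$ The coefficient of $\alpha^2$ is $-\frac{1}{2 \sigma^2} \sum_{i = 1}^n w_i (\theta_i^1)^2$. If $\sum_{i = 1}^n w_i (\theta_i^1)^2 > 0$, the parabola opens downward and the maximizer is $$ \hat{\alpha} = \frac{\sum_{i = 1}^n w_i \theta^0_i \theta_i^1}{\sum_{i = 1}^n w_i (\theta_i^1)^2}. $$ If instead $\sum_{i = 1}^n w_i (\theta_i^1)^2 < 0$, the parabola opens upward and $l_w(\alpha) \to +\infty$ as $|\alpha| \to \infty$, so no maximizer exists.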
The website does not give much information about what is meant by an "individual weight". However, it seems to have been derived by the application of post-survey techniques. The limited information is rather frustrating. In addition, the publications produced from the data don't provide much information about the way weights have been used.
Therefore, I fear that your best recourse is to write to the survey managers and ask them to provide you with specific technical information about the process used to calculate the weights and their recommended method of applying the weights in analyses. For example, see https://hints.cancer.gov/docs/Instruments/HINTS-FDA_Methodology_Report.pdf
I would be hesitant to treat the "individual weights" as simple frequency weights until I get more information direct from the source.
Your question about how to implement this in R is out-of-scope for these boards. However, you might want to check the survey package.
Good luck!
EDIT following release of Stata code
As I suspected, the weighting issue is much more complex than simply thinking of INDWT as a frequency weight attached to each response. The Stata code shows that you need to account for stratum, FPC and other effects.
It is now clear that you need to use the survey package in R to conduct your analysis. Note that the weights package is limited to simple weighting strategies and simple analyses, while survey is much more flexible and comprehensive. The package has a website with a large amount of learning material.
Good luck to you!
EDIT 2: R code
library(survey)
armenia2 <- svydesign(ids=~PSU+ID, strata=~SUBSTRATUM, fpc=~NHHPSU+NADHH, weights=~INDWT, data=armenia)
Best Answer
Population weights reflect an individual's probability of selection (typically they are its inverse). If you do use weights, they are attached to a person. Think about it this way: a choice model like multinomial logit can be represented in "long" format, with one line per person-by-alternative and an indicator for the chosen alternative, or in "wide" format, with one line per person and a dependent variable showing just the chosen alternative. The choice between the two is an implementation issue, not a statistical issue. The wide format clearly has only one weight attached to each line, and that is the person's weight.
I am not sure what (differential) weights on the alternatives would even mean, at least in the context of finite population sampling.
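A minimal Python sketch may make the long-versus-wide point concrete. It assumes a plain multinomial logit with already-computed utilities. The data, the weights, and the function names (`mnl_probs`, `weighted_loglik_wide`, `weighted_loglik_long`) are all made up for illustration and do not come from any particular package. Each person contributes their weight times the log-probability of their chosen alternative, and both layouts give the same weighted log-likelihood.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit choice probabilities for one person's alternatives."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_loglik_wide(people):
    """Wide format: one record per person, (weight, utilities, chosen_index)."""
    return sum(w * math.log(mnl_probs(us)[chosen]) for w, us, chosen in people)

def weighted_loglik_long(rows):
    """Long format: one row per person-by-alternative,
    (person_id, weight, utility, chosen_flag). The person's weight is
    repeated on every row but enters the likelihood only once per person."""
    by_person = {}
    for pid, w, u, chosen in rows:
        by_person.setdefault(pid, []).append((w, u, chosen))
    total = 0.0
    for alts in by_person.values():
        probs = mnl_probs([u for _, u, _ in alts])
        for (w, _, chosen), p in zip(alts, probs):
            if chosen:
                total += w * math.log(p)
    return total

# Hypothetical data: three people, three alternatives each.
people = [
    (1.5, [0.2, 1.0, -0.3], 1),
    (0.5, [0.0, 0.4, 0.9], 2),
    (2.0, [1.1, 0.1, 0.0], 0),
]
rows = [
    (pid, w, u, j == chosen)
    for pid, (w, us, chosen) in enumerate(people)
    for j, u in enumerate(us)
]

ll_wide = weighted_loglik_wide(people)
ll_long = weighted_loglik_long(rows)
```

In this sketch there is no place where a separate weight per alternative could enter: the weight multiplies the person's whole likelihood contribution.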
I wrote far more extensively about analysis of complex survey data here, although that was in the context of health applications. There was a section on subsamples: you often need to treat them carefully, as you may mislead the software into misinterpreting your survey design. There are also some schools of thought that the weights are only needed for descriptive analysis, and are anywhere between useless and irrelevant for complex models... but these schools of thought may be missing the point of what regressions with survey data are estimating... see also here. Regressions with weights and other elements of complex survey designs aim to estimate the census regression, i.e., what you would have obtained had you run the regression on the full finite population of interest. Whether that's a relevant target quantity may be a different issue; I think a big part of the argument is that, especially for counterfactual models, the finite population may not be that well defined. Ah well.
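On the market-share part of the question, here is a hedged Python sketch of one common way to form weighted share estimates: average each person's predicted choice probabilities using the person-level weights, and handle a subgroup with a domain indicator rather than by deleting the other rows. The records, the `urban` attribute, and the function name `weighted_shares` are hypothetical. The sketch covers point estimates only; standard errors for a subgroup still need the full design information, which is what dedicated survey software is for.

```python
def weighted_shares(records, in_domain=lambda r: True):
    """Weighted average of predicted choice probabilities.

    records: list of dicts with 'weight' (person-level weight) and
             'probs' (predicted probabilities, one per alternative).
    in_domain: predicate selecting the subpopulation of interest; records
               outside the domain stay in the data with a zero indicator.
    """
    n_alt = len(records[0]["probs"])
    numer = [0.0] * n_alt
    denom = 0.0
    for r in records:
        d = 1.0 if in_domain(r) else 0.0  # domain indicator
        denom += d * r["weight"]
        for j, p in enumerate(r["probs"]):
            numer[j] += d * r["weight"] * p
    if denom == 0.0:
        raise ValueError("no weighted observations in the requested domain")
    return [x / denom for x in numer]

# Hypothetical data: four respondents, three alternatives.
records = [
    {"weight": 100.0, "urban": True,  "probs": [0.6, 0.3, 0.1]},
    {"weight": 300.0, "urban": False, "probs": [0.2, 0.5, 0.3]},
    {"weight": 50.0,  "urban": True,  "probs": [0.5, 0.25, 0.25]},
    {"weight": 150.0, "urban": False, "probs": [0.1, 0.6, 0.3]},
]

all_shares = weighted_shares(records)
urban_shares = weighted_shares(records, lambda r: r["urban"])
```

The subgroup shares use the same person-level weights as the full-sample shares; only the set of people contributing to the sums changes.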