Solved – Exogenous weighting: multinomial logit models

econometrics, logit, multinomial-distribution, survey-weights, weighted-data

Can the utilities or choice probabilities be weighted by population weights, or must the weighting occur at the observational (individual) level? I assume it must occur at the observational level, but I've seen people weight utilities. Additionally, if the weighting occurs at the observational level, will one still get accurate market share estimates when subsetting the sample on different attributes?

Best Answer

Population weights reflect an individual's probability of selection (they are typically based on the inverse of that probability). If you do use weights, they are attached to a person. Think about this: representing a choice model like multinomial logit in a "long" format, with one line per person-by-alternative and an indicator for the chosen alternative, vs. a "wide" format, with one line per person and the dependent variable showing just the chosen alternative, is an implementation issue, not a statistical issue. The wide format clearly has only one weight attached to each line, and that is the person's weight; in the long format, that same person weight applies to every line belonging to the person.
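To make the long-vs-wide point concrete, here is a minimal sketch in Python with NumPy. The utilities `V`, the choices, and the weights `w` are all made-up numbers, not taken from any real survey; the only claim is that the weighted multinomial logit log-likelihood comes out the same in both layouts when each person's weight is repeated on all of that person's rows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, J = 5, 3                                # 5 persons, 3 alternatives (hypothetical)
V = rng.normal(size=(n, J))                # made-up systematic utilities
choice = rng.integers(0, J, size=n)        # made-up chosen alternative per person
w = rng.uniform(1, 10, size=n)             # made-up person-level weights

# Multinomial logit choice probabilities (softmax over alternatives).
P = np.exp(V - V.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)

# Wide format: one row per person, one weight per row.
ll_wide = np.sum(w * np.log(P[np.arange(n), choice]))

# Long format: one row per person-by-alternative, with a "chosen" indicator.
person = np.repeat(np.arange(n), J)
alt = np.tile(np.arange(J), n)
chosen = (alt == choice[person]).astype(float)
w_long = w[person]                         # the person's weight, repeated on each row
ll_long = np.sum(w_long * chosen * np.log(P.ravel()))

print(ll_wide, ll_long)
```

The two printed values agree up to floating-point error, which is the sense in which the layout is an implementation detail rather than a statistical choice.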

I am not sure what (differential) weights on the alternatives would mean, at least in the context of finite population sampling.

I wrote far more extensively about analysis of complex survey data here, although that was in the context of health applications. There was a section on subsamples: you often need to treat them carefully, as you may mislead the software into misinterpreting your survey design. There are also some schools of thought that the weights are only needed for descriptive analysis, and are anywhere between useless and irrelevant for complex models... but these schools of thought may be missing the point of what regressions with survey data are estimating... see also here. Regressions with weights and other elements of complex survey designs are intended to estimate the census regression, i.e., what you would have obtained had you run the regression on the full finite population of interest. Whether that's a relevant target quantity may be a different issue; I think a big part of the argument is that, especially for counterfactual models, the finite population may not be that well defined. Ah well.
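On the subsample part of the question, here is a rough simulation sketch in Python with NumPy. Everything in it is an assumption made for illustration: the population size, the `stratum` and `urban` attributes, the choice probabilities, and the selection probabilities. It uses observed choices rather than fitted logit probabilities to keep it short, and it only looks at point estimates of shares within a subsample; it says nothing about standard errors, which is where the survey design has to be handled with care.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population (all numbers are made up).
N = 100_000
stratum = rng.integers(0, 2, size=N)    # exogenous sampling stratum
urban = rng.random(N) < 0.4             # attribute used to form the subsample

# Choice probabilities over 3 alternatives differ by stratum.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.2, 0.7]])
cum = np.cumsum(probs, axis=1)[:, :-1]  # first J-1 cumulative cut points
u = rng.random(N)
choice = (u[:, None] > cum[stratum]).sum(axis=1)  # values in {0, 1, 2}

# Unequal selection probabilities by stratum; person weight = 1 / probability.
pi = np.where(stratum == 0, 0.01, 0.10)
sampled = rng.random(N) < pi
w = 1.0 / pi

def shares(mask, weights=None):
    """Share of each alternative among rows in mask, optionally weighted."""
    wts = None if weights is None else weights[mask]
    counts = np.bincount(choice[mask], weights=wts, minlength=3)
    return counts / counts.sum()

true_urban = shares(urban)              # shares in the full urban population
sub = sampled & urban                   # urban subsample
unweighted = shares(sub)
weighted = shares(sub, w)

print("population:", true_urban)
print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

Under these assumptions the person-weighted shares in the urban subsample track the urban population shares, while the unweighted shares are pulled toward the over-sampled stratum.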