Solved – Negative weights in maximum likelihood method

maximum-likelihood, weighted-data

In physics we like to use the maximum likelihood method to fit our models to our data.

(I'm sure the first part of this post is review to you all, I just want to be complete so that you will know what I'm talking about in case I misuse any terminology.)

Our data is a set of events. Each event is a measurement of several different quantities $\vec{\theta}$. Our model is a distribution $W(\vec{\theta},\vec{\alpha})$ with unknown parameters $\vec{\alpha}$, normalized so that $\int d\vec{\theta}\, W(\vec{\theta},\vec{\alpha}) = 1$, i.e. it's a proper pdf. We calculate the likelihood $L(\vec{\alpha}) = \prod_i W(\vec{\theta}_i,\vec{\alpha})$, where the product runs over our data events. We then maximize the log-likelihood $\ell = \log L$ with respect to $\vec{\alpha}$ to find our best estimates of the parameters.
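For concreteness, here is a minimal sketch of this procedure in Python. The exponential form of $W$ and the use of `scipy.optimize.minimize` are illustrative choices of mine, not anything specific to the question:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy data: one measured quantity per event, drawn from an
# exponential pdf W(theta; alpha) = alpha * exp(-alpha * theta).
true_alpha = 2.0
theta = rng.exponential(scale=1.0 / true_alpha, size=1000)

def neg_log_likelihood(alpha_vec, theta):
    """Negative log-likelihood, -sum_i log W(theta_i; alpha)."""
    alpha = alpha_vec[0]
    if alpha <= 0:                      # keep W a proper, positive pdf
        return np.inf
    return -np.sum(np.log(alpha) - alpha * theta)

# Minimizing -log L is the same as maximizing log L.
result = minimize(neg_log_likelihood, x0=[1.0], args=(theta,),
                  method="Nelder-Mead")
print("ML estimate of alpha:", result.x[0])   # should be close to 2.0
```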

In certain cases, some events are more important than others, and each event comes with a weight $w$. According to my source, *Statistics for Nuclear and Particle Physicists* by Louis Lyons, we can account for the weights by modifying the log-likelihood like so: $$ \ell(\vec{\alpha}) = \sum_i w_i \log W(\vec{\theta}_i,\vec{\alpha}) .$$ (See the link above, where he also mentions another method of implementing this weighting and the advantages of that method.) What I'm wondering is whether this procedure is still valid for negative weights. Since Lyons doesn't really provide any justification for this weighting method, whether it remains valid isn't clear to me.
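Continuing the sketch above, Lyons's weighting only changes the sum inside the log-likelihood; the weights $w_i$ below stand in for whatever your analysis assigns:

```python
def weighted_neg_log_likelihood(alpha_vec, theta, w):
    """Negative of l(alpha) = sum_i w_i * log W(theta_i; alpha)."""
    alpha = alpha_vec[0]
    if alpha <= 0:
        return np.inf
    return -np.sum(w * (np.log(alpha) - alpha * theta))

# With unit weights this reproduces the unweighted fit exactly.
w = np.ones_like(theta)
result = minimize(weighted_neg_log_likelihood, x0=[1.0],
                  args=(theta, w), method="Nelder-Mead")
print("Weighted ML estimate of alpha:", result.x[0])
```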

In case anyone asks, here is why I want to use negative weights. Our data sample includes both relevant events (events where the physics process of interest takes place) and irrelevant events (events where some other physics process takes place). Ideally, the weight $w$ would be the probability that a given event is relevant, in which case $w$ would always be non-negative. But it turns out that this is sometimes difficult to do in a well-motivated way, and it works better to let some of the weights be negative. For example, perhaps we know that our relevant events must follow a certain distribution (different from the function $W$ above) and that our irrelevant events follow a different one. We must then assign weights so that our weighted data follows the correct distribution, but the weights can't be assigned arbitrarily to achieve this; they need to make sense within the context of the problem (i.e. similar events should have similar weights). Sometimes the best solution seems to be to let some of the weights go negative. Since the next step in my analysis is the maximum likelihood fit described above, I was wondering whether that procedure is compatible with negative weights.

Best Answer

Whether negative weights can be used depends on the form of $W(\boldsymbol{\theta}, \boldsymbol{\alpha})$. For example, consider a linear regression model with independent Gaussian noise, so that for the $i$-th observation we have $$ \theta^0_i = \alpha \theta^1_i + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2). $$

We get a log-likelihood of the form $$ \ell(\boldsymbol{\theta}, \alpha) = -\frac{1}{2 \sigma^2} \sum_{i = 1}^n (\theta^0_i - \alpha \theta^1_i)^2 + c, $$ where $c$ does not depend on $\alpha$. This function is quadratic in $\alpha$, so an exact analytic solution to the maximum likelihood problem is available.
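Setting $d\ell/d\alpha = 0$ gives the closed form $\hat{\alpha} = \sum_i \theta^0_i \theta^1_i \big/ \sum_i (\theta^1_i)^2$. A quick check on toy data of my own:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data: theta0 = alpha * theta1 + Gaussian noise.
true_alpha, sigma = 3.0, 0.5
theta1 = rng.normal(size=200)
theta0 = true_alpha * theta1 + rng.normal(scale=sigma, size=200)

# The quadratic log-likelihood is maximized analytically:
alpha_hat = np.sum(theta0 * theta1) / np.sum(theta1**2)
print("ML estimate of alpha:", alpha_hat)   # close to 3.0
```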

Now suppose we weight the likelihood with $w_i$, $i = 1, \dots, n$: $$ \ell_w(\alpha) = -\frac{1}{2 \sigma^2} \sum_{i = 1}^n w_i (\theta^0_i - \alpha \theta^1_i)^2 + c. $$ The coefficient of $\alpha^2$ is $-\frac{1}{2\sigma^2} \sum_{i = 1}^n w_i (\theta^1_i)^2$. If some weights are negative, it can happen that $\sum_{i = 1}^n w_i (\theta^1_i)^2 < 0$, so this parabola opens upward and the supremum of the weighted log-likelihood is $+\infty$, which doesn't make any sense.
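To see this numerically, continue the snippet above with weights contrived (by me) so that $\sum_i w_i (\theta^1_i)^2 < 0$; the weighted log-likelihood then grows without bound in $|\alpha|$:

```python
# Give the events with the largest theta1**2 a weight of -1 and the
# rest +0.5, which forces sum(w * theta1**2) < 0.
w = np.where(theta1**2 > np.median(theta1**2), -1.0, 0.5)
assert np.sum(w * theta1**2) < 0

def weighted_log_likelihood(alpha):
    """l_w(alpha) = -(1 / (2 sigma^2)) * sum_i w_i * (theta0_i - alpha * theta1_i)^2."""
    return -np.sum(w * (theta0 - alpha * theta1) ** 2) / (2 * sigma**2)

for alpha in [0.0, 10.0, 100.0, 1000.0]:
    print(alpha, weighted_log_likelihood(alpha))
# The printed values keep growing: the "maximum" sits at alpha = +/- infinity.
```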

So whether negative weights are allowable depends on the statistical model used, and extra attention is required to ensure that the resulting solution makes physical sense.