Solved – Adding weights for highly skewed data sets in logistic regression


I am using a standard logistic regression to fit my input variables to a binary output variable.

However, in my problem the negative outputs (0s) far outnumber the positive outputs (1s); the ratio is 20:1. So when I train a classifier, it seems that even features that strongly suggest a positive output end up with very low (highly negative) values for their corresponding parameters. It seems to me that this happens because there are simply too many negative examples pulling the parameters in their direction.

So I am wondering whether I can add weights for the positive examples (say, 20 instead of 1). Is this likely to help at all? And if so, how should I add the weights to the equations below?

The cost function looks like the following:
$$J = \frac{-1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)}\log\big(h(x^{(i)}\cdot\theta)\big) + \big(1-y^{(i)}\big)\log\big(1 - h(x^{(i)}\cdot\theta)\big)\Big]$$

The gradient of this cost function (wrt $\theta$) is:

$$\mathrm{grad} = \frac{1}{m}\left(\left(h(X\cdot\theta) - y\right)' \cdot X\right)'$$

Here $m$ is the number of training examples, $X$ is the feature matrix (with $i$-th row $x^{(i)}$), $y$ is the output vector, $h$ is the sigmoid function, and $\theta$ is the vector of parameters we are trying to learn.
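Coming back to the weighting idea: my guess (and it is only a guess) is that a per-example weight $w^{(i)}$, with $w^{(i)} = 20$ when $y^{(i)} = 1$ and $w^{(i)} = 1$ otherwise, would simply multiply each term of the sum,

$$J_w = \frac{-1}{m}\sum_{i=1}^{m} w^{(i)}\Big[\, y^{(i)}\log\big(h(x^{(i)}\cdot\theta)\big) + \big(1-y^{(i)}\big)\log\big(1 - h(x^{(i)}\cdot\theta)\big)\Big],$$

so that the gradient picks up the same factor: $\mathrm{grad}_w = \frac{1}{m}\,X'\big(w \odot (h(X\cdot\theta) - y)\big)$, where $\odot$ denotes element-wise multiplication. Is that the right way to do it?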

Finally, I run gradient descent to find the lowest $J$ possible. The implementation seems to run correctly.
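For reference, here is a minimal NumPy sketch of the cost, gradient, and gradient-descent loop described above (the function names and the optional `weights` argument are placeholders I am making up for illustration, not my actual implementation). Passing a weight of 20 for each positive example and 1 for each negative would implement the weighting I am asking about:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y, weights=None):
    """Cross-entropy cost J and its gradient for logistic regression.

    X: (m, n) feature matrix, y: (m,) vector of 0/1 labels,
    theta: (n,) parameter vector.
    weights: optional (m,) per-example weights (e.g. 20 for positives,
    1 for negatives); None reproduces the unweighted cost above.
    """
    m = X.shape[0]
    if weights is None:
        weights = np.ones(m)
    h = sigmoid(X @ theta)          # predicted probabilities, shape (m,)
    eps = 1e-12                     # keeps log() finite if h hits 0 or 1
    J = -(1.0 / m) * np.sum(
        weights * (y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    )
    grad = (1.0 / m) * (X.T @ (weights * (h - y)))
    return J, grad

def gradient_descent(X, y, alpha=0.1, iters=5000, weights=None):
    """Plain batch gradient descent on J."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        _, grad = cost_and_grad(theta, X, y, weights)
        theta -= alpha * grad
    return theta
```

With `weights=None` this is the unweighted cost above; `weights=np.where(y == 1, 20.0, 1.0)` would give the 20:1 scheme.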

Best Answer

That would no longer be maximum likelihood. Such an extreme distribution of $Y$ only presents problems if you are using a classifier, i.e., if you are computing the proportion classified correctly, an improper scoring rule. The probability estimates from standard maximum likelihood are valid. If the total number of "positives" is smaller than 15 times the number of candidate variables, penalized maximum likelihood estimation may be in order.
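A minimal sketch of the penalized route, assuming scikit-learn is available (the data set, parameter values, and names below are purely illustrative): its `LogisticRegression` fits an L2 (ridge) penalized logistic model by default, and the quantity to use is the estimated probability from `predict_proba`, not a hard 0/1 classification.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Imbalanced toy data: roughly 20 negatives for every positive.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# The default penalty is L2 (ridge); C is the inverse penalty strength,
# so smaller C means heavier penalization. C should be tuned rather than
# left at its default in a real analysis.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)

# Report estimated probabilities, not hard classifications.
probs = model.predict_proba(X)[:, 1]
print(probs[:5])
```

Choosing the penalty strength by cross-validation (for example with `LogisticRegressionCV`) would be the usual next step.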
