Solved – Logistic Regression with weighted instances

logistic, machine-learning, regression

I'm working on implementing a logistic regression algorithm in code. It's based on this link. Unfortunately, the paper doesn't discuss weighting the individual examples $x_{i}$.

I think the relevant log likelihood function will look something like this:

$$
L(\vec{w}) = \sum_{i=1}^n r_i \log{g(y_i z_i)}
$$

as opposed to what's in the paper:

$$
L(\vec{w}) = \sum_{i=1}^n \log{g(y_i z_i)}
$$

where $z_i=\sum_k w_k x_{ik}$ and $r_i$ is the instance weight for instance $i$. Also, $y_i \in\{-1,1\}$ in this case, and $g$ is the sigmoid function, so $1-g(z)=g(-z)$. This is discussed in the link.
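
For concreteness, here is a minimal Python sketch of this weighted log-likelihood (the names `X`, `y`, `r`, and `w` are just illustrative; the paper doesn't specify an implementation):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z)); note that 1 - g(z) = g(-z)."""
    return 1.0 / (1.0 + np.exp(-z))

def weighted_log_likelihood(w, X, y, r):
    """L(w) = sum_i r_i * log g(y_i * z_i), where z_i = X[i] @ w.

    X: (n, k) feature matrix, y: (n,) labels in {-1, +1},
    r: (n,) non-negative instance weights.
    """
    z = X @ w
    # log g(t) = -log(1 + exp(-t)); np.logaddexp(0, -t) computes
    # this stably, avoiding overflow for large |t|
    return -np.sum(r * np.logaddexp(0.0, -y * z))
```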

Unfortunately, my math skills are not solid enough to derive the first and second partial derivatives, which are required to perform the optimization. Without the instance weights I'd like to add, the derivatives are:

$$
\frac{\partial{L}}{\partial{w_{k}}} = \sum_{i=1}^{n}y_{i}x_{ik}g(-y_{i}z_{i})
$$
$$
\frac{\partial^{2}{L}}{\partial{w_{j}}\partial{w_{k}}} = -\sum_{i=1}^{n}x_{ij}x_{ik}g(y_{i}z_{i})g(-y_{i}z_{i})
$$

How do these translate with the new $r_{i}$ instance weight involved?

Thanks!

Best Answer

The weights $r_i$ are not a function of $\vec{w}$, so when computing derivatives you should treat them as constants.

In particular, the partial derivatives with respect to $w_k$ look like:

$$
\frac{\partial{L}}{\partial{w_{k}}} = \sum_{i=1}^{n} r_i y_{i} x_{ik} g(-y_{i} z_{i})
$$
$$
\frac{\partial^{2}{L}}{\partial{w_{j}}\partial{w_{k}}} = -\sum_{i=1}^{n} r_i x_{ij} x_{ik} g(y_{i} z_{i}) g(-y_{i} z_{i})
$$
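
As a sketch, a Newton-style fit using these weighted derivatives might look like this in Python (a minimal illustration, not production code; the small ridge term is only there to keep the Hessian invertible):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_fit(X, y, r, n_iter=20, ridge=1e-8):
    """Maximize sum_i r_i log g(y_i z_i) by Newton's method."""
    n, k = X.shape
    w = np.zeros(k)
    for _ in range(n_iter):
        yz = y * (X @ w)
        # gradient: sum_i r_i y_i x_ik g(-y_i z_i)
        grad = X.T @ (r * y * sigmoid(-yz))
        # Hessian: -sum_i r_i x_ij x_ik g(y_i z_i) g(-y_i z_i)
        s = r * sigmoid(yz) * sigmoid(-yz)
        hess = -(X.T * s) @ X
        # Newton update w <- w - H^{-1} grad; subtracting a small
        # ridge keeps the (negative definite) Hessian invertible
        w -= np.linalg.solve(hess - ridge * np.eye(k), grad)
    return w
```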

If it helps conceptually, you can think of a problem with weighted samples as equivalent to an unweighted problem in which a given observation $(x_i, y_i)$ appears $r_i$ times rather than only once (exactly so for integer weights).
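
For integer weights you can check this equivalence numerically, e.g. with the `newton_fit` sketch above (the data here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.where(rng.random(50) < 0.5, -1.0, 1.0)
r = rng.integers(1, 4, size=50)            # integer instance weights

w_weighted = newton_fit(X, y, r.astype(float))

X_rep = np.repeat(X, r, axis=0)            # replicate row i, r_i times
y_rep = np.repeat(y, r)
w_repeated = newton_fit(X_rep, y_rep, np.ones(len(y_rep)))

print(np.allclose(w_weighted, w_repeated))  # should print True
```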