Solved – Question regarding weight update rule in Perceptron

Tags: machine-learning, perceptron, weights

In the single-layer perceptron, the weight update rule is given by:

$w_j := w_j + (y^{(i)} - \hat{y}^{(i)}) \times x_j^{(i)}$,

where $w_j$ is the weight of the $j$th feature, $y^{(i)}$ is the actual output for the $i$th training example, $\hat{y}^{(i)}$ is the predicted output for the $i$th training example, and $x_j^{(i)}$ is the $j$th feature of the $i$th training example.

Say I'm using this algorithm for a binary classification problem, where $y^{(i)} \in \{1, -1\}$.
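For concreteness, here is a minimal sketch of this update in Python (the `predict` helper and the thresholding at zero are my assumptions, not part of the rule as quoted):

```python
import numpy as np

def predict(w, b, x):
    """Predicted label in {1, -1} for a single sample x (threshold at 0)."""
    return 1 if (np.dot(w, x) + b) >= 0 else -1

def perceptron_update(w, b, x, y):
    """Apply the quoted rule to one training example (x, y).

    (y - y_hat) is 0 when the prediction is correct and +-2 when it is
    wrong, so only misclassified samples move the weights.
    """
    y_hat = predict(w, b, x)
    w = w + (y - y_hat) * x   # w_j := w_j + (y - y_hat) * x_j for every feature j
    b = b + (y - y_hat)       # bias updated as the weight of a constant 1 input
    return w, b
```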

Now, as I understand it, this rule ensures that the weight update is proportional to the value of $x_j^{(i)}$. For example, say we have a sample with $x_j^{(i)} = 2$ and true label $y^{(i)} = 1$ that is incorrectly classified as $-1$. Then, by this rule,

$\Delta w_j = (1 - (-1)) \times 2 = 4$, which increases $w_j$, thereby pushing the decision boundary to classify this sample correctly the next time.

However, doesn't the direction of this "push" depend on the sign of $x_j^{(i)}$? If $x_j^{(i)} > 0$, this works as expected. But if it is negative, isn't the change in weight negative (in the example above), thereby giving the opposite effect?
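To show the two cases I mean concretely (a quick sketch, with the single feature value and the misclassification assumed as above):

```python
# True label +1, predicted -1, so (y - y_hat) = 1 - (-1) = 2 in both cases.
for x_j in (2.0, -2.0):
    delta_w = (1 - (-1)) * x_j
    print(f"x_j = {x_j:+.1f}  ->  delta_w = {delta_w:+.1f}")

# x_j = +2.0  ->  delta_w = +4.0
# x_j = -2.0  ->  delta_w = -4.0
```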

Best Answer

If $x_j^{(i)} = -2$ and the sample is incorrectly classified as $-1$ (its true label being $+1$), then the corresponding weight should become more negative, not more positive. Remember that a perceptron looks like this:

$f(\mathbf{x}) = b + \mathbf{w} \cdot \mathbf{x},$

where $\mathbf{x}$ is the vector of inputs and the predicted label is the sign of $f(\mathbf{x})$. If an input $x_j^{(i)}$ is negative (say, $-2$, as in your example), then in order to make $f(\mathbf{x})$ positive (as desired in your example), the corresponding weight should be negative. Hence decreasing it, here by $\Delta w_j = (1 - (-1)) \times (-2) = -4$, pushes the contribution $w_j x_j^{(i)}$ upward and moves the sample toward the correct side of the decision boundary.
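
A small numeric check of this (the starting values $w = 0.6$, $b = 0$ are assumptions chosen so the sample starts out misclassified):

```python
w, b = 0.6, 0.0      # assumed starting values; f(x) = b + w*x = -1.2 < 0 for x = -2
x, y = -2.0, 1       # the sample from the question: negative feature, true label +1

y_hat = 1 if (b + w * x) >= 0 else -1   # predicts -1, i.e. misclassified
w += (y - y_hat) * x                    # w := 0.6 + 2 * (-2.0) = -3.4 (decreased)
b += (y - y_hat)                        # b := 0.0 + 2 = 2.0

print(b + w * x)     # 2.0 + (-3.4) * (-2.0) = 8.8 > 0, now classified as +1
```

Decreasing $w$ made $w x$ larger precisely because $x$ is negative. More generally, $\Delta w_j \cdot x_j^{(i)} = (y^{(i)} - \hat{y}^{(i)}) \, (x_j^{(i)})^2$ is never negative for a sample whose true label is $+1$, so the update always moves $f(\mathbf{x})$ toward the correct sign regardless of the sign of the feature.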