Alternative initial values for $\theta^{(0)}$ such that the perceptron produces a different linear classifier

algorithms, machine-learning

The following dataset is linearly separable, and the perceptron algorithm returns $\theta=[-0.5,-2]$ when started from the initial value $\theta^{(0)}=[0,0]$:
$$\begin{array}{c|cc}
i & \text{Data point $x^{(i)}$} & \text{Label $y^{(i)}$}\\
\hline
1 & [1,-1] & 1 \\
2 & [0,1] & -1 \\
3 & [-1.5,-1] & 1
\end{array}
$$

It is then mentioned that if the initial value is $\theta^{(0)}=[1000,-1000]$, each correction to $\theta$ has a smaller relative impact, so more corrections are needed before the algorithm arrives at a separating classifier.
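
For concreteness, here is a minimal sketch of the run described above. It assumes the standard through-origin perceptron (a point counts as a mistake when $y^{(i)}\,\theta\cdot x^{(i)} \le 0$, and a mistake triggers the correction $\theta \leftarrow \theta + y^{(i)} x^{(i)}$) and that the points are visited cyclically in the order listed; the exact trajectory depends on these choices.

```python
import numpy as np

def perceptron(X, y, theta0, max_passes=1000):
    """Mistake-driven perceptron through the origin, cycling over the data in order."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_passes):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(theta, x_i) <= 0:  # misclassified (or on the boundary)
                theta += y_i * x_i             # correction
                mistakes += 1
        if mistakes == 0:                      # no mistakes in a full pass: theta separates the data
            return theta
    return theta

X = np.array([[1, -1], [0, 1], [-1.5, -1]], dtype=float)
y = np.array([1, -1, 1])

print(perceptron(X, y, [0, 0]))          # -> [-0.5, -2.]
print(perceptron(X, y, [1000, -1000]))   # converges too, but only after many more corrections
```

Under these assumptions, the run from $[0,0]$ makes only two corrections before converging to $[-0.5,-2]$, whereas the run from $[1000,-1000]$ needs far more, which illustrates the "smaller impact" remark above.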

I am then asked to provide an initial value $\theta^{(0)}$ such that the algorithm returns a different classifier than the one obtained from $\theta^{(0)}=[0,0]$.

I couldn't figure it out, so I tried $\theta^{(0)}=[-1000,-1000]$; strangely, it worked (although the solution provided is $\theta^{(0)}=[1,-2]$).

Why would $\theta^{(0)}=[-1000,-1000]$ or $\theta^{(0)}=[1,-2]$ yield a different result? I understand that there are always infinitely many possible linear classifiers (given that the data are linearly separable), since any positive multiple of $\theta=[-0.5,-2]$ would also work, e.g. $\theta=[-1,-4]$.

What I don't get is why a different $\theta^{(0)}$ would make a difference, and why these two in particular.

Best Answer

We have $\{(0,1)\}$ in the negative class and $\{(1,-1), (-1.5, -1)\}$ in the positive class.

An easy way to choose the initial $\theta^{(0)}$ so that the algorithm ends at a different classifier is to deliberately avoid $(-0.5, -2)$ (and its multiples) while picking a vector that already separates the two classes. If we start from a vector that separates the classes, the perceptron makes no corrections at all, so it simply returns that starting vector and we are done.

In fact, you can pick any $\theta_1, \theta_2$ satisfying $$(0,1)\cdot (\theta_1, \theta_2) < 0$$

$$(1,-1)\cdot (\theta_1, \theta_2) > 0$$ $$(-1.5,-1)\cdot (\theta_1, \theta_2) > 0$$

Notice that $(1,-2)$ satisfies all three inequalities, so we can pick it.
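
As a quick numeric check (again a sketch, using the same mistake criterion $y\,\theta\cdot x \le 0$ as in the sketch above): $(1,-2)$ gives a strictly positive margin on all three points, so a mistake-driven perceptron started there makes no corrections and returns it unchanged.

```python
import numpy as np

X = np.array([[1, -1], [0, 1], [-1.5, -1]], dtype=float)
y = np.array([1, -1, 1])
theta0 = np.array([1.0, -2.0])

# margins y_i * (theta0 . x_i): all strictly positive,
# so theta0 already separates the two classes
print(y * (X @ theta0))   # -> [3.  2.  0.5]
# a mistake-driven perceptron started at theta0 therefore makes zero
# corrections and returns [1, -2], a classifier different from the
# [-0.5, -2] reached from the zero start
```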

Note that if you pick a different starting point, it is very unlikely that you will end up at exactly the same final $\theta$.