[Math] Does the choice of initial Theta affect the Perceptron’s convergence

machine learning

I am studying the Perceptron algorithm. Right now I am trying to understand whether the choice of the initial $\theta$ affects the algorithm's ability to converge. I have experimented by applying the algorithm to a linearly separable training set with different initial $\theta$ values, one being zero and the others non-zero, and the algorithm converged every time. However, I can't come up with a way of generalizing this pattern mathematically, i.e., a proof of why the algorithm always converges.

Best Answer

As long as the data set is linearly separable, the perceptron algorithm always converges; starting from $\theta = 0$, it makes at most $\frac{R^2}{\gamma^2}$ updates, where $R$ is the largest norm of any training point and $\gamma$ is the margin with which the data can be separated. The initialization does not affect whether it converges.
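In symbols, the statement proved in the notes linked below is roughly the following (a sketch of the usual formulation, with $\theta$ started at zero):

```latex
% Perceptron convergence theorem (Novikoff-style statement).
% Assumptions: \|x_i\| \le R for every training point, and there exists a unit
% vector u with y_i (u \cdot x_i) \ge \gamma > 0 for every example (x_i, y_i).
\[
  \text{if } \theta_0 = 0, \quad
  \#\{\text{updates made by the perceptron}\} \;\le\; \frac{R^2}{\gamma^2}.
\]
```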

The proof is standard material in most university ML courses (not trivial to come up with on your own, but easy to follow once you read it). See these notes, for example:

http://www.cs.columbia.edu/~mcollins/courses/6998-2012/notes/perc.converge.pdf

I found it by googling "perceptron convergence proof", hope it helps.

In fact, the beauty of the proof above is that it shows that if you initialize $\theta$ to zero, then the number of mistakes is bounded by $R^2/\gamma^2$. So it shows that if you start from the zero vector and keep updating, you converge after a finite number of updates.
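For reference, the heart of that argument is just two inequalities (a sketch in the notation above, with $w_k$ the weight vector after the $k$-th update and $w_0 = 0$):

```latex
% Sketch of the two bounds that drive the proof in the linked notes,
% with w_k the weight vector after the k-th update, w_0 = 0, and u a unit
% vector that separates the data with margin gamma (assumptions as above).
\begin{align*}
  w_k \cdot u &\ge k\gamma
    && \text{each update adds at least } \gamma \text{ to the projection on } u, \\
  \|w_k\|^2 &\le k R^2
    && \text{on a mistake } y\,(w_{k-1} \cdot x) \le 0, \text{ so the squared norm grows by at most } R^2, \\
  k\gamma \le w_k \cdot u &\le \|w_k\| \le \sqrt{k}\, R
    && \text{by Cauchy-Schwarz, since } \|u\| = 1, \\
  \Rightarrow \quad k &\le \frac{R^2}{\gamma^2}.
\end{align*}
```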

Whether the initialization affects the speed of convergence depends on the data set and on the order in which you see your points.
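Here is a minimal sketch of the kind of experiment described in the question (the toy data set, the helper `perceptron_updates`, and the particular initializations are my own, just for illustration):

```python
# Train a perceptron on a small linearly separable set from several initial
# theta values and count how many updates each run needs before it stops
# making mistakes.
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: label each point by the sign of a fixed direction.
X = rng.uniform(-1, 1, size=(100, 2))
true_w = np.array([2.0, -1.0])
y = np.sign(X @ true_w)

def perceptron_updates(X, y, theta_init, max_epochs=1000):
    """Run the perceptron from theta_init; return (theta, number of updates)."""
    theta = theta_init.astype(float).copy()
    updates = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (theta @ x_i) <= 0:   # misclassified (or on the boundary)
                theta += y_i * x_i         # standard perceptron update
                updates += 1
                mistakes += 1
        if mistakes == 0:                  # converged: a full pass with no errors
            return theta, updates
    return theta, updates

for init in [np.zeros(2), np.array([5.0, 5.0]), rng.normal(size=2)]:
    theta, n_updates = perceptron_updates(X, y, init)
    print(f"init={np.round(init, 2)} -> converged after {n_updates} updates")
```

Every run should finish, but the update counts typically differ with the initialization and with the order in which the points are visited.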