I am trying to plot the decision boundary of a perceptron algorithm and I am really confused about a few things. My input instances are in the form $[(x_{1},x_{2}), y]$, basically a 2D input instance ($x_{1}$ and $x_{2}$) and a binary class target value ($y$) [1 or 0].

My weight vector hence is in the form: $[w_{1}, w_{2}]$.

Now I have to incorporate an additional bias parameter $w_{0}$, so does my weight vector become a $3 \times 1$ vector? Or is it a $1 \times 3$ vector? I think it should be $1 \times 3$, since a vector has only 1 row and $n$ columns.

Now let's say I instantiate $[w_{0}, w_{1}, w_{2}]$ to random values: how would I plot the decision boundary for this? What does $w_{0}$ signify here? Is $w_{0}/\|\vec{w}\|$ the distance of the decision region from the origin? If so, how do I capture this and plot it in Python using matplotlib.pyplot, or its MATLAB equivalent?

I would really appreciate even a little help regarding this matter.

## Best Answer

In each iteration, the perceptron predicts the output according to the equation:

$$y_{j} = f[{\bf{w}}^{T} {\bf{x}}] = f[\vec{w}\cdot \vec{x}] = f[w_{0} + w_{1}x_{1} + w_{2}x_{2} + ... + w_{n}x_{n}]$$

As you said, your weight vector $\vec{w}$ contains a bias term $w_{0}$. Therefore, you need to include a $1$ in the input to preserve the dimensions in the dot product.

You usually start with a column vector for the weights, that is, an $n \times 1$ vector. By definition, the dot product requires you to transpose this vector to get a $1 \times n$ weight vector, and to complete that dot product you need an $n \times 1$ input vector. That's why I emphasized the change between matrix notation and vector notation in the equation above, so you can see how the notation suggests the right dimensions.
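As a small sketch of the dimension bookkeeping (the weight and input values here are made up for illustration):

```python
import numpy as np

# Hypothetical weights: bias w0 plus two feature weights w1, w2
w = np.array([[0.5], [-1.0], [2.0]])   # column vector, shape (3, 1)

# A 2-D input instance (x1, x2), augmented with a leading 1 for the bias
x = np.array([[1.0], [0.3], [0.7]])    # shape (3, 1)

# w^T x : (1, 3) @ (3, 1) -> a (1, 1) result, extracted as a scalar
activation = (w.T @ x).item()          # 0.5*1 + (-1.0)*0.3 + 2.0*0.7 = 1.6

# Step activation: predict 1 if w^T x >= 0, else 0
y_hat = 1 if activation >= 0 else 0
```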

Remember, this is done for each input in the training set. After each prediction, update the weight vector to correct the error between the predicted output and the true output.
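A minimal sketch of one such update, using the classic perceptron learning rule (the input values and learning rate here are hypothetical):

```python
import numpy as np

def perceptron_step(w, x, y, lr=1.0):
    """One perceptron update: predict, then move the weights by the error."""
    y_hat = 1 if float(w @ x) >= 0 else 0   # step activation
    w = w + lr * (y - y_hat) * x            # no change when prediction is correct
    return w, y_hat

# Bias-augmented input [1, x1, x2] with true label y = 0
w = np.zeros(3)
x = np.array([1.0, 2.0, -1.0])
y = 0

w, y_hat = perceptron_step(w, x, y)   # misprediction, so the weights move
```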

As for the decision boundary, the idea (which the scikit-learn examples also use) is to predict a class for each point in a mesh grid that covers the input space, and then plot each prediction with an appropriate color using `contourf`. To answer your other question: yes, the distance from the origin to the decision line $w_{0} + w_{1}x_{1} + w_{2}x_{2} = 0$ is $|w_{0}| / \sqrt{w_{1}^{2} + w_{2}^{2}}$, so $w_{0}$ controls how far the boundary sits from the origin.
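A minimal sketch of that mesh-and-`contourf` approach (the trained weights here are hypothetical, standing in for whatever your perceptron learns):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs headless
import matplotlib.pyplot as plt

# Hypothetical trained weights [w0, w1, w2]
w = np.array([-0.5, 1.0, 1.0])

def predict(X):
    """Step-activation perceptron on a (m, 2) array of inputs."""
    Xb = np.c_[np.ones(len(X)), X]      # prepend the 1 for the bias w0
    return (Xb @ w >= 0).astype(int)

# Mesh covering the input region
xx, yy = np.meshgrid(np.linspace(-2, 2, 200), np.linspace(-2, 2, 200))

# Predict a class for every mesh point, then restore the grid shape
Z = predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Color each region by its predicted class; the boundary appears
# where the color changes
plt.contourf(xx, yy, Z, alpha=0.4)
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.savefig("boundary.png")
```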