[Math] Machine learning: Perceptron, purpose of bias and threshold

machine learning

I started to study Machine Learning, but in the book I am reading there is something I don't understand.

I am a total beginner in Machine Learning, and I am just trying to read as much content as I can.

I simply don't understand the purpose of bias and threshold.
In the book I am reading, they use the simple example of a bank deciding whether to approve or deny credit.

To do so, inputs from a training set are given to the perceptron (vector $x$), weights are learned (vector $w$), and the output is always $-1$ or $+1$. I understand this perfectly, as well as how to update the weights to improve accuracy.
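For concreteness, here is how I picture that update rule, as a minimal NumPy sketch (my own code and names, not the book's):

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Classic perceptron rule: on each misclassified example,
    nudge the weights toward the correct side.
    X: (n_samples, n_features) inputs; y: labels in {-1, +1}.
    For a bias, prepend a constant-1 column to X so that w[0]
    plays the role of the bias term."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if np.sign(w @ x_i) != y_i:  # misclassified (sign(0) counts as wrong)
                w += y_i * x_i           # update: w <- w + y * x
                errors += 1
        if errors == 0:                  # converged; data was linearly separable
            break
    return w
```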

But right after, they introduce the notion of a threshold, saying: "if the applicant passes the threshold, credit is approved; if not, credit is denied".

As I have seen multiple times on the internet, they add a constant input $x_0 = 1$ whose weight $w_0$ is the bias. This bias seems to be $-\text{threshold}$.
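If I understand the notation correctly, the decision rule with an explicit threshold,
$$\operatorname{sign}(w_1 x_1 + \dots + w_d x_d - \text{threshold}),$$
is then the same as
$$\operatorname{sign}(w_0 x_0 + w_1 x_1 + \dots + w_d x_d) \quad \text{with } x_0 = 1 \text{ and } w_0 = -\text{threshold}.$$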

Can someone explain this notion of threshold/bias to me? I thought the perceptron was all about outputting $-1$ or $+1$ for a given input vector (if the data is linearly separable).

Best Answer

In terms of linear separability: using a bias allows the hyperplane that separates the feature space into two regions to avoid passing through the origin. Without a bias, any such hyperplane must pass through the origin, and that may prevent the separability we want.

Simple example: suppose we have two inputs $x$ and $y$ that can take on the values $0$ or $1$, and we want the output to be $1$ when both inputs are $1$ (an $\land$ logic circuit, basically). This means that the separating hyperplane cannot go through the origin of the feature space (the point where $x$ and $y$ are both $0$), since that point must lie strictly on the $-1$ side. But without using a bias (or, equivalently, with a threshold of $0$), you can't move the hyperplane (the set of points $(x,y)$ for which $x\cdot w_1 + y \cdot w_2=0$ given some weights $w_1$ and $w_2$) away from the origin, because for any weights, the origin is a solution to this equation.
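To make this concrete, here is a small check with weights chosen by hand for illustration (they are not from the book or the question): with a bias, $w_1 = w_2 = 1$ and $b = -1.5$ (i.e. a threshold of $1.5$) realize $\land$.

```python
import itertools

# Hand-picked weights for AND: fire only when both inputs are 1.
w1, w2, b = 1.0, 1.0, -1.5   # b = -threshold, i.e. threshold = 1.5

for x, y in itertools.product([0, 1], repeat=2):
    out = 1 if x * w1 + y * w2 + b >= 0 else -1
    print(f"x={x}, y={y} -> {out}")   # only (1, 1) maps to +1

# With b = 0, the origin (0, 0) gives x*w1 + y*w2 = 0 for ANY
# choice of w1 and w2, so it always sits exactly on the boundary.
```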