I am wondering how to adjust the Adaline algorithm to classify the classes 0 and 1 instead of -1 and 1.
I found a section in Neural Networks and Statistical Learning by Ke-Lin Du and M. N. S. Swamy that confused me a little bit. Here is a link to the relevant paragraph on Google Books, and there is a screenshot below:
The original Adaline paper by Widrow can be found here: Adaptive "Adaline" neuron using chemical "memistors"
What I find particularly confusing is that it reads as if the two label schemes, {0, 1} and {-1, +1}, can be trained in exactly the same way.
Similarly, I found the same thing on Wikipedia for the Perceptron algorithm:
Let's start with the {-1, 1} case. For simplicity, let's say our net input is $\mathbf{z} = \mathbf{w}^T\mathbf{x}$, and the activation function $g(\mathbf{z})$ is the identity function, $g(\mathbf{z}) = \mathbf{z}$. The class label is then obtained by applying a unit step (quantizer) function to the activation:
$$\begin{equation}
\hat{y} =\begin{cases}
1 & \text{if $g(\mathbf{z}) > 0$}\\
-1 & \text{otherwise},
\end{cases}
\end{equation}$$
and
$$\mathbf{z} = w_0x_{0} + w_1x_{1} + \dots + w_mx_{m} = \sum_{j=0}^{m} x_{j}w_{j} = \mathbf{w}^T\mathbf{x}, \quad \text{with } x_0 = 1.$$
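As a quick sanity check, the same net input can be computed for a whole batch of samples with one matrix product (the weights and inputs below are made-up values, purely for illustration):

import numpy as np

w = np.array([0.5, -0.2, 0.1])   # w[0] plays the role of the bias w_0
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
z = X.dot(w[1:]) + w[0]          # same as w^T x with the x_0 = 1 convention
print(z)                         # [0.5 0.3]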
And the learning rule is
$\Delta w_0 = \eta(\text{target}^{(i)} - \text{output}^{(i)})$
$\Delta w_1 = \eta(\text{target}^{(i)} - \text{output}^{(i)})\;x^{(i)}_{1}$
$\Delta w_2 = \eta(\text{target}^{(i)} - \text{output}^{(i)})\;x^{(i)}_{2}$
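In general, this is just the Widrow-Hoff (LMS) delta rule applied to every weight $j = 0, \dots, m$, with $x^{(i)}_{0} = 1$ for the bias:

$$\Delta w_j = \eta\,(\text{target}^{(i)} - \text{output}^{(i)})\;x^{(i)}_{j}.$$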
Based on my understanding, this results in a linear activation function that passes through the origin when plotted against the net input (since $g(\mathbf{z}) = \mathbf{z}$):
And we are "squashing" the output via the unit step:
To make sure that it works, let me implement it in simple Python code:
import numpy as np

class Adaline(object):

    def __init__(self, eta=0.01, epochs=50):
        self.eta = eta          # learning rate
        self.epochs = epochs    # number of passes over the training set

    def train(self, X, y):
        # one weight per feature, plus the bias weight in w_[0]
        self.w_ = np.zeros(1 + X.shape[1])
        for _ in range(self.epochs):
            for xi, target in zip(X, y):
                output = self.net_input(xi)
                error = target - output
                # LMS/delta rule: update proportional to error and input
                self.w_[1:] += self.eta * error * xi
                self.w_[0] += self.eta * error
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        # identity (linear) activation
        return self.net_input(X)

    def predict(self, X):
        # unit step: quantize the linear activation into class labels
        return np.where(self.activation(X) > 0.0, 1, -1)

X = np.array([[1.1, 1.2], [1.4, 1.8], [3.2, 4.2], [5.5, 5.9]])
y = np.array([-1, -1, 1, 1])

ada = Adaline()
ada.train(X, y)
print(ada.predict(X))
print(ada.w_)
This prints:
[-1 -1 1 1]
[-0.59518362 0.08374251 0.19489769]
However, this doesn't work if I just change it to
y = np.array([0, 0, 1, 1])
and
def predict(self, X):
    return np.where(self.activation(X) > 0.0, 1, 0)
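For completeness, here is a minimal way to reproduce that failure, assuming a hypothetical subclass AdalineZeroOne that reuses the Adaline class above and only overrides predict:

class AdalineZeroOne(Adaline):

    def predict(self, X):
        # same unit step as before, but emitting 0 instead of -1
        return np.where(self.activation(X) > 0.0, 1, 0)

ada01 = AdalineZeroOne()
ada01.train(X, np.array([0, 0, 1, 1]))
print(ada01.predict(X))  # per the above, no longer matches [0, 0, 1, 1]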
This is because the unit step now looks as follows in the 0-1 class scenario:
And $g(\mathbf{z})$ becomes $g(\mathbf{z}) = \mathbf{z} - w_0$,
So that
def net_input(self, X):
    return np.dot(X, self.w_[1:]) + self.w_[0]

def activation(self, X):
    return self.net_input(X)

def predict(self, X):
    # threshold the activation with the bias subtracted back out
    return np.where(self.activation(X) - self.w_[0] >= 0.0, 1, 0)
Does this make any sense?
Best Answer
Your code is correct; the problem lies elsewhere.

The weight update is proportional to the input, so the weights are pushed much harder by class 1 than by class 0 (because your inputs for class 1 contain bigger numbers than those for class 0). In the first epoch, the first two class-0 samples give you zero updates because they are already perfectly correct (the weights are zero by default, so the first outputs are zero). The third sample, from class 1, gives a bad result, so you update your weights, and the same happens with the next class-1 input. At the end of the epoch you are left with weights greater than zero, so the product with your inputs gives results greater than $0$. As I said before, the weights update much faster for class 1. In epoch 2, the class-0 samples try to undo the previous epoch's class-1 updates, but their update 'power' is not enough to push the result below zero. Every following epoch shows the same picture: class 1 makes a greater contribution to the weight updates than the class-0 samples.

I can see a few solutions for you:
1) Your weights start at zero, so you always use a bad starting point for your training. Solution: initialize the weights with small random values drawn from a standard normal distribution.
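A minimal sketch of that idea, reusing the Adaline class from the question (the seed and the scale 0.01 are my own illustrative choices, not part of this answer):

class AdalineRandomInit(Adaline):

    def train(self, X, y):
        # small random initial weights instead of zeros;
        # seed and scale are arbitrary choices for this sketch
        rgen = np.random.RandomState(1)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        for _ in range(self.epochs):
            for xi, target in zip(X, y):
                error = target - self.net_input(xi)
                self.w_[1:] += self.eta * error * xi
                self.w_[0] += self.eta * error
        return self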
2) You can set up your step function so that the threshold is 0.5 (halfway between 0 and 1). But I'm not really sure that your learning process will be stable for a very large number of epochs.
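A minimal sketch of the second idea, again subclassing the question's Adaline (the class name is hypothetical):

class AdalineHalfStep(Adaline):

    def predict(self, X):
        # step threshold at 0.5, halfway between the 0 and 1 targets
        return np.where(self.activation(X) >= 0.5, 1, 0)

ada = AdalineHalfStep()
ada.train(X, np.array([0, 0, 1, 1]))
print(ada.predict(X))  # should now recover [0, 0, 1, 1] on this toy data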