Solved – Visualizing High Dimensional weight space for perceptrons

data-visualization, high-dimensional, manifold-learning, neural-networks, perceptron

I am watching the Neural Network videos by Prof. Geoff Hinton. In them, he talks about a high-dimensional weight space for perceptrons.

In particular, I am referring to the following two slides. Here is a link to the timestamped YouTube video where he describes this weight space: https://youtu.be/0T57_yjjB58?list=PLoRl3Ht4JOcdU872GhiYWf6jwrk_SNhz9&t=80

[slide images from the lecture, not reproduced here]

Questions:

  1. Why is the training case a plane? Why does this plane pass through the origin?
  2. Why is the input vector perpendicular to this plane?
  3. How is the input vector different from the training case?

Best Answer

As you're likely aware, a vector in a $D$-dimensional space can be described by a direction and a magnitude. The direction requires $D-1$ values to describe, and the magnitude requires one value to describe.

Suppose we pick some directional unit vector $\hat{v}.$ By "unit vector" I mean that it has magnitude one. A vector $z$ that satisfies $z \cdot \hat{v} = \sum_j z_j \hat{v}_j = 0$ is perpendicular to the directional vector. The set of all vectors perpendicular to $\hat{v}$ forms a $(D-1)$-dimensional hyperplane that passes through the origin. This is easy to see, because the constraint of being perpendicular to $\hat{v}$ is just the equation of a $(D-1)$-dimensional plane in $D$ dimensions, i.e. $$\sum_{j=1}^D \hat{v}_j z_j = 0.$$ This is a linear combination of the components of $z$ with a zero intercept. It behaves like a plane because, if you reduce one component of $z$ by some amount, you must adjust the other components linearly so that the sum is still zero.
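As a quick numerical sketch of this (my own NumPy example, not from the lecture, with an arbitrary random direction): the set $\{z : z \cdot \hat{v} = 0\}$ is spanned by $D-1$ orthonormal vectors, any combination of those vectors is perpendicular to $\hat{v}$, and the all-zero combination (the origin) lies on the plane.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 5
v = rng.normal(size=D)
v_hat = v / np.linalg.norm(v)        # unit direction vector

# An orthonormal basis for the hyperplane {z : z . v_hat = 0}: the rows of Vt
# other than the first span the null space of v_hat, which has dimension D - 1.
_, _, Vt = np.linalg.svd(v_hat.reshape(1, -1))
basis = Vt[1:]                       # (D-1) orthonormal vectors spanning the plane

# Any linear combination of the basis vectors is perpendicular to v_hat.
z = basis.T @ rng.normal(size=D - 1)
print(np.isclose(z @ v_hat, 0.0))    # True: z lies on the plane through the origin
```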

Any input vector $x$ in a perceptron can be expressed as $$x = r \hat{v}$$ where $r \in \mathbb{R}$ is a magnitude, and $\hat{v} \in \mathbb{R}^D$ is a directional vector that satisfies $|\hat{v}|=1.$ In fact, the perceptron only cares about the direction $\hat{v}$, not the magnitude $r$. To see this, look at the classification procedure for a perceptron: $$ f(x;w) = \begin{cases} 1 & \text{if } x \cdot w > 0 \\ 0 & \text{if } x \cdot w \le 0 \end{cases} $$ where $w$ is the weight vector of the perceptron and I'm implicitly including the constant bias component in $x.$ It's easy to see that the conditions above for $x \cdot w$ apply equally to $\hat{v} \cdot w.$ Thus you can replace $x$ with $\hat{v}$ in the above condition and the perceptron function is unchanged.
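Here is a small sketch of that invariance (again my own example, with random vectors and the bias assumed to be folded into $x$ and $w$): rescaling the input by any positive magnitude leaves the perceptron's output unchanged.

```python
import numpy as np

def perceptron(x, w):
    """Classify as 1 if x . w > 0, else 0 (bias folded into x and w)."""
    return 1 if x @ w > 0 else 0

rng = np.random.default_rng(1)
D = 4
w = rng.normal(size=D)
x = rng.normal(size=D)
v_hat = x / np.linalg.norm(x)        # direction of x, magnitude discarded

# The decision depends only on the sign of the dot product, so scaling the
# input by any positive magnitude r does not change the classification.
for r in (0.1, 1.0, 42.0):
    assert perceptron(r * v_hat, w) == perceptron(x, w)
print("output depends only on the direction of x")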

As we established, a plane in $w$-space can be defined by the equation $\hat{v} \cdot w = 0.$ This, to answer your question, is how an input vector can be represented by a plane. To be more precise, an input direction defines the plane, but the plane is invariant to the magnitude of the vector. (The equation of a plane is identical if you multiply the whole thing by a constant.) So there's one missing piece of information on the plane, with regards to the vector. However, as we established, the perceptron algorithm doesn't care about the magnitude of the vector, so for all intents and purposes, the plane describes the vector entirely, to all the detail we care about.
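To make the "plane only encodes the direction" point concrete, here is a minimal sketch (my own example, not from the slides): scaling the input by a positive constant scales the whole plane equation, so the same set of weight vectors lies on each side of the plane.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 3
x = rng.normal(size=D)

# The plane in w-space defined by the input is {w : x . w = 0}.  Multiplying x
# by a positive constant multiplies the whole equation, so the plane -- and the
# side of it that any candidate w falls on -- is unchanged.
w_candidates = rng.normal(size=(1000, D))
side_for_x = w_candidates @ x > 0
side_for_scaled_x = w_candidates @ (5.0 * x) > 0
print(np.array_equal(side_for_x, side_for_scaled_x))   # True
```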

If our value of $w$ falls on or below the plane defined by an input direction $\hat{v}$ (where by "below" I mean on the opposite side of the plane from $\hat{v}$), then the perceptron classifies this input as a "zero"; otherwise it classifies it as a "one". To see why this is the case, we need only use an elementary identity from linear algebra, namely that $$ w \cdot \hat{v} = |w| |\hat{v}| \cos \theta = |w| \cos \theta, $$ where $\theta$ is the angle between $w$ and $\hat{v}$ (i.e. the angle between $w$ and $x$). The perceptron condition then becomes: the point is classified as a "one" if $\theta < \pi/2$ and a "zero" if $\pi/2 \le \theta \le \pi.$
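A short check of that angle condition (my own NumPy example with random vectors): the label obtained from the sign of the dot product agrees with the label obtained by comparing the angle against $\pi/2$.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 4
w = rng.normal(size=D)
v_hat = rng.normal(size=D)
v_hat /= np.linalg.norm(v_hat)

# w . v_hat = |w| cos(theta), so the sign of the dot product matches the sign
# of cos(theta): theta < pi/2 gives a "one", theta >= pi/2 gives a "zero".
cos_theta = (w @ v_hat) / np.linalg.norm(w)
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
label_from_dot = int(w @ v_hat > 0)
label_from_angle = int(theta < np.pi / 2)
print(label_from_dot == label_from_angle)   # True
```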

(To answer your question 3, they're not different. The input vector is the training case.)