Linear Algebra – Definition of Affine Independence in Brondsted’s Convex Polytopes

affine-geometry, linear-algebra

At one point in the book (An Introduction to Convex Polytopes, by Arne Brondsted), a definition of affine independence is given as follows:

An n-family $(x_{1},…,x_{n})$ of points from $\mathbb{R}^d$ is said to be affinely independent if a linear combination $\lambda_{1} x_{1} + … + \lambda_{n} x_{n}$ with $\lambda_{1} + … + \lambda_{n} = 0$ can only have the value zero vector when $\lambda_{1}=…=\lambda_{n}=0$.

My hunch is that affine independence is analogous to linear independence, in that a set of vectors is (affinely/linearly) independent if none of the vectors is an (affine/linear) combination of the others. If this is the case, then what does the condition $\lambda_{1} +… +\lambda_{n} = 0$ have to do with anything? Shouldn't it be that the linear combination $\lambda_{1}x_{1} +…+\lambda_{n}x_{n}$, with $\lambda_{1} + … +\lambda_{n} =1$, can only have the value zero vector when $\lambda_{1}=…=\lambda_{n}=0$?

Best Answer

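No, there is no mistake there. (Note that if $\lambda_1 + \dots + \lambda_n = 1$, the coefficients can never all be zero, so your version would instead say that the origin is not an affine combination of the points, which is a different condition.)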

Consider the set of points:

$$ F = \left\{x = \lambda_1 x_1 + \dots + \lambda_n x_n \in \mathbb{R}^d \ \vert \ \lambda_1 + \dots +\lambda_n = 0 \right\} \ . $$

This set of points is a linear subspace of $\mathbb{R}^d$, as you can easily check. If you solve the equation $\lambda_1 + \dots +\lambda_n = 0$ for $\lambda_1$, you find that the vectors of $F$ can be written as

$$ x = -(\lambda_2 + \dots + \lambda_n)x_1 + \lambda_2x_2 + \dots + \lambda_n x_n = \lambda_2(x_2 - x_1) + \dots + \lambda_n (x_n - x_1) \ . $$

That is,

$$ F = \mathrm{span}\left\{ \overrightarrow{x_1x_2}, \dots , \overrightarrow{x_1x_n}\right\} \ . $$

(You could have done the same with any $x_i$ instead of $x_1$ too.)

Now, the following two statements are equivalent:

  1. Points $x_1, \dots , x_n$ are affinely independent.
  2. Vectors $\overrightarrow{x_1x_2}, \dots , \overrightarrow{x_1x_n} $ are linearly independent.

$\mathbf{(1) \Longrightarrow (2)}$. Let

$$ \mu_2 \overrightarrow{x_1x_2} + \dots + \mu_n \overrightarrow{x_1x_n} = 0 $$

We have to show that this implies $\mu_2 = \dots = \mu_n = 0$. Indeed,

$$ 0 = \mu_2 \overrightarrow{x_1x_2} + \dots + \mu_n \overrightarrow{x_1x_n} = -(\mu_2 + \dots + \mu_n)x_1 + \mu_2 x_2 + \dots + \mu_n x_n \ . $$

In this expression, the sum of all coefficients is $0$. Since we are assuming $(1)$, this implies $\mu_2 = \dots = \mu_n = 0$.

$\mathbf{(2) \Longrightarrow (1)}$. Let

$$ \lambda_1 x_1 + \dots + \lambda_n x_n = 0 \qquad \text{and} \qquad \lambda_1 + \dots + \lambda_n = 0 \ . $$

We have to show that this implies $\lambda_1 = \dots = \lambda_n = 0$. Indeed, solve the second equation for $\lambda_1$ again and you have

$$ 0 = \lambda_1 x_1 + \dots + \lambda_n x_n = - (\lambda_2 + \dots + \lambda_n) x_1 + \lambda_2 x_2 + \dots + \lambda_n x_n = \lambda_2 \overrightarrow{x_1x_2} + \dots + \lambda_n \overrightarrow{x_1x_n} \ . $$

Since we are assuming $(2)$, this implies $\lambda_2 = \dots = \lambda_n = 0$ and, since $\lambda_1 + \dots + \lambda_n = 0$, we have $\lambda_1 = 0$ too.
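Statement (2) also gives a practical test: form the differences $x_i - x_1$ and check whether their rank equals $n - 1$. Here is a minimal sketch in plain Python with exact rational arithmetic. The function names `rank` and `affinely_independent` are my own, chosen for illustration, and the points in the demo are arbitrary.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (given as a list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivots found so far
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate this column from every row below the pivot row.
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def affinely_independent(points):
    """Statement (2): the differences x_i - x_1 (i >= 2) are linearly independent."""
    base, rest = points[0], points[1:]
    diffs = [[a - b for a, b in zip(p, base)] for p in rest]
    return rank(diffs) == len(diffs)


print(affinely_independent([(0, 0), (1, 0), (0, 1)]))  # vertices of a triangle -> True
print(affinely_independent([(0, 0), (1, 1), (2, 2)]))  # three collinear points -> False
```

Using `Fraction` avoids the tolerance questions that a floating-point rank computation would raise; for large inputs a numerical library would be the more usual choice.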

So far so good. Now, let's finish with another trivial remark about a geometrical interpretation of this linear subspace $F$ and that condition $\lambda_1 + \dots + \lambda_n = 0$. Consider the set of points

$$ V = \left\{x = \lambda_1 x_1 + \dots + \lambda_n x_n \in \mathbb{R}^d \ \vert \ \lambda_1 + \dots +\lambda_n = 1 \right\} \ . $$

This set is an affine subspace. Indeed,

$$ V = x_1 + F \ . $$

(You should check this equality and understand that you could put any $x_i$ in the place of $x_1$.)
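One way to carry out this check, as a short sketch: if $\lambda_1 + \dots + \lambda_n = 1$, substitute $\lambda_1 = 1 - (\lambda_2 + \dots + \lambda_n)$ to get

$$ \lambda_1 x_1 + \dots + \lambda_n x_n = x_1 + \lambda_2 (x_2 - x_1) + \dots + \lambda_n (x_n - x_1) \in x_1 + F \ . $$

Conversely, every point of $x_1 + F$ has the form on the right, and reading the equality from right to left, its coefficients $1 - (\lambda_2 + \dots + \lambda_n), \lambda_2, \dots , \lambda_n$ sum to $1$.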

You can say that $V$ is parallel to the subspace $F$: indeed, $V$ "is" just $F$ translated by $x_1$.

So what? What's so special about $V$? Well, on one hand, $V$ contains all the points $x_1 , \dots , x_n$ (exercise: check it!). On the other hand, it is the smallest affine subspace which contains them; in the sense that, if $W \subset \mathbb{R}^d$ is another affine subspace containing all $x_i$, then $V \subset W$.

Indeed, in general, if you have an affine subspace $W = p + G$ and two points in it $x, y \in W$, then $\overrightarrow{xy} \in G$. So, if $x_1, \dots , x_n \in W$, then $G$ must contain all $\overrightarrow{x_1x_i}$. Hence, $F \subset G$. So $V = x_1 + F \subset x_1 + G = W$, where the last equality holds because $x_1 \in W$.

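Summing up: the condition that annoys you, $\lambda_1 + \dots + \lambda_n = 0$, describes the linear subspace $F$, and $F$ is the direction of $V = x_1 + F$, the smallest affine subspace which contains all the points $x_1, \dots , x_n$. The condition $\lambda_1 + \dots + \lambda_n = 1$ describes $V$ itself.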

EDIT. I forgot. Perhaps it would be a good exercise to redo everything we have seen here with some specific examples. For instance, take:

  1. $x_1 = (1,0), x_2 = (0,1)$ in $\mathbb{R}^2$.
  2. $x_1 = (1,0,0), x_2 = (0,1,0), x_3 = (0,0,1)$ in $\mathbb{R}^3$.
  3. $x_1 = (1,0), x_2 = (0,1), x_3 = (1/2, 1/2)$ in $\mathbb{R}^2$.
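As a sketch of how the third example might work out (the coefficients below are one choice among many): take $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = -2$. They sum to $0$, they are not all zero, and

$$ x_1 + x_2 - 2x_3 = (1,0) + (0,1) - (1,1) = (0,0) \ , $$

so these three points are affinely dependent. Equivalently, $\overrightarrow{x_1x_3} = (-1/2, 1/2) = \tfrac{1}{2}\overrightarrow{x_1x_2}$, so the vectors in statement $(2)$ are linearly dependent. In the first two examples the vectors $\overrightarrow{x_1x_i}$ are linearly independent, so those points are affinely independent.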