So, we want to prove that these two statements are equivalent:
(a) The points $x_1, \dots , x_n \in \mathbb{R}^d$ are affinely independent.
(b) The vectors $\overline{x}_1, \dots , \overline{x}_n \in \mathbb{R}^{d+1}$ are linearly independent.
Here, $\overline{x}_i = (1, x_i),\ i = 1, \dots , n$.
Let's go.
$\mathbf{(a)\Longrightarrow (b)}$. Let $\lambda_1, \dots , \lambda_n \in \mathbb{R}$ be such that
$$
\lambda_1 \overline{x}_1 + \dots + \lambda_n \overline{x}_n = 0 \ . \qquad \qquad \qquad [1]
$$
We have to show that $\lambda_1 = \dots = \lambda_n = 0$. But $[1]$ means
$$
\lambda_1 (1, x_1) + \dots + \lambda_n (1, x_n) = (0, 0) \ ,
$$
where $(0,0) \in \mathbb{R} \times \mathbb{R}^d$. And this is equivalent to
$$
\lambda_1 x_1 + \dots + \lambda_n x_n = 0 \qquad \text{and} \qquad \lambda_1 + \dots + \lambda_n = 0 \ .
$$
Now, $x_i = x_i - 0 = \overrightarrow{0x_i} , \ i = 1, \dots , n$. (Here, $0 \in \mathbb{R}^d$.) So, since we are assuming $(a)$, it follows that
$$
\lambda_1 = \dots = \lambda_n = 0 \ .
$$
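(If you like, this kind of thing is easy to sanity-check numerically. Here is a minimal sketch in Python with NumPy; the helper names `lift` and `linearly_independent` are mine, not standard:)

```python
import numpy as np

def lift(points):
    """Map each point x_i in R^d to the vector (1, x_i) in R^{d+1}."""
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    return np.hstack([np.ones((pts.shape[0], 1)), pts])

def linearly_independent(vectors):
    """n vectors are linearly independent iff their matrix has rank n."""
    vecs = np.atleast_2d(np.asarray(vectors, dtype=float))
    return np.linalg.matrix_rank(vecs) == vecs.shape[0]

# Two affinely independent points in R^2; their lifts live in R^3.
print(linearly_independent(lift([[1.0, 0.0], [0.0, 1.0]])))  # True
```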
$\mathbf{(b)\Longrightarrow (a)}$. Let $p \in \mathbb{R}^d$ be any point. We have to show that
$$
\lambda_1 \overrightarrow{px_1} + \dots + \lambda_n \overrightarrow{px_n} = 0 \qquad \text{and} \qquad \lambda_1 + \dots + \lambda_n = 0 \qquad \qquad \qquad [2]
$$
implies $\lambda_1 = \dots = \lambda_n = 0$.
If the point $p$ happened to be $0 \in \mathbb{R}^d$, the conclusion would be clear because, in this case, $\overrightarrow{px_i} = x_i, \ i = 1, \dots , n$, and $[2]$ reads as follows:
$$
\lambda_1 x_1 + \dots + \lambda_n x_n = 0 \qquad \text{and} \qquad \lambda_1 + \dots + \lambda_n = 0 \ . \qquad \qquad \qquad [3]
$$
From here, we run the same reasoning as in the previous proof, but backwards: these two conditions entail
$$
\lambda_1 (1, x_1) + \dots + \lambda_n (1, x_n) = (0, 0) \ .
$$
This is the same as
$$
\lambda_1 \overline{x}_1 + \dots + \lambda_n \overline{x}_n = 0 \ .
$$
And this implies
$$
\lambda_1 = \dots = \lambda_n = 0\ ,
$$
since we are assuming $(b)$.
Hence, it only remains to show that the general case $[2]$ reduces to the particular one $[3]$, whatever the point $p\in \mathbb{R}^d$ is. But this is straightforward:
$$
$$
\lambda_1 \overrightarrow{px_1} + \dots + \lambda_n \overrightarrow{px_n} = \lambda_1 (x_1 - p) + \dots + \lambda_n (x_n - p) \ ,
$$
which is
$$
\lambda_1 x_1 + \dots + \lambda_n x_n - (\lambda_1 + \dots + \lambda_n)p = \lambda_1 x_1 + \dots + \lambda_n x_n = 0 \ .
$$
No, there is no mistake there: the term with $p$ drops out because $[2]$ already tells us that $\lambda_1 + \dots + \lambda_n = 0$, and then the first equality of $[2]$ becomes exactly the first equality of $[3]$.
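(By the way, this independence from the base point is also easy to check numerically. A minimal NumPy sketch with random data; nothing is special about the dimensions chosen:)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))   # four points x_1, ..., x_4 in R^3
lam = rng.standard_normal(4)
lam -= lam.mean()                 # force lambda_1 + ... + lambda_4 = 0

p = rng.standard_normal(3)        # one arbitrary base point
q = rng.standard_normal(3)        # another arbitrary base point

# sum_i lambda_i (x_i - p) equals sum_i lambda_i (x_i - q):
# the base-point term drops out because the lambda_i sum to zero.
s_p = (lam[:, None] * (x - p)).sum(axis=0)
s_q = (lam[:, None] * (x - q)).sum(axis=0)
print(np.allclose(s_p, s_q))      # True
```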
Consider the set of points:
$$
F = \left\{x = \lambda_1 x_1 + \dots + \lambda_n x_n \in \mathbb{R}^d \ \vert \ \lambda_1 + \dots +\lambda_n = 0 \right\} \ .
$$
This set of points is a linear subspace of $\mathbb{R}^d$, as you can easily check. If you solve the equation $\lambda_1 + \dots +\lambda_n = 0$ for $\lambda_1$, you find that the vectors of $F$ can be written as
$$
x = -(\lambda_2 + \dots + \lambda_n)x_1 + \lambda_2x_2 + \dots + \lambda_n x_n = \lambda_2(x_2 - x_1) + \dots + \lambda_n (x_n - x_1) \ .
$$
That is,
$$
F = \mathrm{span}\left\{ \overrightarrow{x_1x_2}, \dots , \overrightarrow{x_1x_n}\right\} \ .
$$
(You could do the same with any $x_i$ in place of $x_1$.)
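(Again, if you want to see this concretely: here is a NumPy sketch checking that a combination whose coefficients sum to $0$ really lands in the span of the difference vectors. The variable names and the random data are mine:)

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 5))   # points x_1, ..., x_4 in R^5
lam = rng.standard_normal(4)
lam -= lam.mean()                 # coefficients summing to 0

v = lam @ x                       # an element of F
diffs = x[1:] - x[0]              # the vectors x_1x_2, x_1x_3, x_1x_4

# v lies in span(diffs) iff appending it does not raise the rank.
r = np.linalg.matrix_rank(diffs)
print(np.linalg.matrix_rank(np.vstack([diffs, v])) == r)  # True
```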
Now, the following two statements are equivalent:
(1) The points $x_1, \dots , x_n$ are affinely independent.
(2) The vectors $\overrightarrow{x_1x_2}, \dots , \overrightarrow{x_1x_n}$ are linearly independent.
$\mathbf{(1) \Longrightarrow (2)}$. Let
$$
\mu_2 \overrightarrow{x_1x_2} + \dots + \mu_n \overrightarrow{x_1x_n} = 0 \ .
$$
We have to show that this implies $\mu_2 = \dots = \mu_n = 0$. Indeed,
$$
0 = \mu_2 \overrightarrow{x_1x_2} + \dots + \mu_n \overrightarrow{x_1x_n} = -(\mu_2 + \dots + \mu_n)x_1 + \mu_2 x_2 + \dots + \mu_n x_n \ .
$$
In this expression, the sum of all coefficients is $0$. Since we are assuming $(1)$, this implies $\mu_2 = \dots = \mu_n = 0$.
$\mathbf{(2) \Longrightarrow (1)}$. Let
$$
\lambda_1 x_1 + \dots + \lambda_n x_n = 0 \qquad \text{and} \qquad \lambda_1 + \dots + \lambda_n = 0 \ .
$$
We have to show that this implies $\lambda_1 = \dots = \lambda_n = 0$. Indeed, solve the second equation for $\lambda_1$ again and you have
$$
0 = \lambda_1 x_1 + \dots + \lambda_n x_n = - (\lambda_2 + \dots + \lambda_n) x_1 + \lambda_2 x_2 + \dots + \lambda_n x_n = \lambda_2 \overrightarrow{x_1x_2} + \dots + \lambda_n \overrightarrow{x_1x_n} \ .
$$
Since we are assuming $(2)$, this implies $\lambda_2 = \dots = \lambda_n = 0$ and, since $\lambda_1 + \dots + \lambda_n = 0$, we have $\lambda_1 = 0$ too.
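(In computational terms, the two equivalences say that two rank tests must always agree: full rank of the lifted vectors $(1, x_i)$ and full rank of the differences $x_i - x_1$. A hedged NumPy sketch; the helper names are mine:)

```python
import numpy as np

def lifted_test(points):
    """Full rank of the lifted vectors (1, x_i), as in (a) <=> (b)."""
    pts = np.asarray(points, dtype=float)
    lifted = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return np.linalg.matrix_rank(lifted) == pts.shape[0]

def diff_test(points):
    """Full rank of the difference vectors x_i - x_1, as in (1) <=> (2)."""
    pts = np.asarray(points, dtype=float)
    return np.linalg.matrix_rank(pts[1:] - pts[0]) == pts.shape[0] - 1

pts = np.random.default_rng(2).standard_normal((4, 6))
print(lifted_test(pts), diff_test(pts))  # True True (generic random points)
```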
So far so good. Now, let's finish with another trivial remark about a geometrical interpretation of this linear subspace $F$ and that condition $\lambda_1 + \dots + \lambda_n = 0$. Consider the set of points
$$
V = \left\{x = \lambda_1 x_1 + \dots + \lambda_n x_n \in \mathbb{R}^d \ \vert \ \lambda_1 + \dots +\lambda_n = 1 \right\} \ .
$$
This set is an affine subspace. Indeed,
$$
V = x_1 + F \ .
$$
(You should check this equality and understand that you could put any $x_i$ in the place of $x_1$.)
You can say that $V$ is parallel to the subspace $F$: indeed, $V$ "is" just $F$ translated by $x_1$.
So what? What's so special about $V$? Well, on one hand, $V$ contains all the points $x_1 , \dots , x_n$ (exercise: check it!). On the other hand, it is the smallest affine subspace containing them, in the sense that if $W \subset \mathbb{R}^d$ is any other affine subspace containing all the $x_i$, then $V \subset W$.
Indeed, in general, if you have an affine subspace $W = p + G$ and two points in it $x, y \in W$, then $\overrightarrow{xy} \in G$. So, if $x_1, \dots , x_n \in W$, then $G$ must contain all $\overrightarrow{x_1x_i}$. Hence, $F \subset G$. So $V = x_1 + F \subset x_1 + G = W$.
Summing up: the condition that annoys you, $\lambda_1 + \dots + \lambda_n = 0$, is what makes $V = x_1 + F$ the smallest affine subspace containing all the points $x_1, \dots , x_n$.
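(Numerically, the claim $V = x_1 + F$ says that any combination with coefficients summing to $1$ differs from $x_1$ by something in $F$. A quick NumPy sketch, with arbitrarily chosen coefficients:)

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal((3, 4))        # points x_1, x_2, x_3 in R^4
lam = np.array([0.2, 0.3, 0.5])        # coefficients summing to 1

v = lam @ x                            # a point of V
diffs = x[1:] - x[0]                   # spanning set of F

# v - x_1 should lie in F = span(diffs): appending it keeps the rank.
r = np.linalg.matrix_rank(diffs)
print(np.linalg.matrix_rank(np.vstack([diffs, v - x[0]])) == r)  # True
```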
EDIT. I forgot: perhaps it would be a good exercise to redo everything we have seen here with some specific examples (a quick numerical check follows the list). For instance, take:
- $x_1 = (1,0), x_2 = (0,1)$ in $\mathbb{R}^2$.
- $x_1 = (1,0,0), x_2 = (0,1,0), x_3 = (0,0,1)$ in $\mathbb{R}^3$.
- $x_1 = (1,0), x_2 = (0,1), x_3 = (1/2, 1/2)$ in $\mathbb{R}^2$.
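Here is how the three examples come out under the rank test on the lifted vectors (a sketch; the helper name `affinely_independent` is mine):

```python
import numpy as np

def affinely_independent(points):
    """Rank test on the lifted vectors (1, x_i): full rank iff affinely independent."""
    pts = np.asarray(points, dtype=float)
    lifted = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return np.linalg.matrix_rank(lifted) == pts.shape[0]

print(affinely_independent([[1, 0], [0, 1]]))                   # True
print(affinely_independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # True
print(affinely_independent([[1, 0], [0, 1], [0.5, 0.5]]))       # False (collinear)
```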
Best Answer
Suppose to the contrary that $(a_i)_{i\in I}$ is not affinely independent; that is, for some (actually, every) $k\in I$, the vectors $a_i-a_k$, $i \ne k$, are linearly dependent.
This implies that one of them is a linear combination of the other ones:
$$
a_j-a_k=\sum_{i\ne j,k}\lambda_i(a_i-a_k) \ ,
$$
that is,
$$
a_j=\sum_{i\ne j,k}\lambda_ia_i+\Big(1-\sum_{i\ne j,k}\lambda_i\Big)a_k \ .
$$
Or, subtracting any $a\in E$,
$$
a_j-a=\sum_{i\ne j,k}\lambda_i (a_i-a)+\Big(1-\sum_{i\ne j,k}\lambda_i\Big)(a_k-a) \ ,
$$
which shows that $a_j$ is an affine combination of the other ones.
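(To connect this with the examples further up: for affinely dependent points, you can recover such an affine combination by solving a linear system on the lifted vectors, since the leading $1$'s force the coefficients to sum to $1$. A NumPy sketch on the third example from the list:)

```python
import numpy as np

# Third example above: x_3 = (1/2, 1/2) is affinely dependent on
# x_1 = (1, 0) and x_2 = (0, 1). Solve mu_1 (1, x_1) + mu_2 (1, x_2) = (1, x_3);
# the leading 1's force mu_1 + mu_2 = 1, i.e. an affine combination.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]]).T       # columns: lifted x_1 and x_2
b = np.array([1.0, 0.5, 0.5])           # lifted x_3
mu, *_ = np.linalg.lstsq(A, b, rcond=None)
print(mu)                               # [0.5 0.5]
```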