Well, since no one gave a complete answer yet--and because I wrote one anyway--here's the proof by induction, in a manner which is hopefully easy for students (without much proof experience) to understand. Credit goes to the Wu and Wu paper posted by @Jeff.
Both sides of the Schwarz inequality are real numbers $\geq 0$. If $\sum_{j=1}^n |a_j|^2 \sum_{j=1}^n |b_j|^2 = 0$, then it must be that $a_1 = a_2 = \ldots = a_n = 0$ and/or $b_1 = b_2 = \ldots = b_n = 0$, so clearly $|\sum_{j=1}^n a_j \overline{b_j}|^2$ also $= 0$ and we are done. Now we only need to prove the case in which both sides of the inequality are positive.
Base Case. For $n = 1$, we have
$$|\sum_{j=1}^1 a_j \overline{b_j}|^2 = |a_1 \overline{b_1}|^2
= |a_1|^2 |b_1|^2 = \sum_{j=1}^1 |a_j|^2 \sum_{j=1}^1 |b_j|^2.$$
Inductive Step. The inductive hypothesis is $|\sum_{j=1}^{n-1} a_j \overline{b_j}|^2 \leq \sum_{j=1}^{n-1} |a_j|^2 \sum_{j=1}^{n-1} |b_j|^2$. Since both sides are nonnegative real numbers, we can take the square root to obtain
$$|\sum_{j=1}^{n-1} a_j \overline{b_j}| \leq \sqrt{\sum_{j=1}^{n-1} |a_j|^2 \sum_{j=1}^{n-1} |b_j|^2}.$$
Thus $|\sum_{j=1}^n a_j \overline{b_j}|$
$= |\sum_{j=1}^{n-1} a_j \overline{b_j} + a_n \overline{b_n}|$
$\leq |\sum_{j=1}^{n-1} a_j \overline{b_j}| + |a_n \overline{b_n}|$ (by the triangle inequality)
$\leq \sqrt{\sum_{j=1}^{n-1} |a_j|^2 \sum_{j=1}^{n-1} |b_j|^2} + |a_n \overline{b_n}|$
(by the inductive hypothesis)
$= \sqrt{\sum_{j=1}^{n-1} |a_j|^2} \sqrt{\sum_{j=1}^{n-1} |b_j|^2} + |a_n| |b_n|.$
Here we're a little stuck. We want to be able to square $|a_n|$ and $|b_n|$ and bring them into their respective square-rooted sums. So if we label $a = \sqrt{\sum_{j=1}^{n-1} |a_j|^2}$, $b = \sqrt{\sum_{j=1}^{n-1} |b_j|^2}$, $c = |a_n|$, and $d = |b_n|$, we want to be able to say $ab + cd \leq \sqrt{a^2 + c^2} \sqrt{b^2 + d^2}$. In fact, we can say it! This inequality is always true for any $a, b, c, d \in \mathbb{R}$, because
$0 \leq (ad - bc)^2 = a^2 d^2 - 2abcd + b^2 c^2$
$\Rightarrow 2abcd \leq a^2 d^2 + b^2 c^2$
$\Rightarrow a^2 b^2 + 2abcd + c^2 d^2 \leq a^2 b^2 + a^2 d^2 + b^2 c^2 + c^2 d^2$
$\Rightarrow (ab + cd)^2 \leq (a^2 + c^2)(b^2 + d^2),$
and since both sides are nonnegative reals, we can take the square root (using $ab + cd \leq |ab + cd|$ in case $ab + cd$ is negative).
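As a quick sanity check on the four-variable inequality above, here is a minimal Python sketch that evaluates the gap $\sqrt{a^2 + c^2} \sqrt{b^2 + d^2} - (ab + cd)$ over a small grid of real values. The function name `two_term_gap` and the sample grid are my own choices for illustration, and a numerical check is of course not a substitute for the algebra.

```python
import itertools
import math


def two_term_gap(a, b, c, d):
    """Return sqrt(a^2 + c^2) * sqrt(b^2 + d^2) - (a*b + c*d)."""
    return math.hypot(a, c) * math.hypot(b, d) - (a * b + c * d)


# Hypothetical sample grid, including negative values and zero.
values = [-2.0, -0.5, 0.0, 1.0, 3.0]

# Smallest gap over every (a, b, c, d) drawn from the grid.
min_gap = min(
    two_term_gap(a, b, c, d)
    for a, b, c, d in itertools.product(values, repeat=4)
)

# The gap should never be negative beyond floating-point rounding.
print(min_gap >= -1e-12)
```

The gap is (numerically) zero exactly when $ad = bc$, which matches the first line of the derivation.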
We now use this inequality to obtain
$|\sum_{j=1}^n a_j \overline{b_j}| \leq \sqrt{\sum_{j=1}^{n-1} |a_j|^2} \sqrt{\sum_{j=1}^{n-1} |b_j|^2} + |a_n| |b_n|$
$\leq \sqrt{\sum_{j=1}^{n-1} |a_j|^2 + |a_n|^2} \sqrt{\sum_{j=1}^{n-1} |b_j|^2 + |b_n|^2}$
$= \sqrt{\sum_{j=1}^n |a_j|^2 \sum_{j=1}^n |b_j|^2},$
and just square both sides to complete the inductive step.
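If you want to convince yourself numerically before trusting the induction, the following rough sketch compares both sides of the inequality for randomly generated complex vectors. The helper names `cauchy_schwarz_sides` and `random_vector`, the seed, and the tolerance are assumptions made for this example only.

```python
import random


def cauchy_schwarz_sides(a, b):
    """Return (|sum a_j conj(b_j)|^2, sum |a_j|^2 * sum |b_j|^2)."""
    lhs = abs(sum(x * y.conjugate() for x, y in zip(a, b))) ** 2
    rhs = sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b)
    return lhs, rhs


rng = random.Random(0)  # fixed seed so the run is reproducible


def random_vector(n):
    """Random complex vector with real and imaginary parts in [-1, 1]."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


all_hold = True
for n in range(1, 8):
    for _ in range(200):
        lhs, rhs = cauchy_schwarz_sides(random_vector(n), random_vector(n))
        # Small tolerance because n = 1 gives equality up to rounding.
        all_hold = all_hold and lhs <= rhs + 1e-12

print(all_hold)
```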
The property of $\mathbb{N}$ which allows us to do what you want is the fact that it is well-ordered. For those not familiar:
Definition: An order $\leqslant$ on a set $A$ is a well-ordering if it satisfies the following conditions:
- $a\leqslant a$ for each $a\in A$ (reflexivity),
- For each $a,b\in A$, either $a\leqslant b$ or $b\leqslant a$ holds (comparability),
- $a\leqslant b$ and $b\leqslant a$ implies $a=b$ for each $a,b\in A$ (antisymmetry),
- $a\leqslant b$ and $b\leqslant c$ implies $a\leqslant c$ for each $a,b,c\in A$ (transitivity),
- For each nonempty $S\subseteq A$, $S$ has a least element; that is, there is some $s\in S$ such that $s\leqslant a$ for each $a\in S$ (well-ordering).
The Well-Ordering Theorem, which is equivalent to the axiom of choice, states that every set can be well-ordered. Using this theorem we may "disjointize" any collection of sets.
Let $\mathcal A= \{A_i\}_{i\in I}$ be a collection of sets indexed by the set $I$. We will construct pairwise disjoint sets $B_i$, which have the desired property that $$\bigcup_{i\in I} B_i=\bigcup_{i\in I} A_i,$$ using a process called transfinite induction. Let $\leqslant$ be a well-ordering of $I$. Let $a$ be the least element of $I$ and write $B_a=A_a.$ Now let $i\in I$ be such that for each $j< i$ we have constructed pairwise disjoint sets $B_j\subseteq A_j$ from the collection $\mathcal{A}.$ Then let $B_i=A_i\setminus \bigcup_{j<i} A_j$. Since $B_j\subseteq A_j$, it is clear that $B_i\cap B_j=\varnothing$ for each $j<i$. Since $B_i\subseteq A_i$ for each $i\in I$, we have $$\bigcup_{i\in I} B_i\subseteq\bigcup_{i\in I} A_i.$$ To see the reverse inclusion, observe that for each $x\in\bigcup_{i\in I} A_i$, the nonempty set $C_x=\{i\in I\;|\;x\in A_i\}\subseteq I$ has a least element $j$, and that $x\in B_j$ because $x\in A_j$ while $x\notin A_k$ for every $k<j$, so $$\bigcup_{i\in I} B_i\supseteq\bigcup_{i\in I} A_i,$$ giving equality of the two sets.
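For a finite index set such as $I = \{0, 1, \ldots, n-1\}$ with its usual order, the construction above reduces to a simple loop. Here is a minimal Python sketch of that special case; the function name `disjointize` and the sample family `A` are made up for illustration, and the sketch says nothing about the genuinely transfinite case.

```python
def disjointize(sets):
    """Given A_0, ..., A_{n-1}, return B_i = A_i minus the union of A_j for j < i."""
    seen = set()  # union of all A_j with j < i
    result = []
    for current in sets:
        result.append(set(current) - seen)
        seen |= set(current)
    return result


# Hypothetical example family.
A = [{1, 2, 3}, {2, 3, 4}, {1, 5}, {4}]
B = disjointize(A)

print(B)
print(set().union(*A) == set().union(*B))
```

Note that some $B_i$ may come out empty (here the last one), which is consistent with the construction: the $B_i$ are pairwise disjoint and have the same union, but nothing guarantees they are nonempty.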
At first it may seem that transfinite induction works even with sets which are only totally ordered, such as the closed unit interval $[0,1].$ However, this is not the case. To see this, I recommend reading the proof that transfinite induction works. Intuitively, without a well-ordering, one has trouble "moving on to the next element" while performing the induction. I used the first chapter of Munkres' Topology, but there may be better sources. I enjoyed Munkres because he goes through a good amount of material, and the supplementary exercises at the end of nearly every chapter are challenging and illuminating.
Best Answer
Thanks for your question.
I will continue from the inductive step.
Inductive step: Assume $P(k)$ holds; we want to show that $P(k+1)$ holds as well:
$$\bigcup_{j=1}^{k+1} A_j = \left(A_1 \cup A_2 \cup \cdots \cup A_k\right) \cup A_{k+1} \subseteq \left( B_1 \cup B_2 \cup \cdots \cup B_k\right) \cup B_{k+1} = \bigcup_{j=1}^{k+1} B_j.$$
You can then treat each of the two parenthesized groups as a single set, so that each side is a union of just 2 sets, similar to what you did in the base case, which will give you the final result.
Please let me know if anything is not clear.