This question comes up a lot in various guises. Common to all of them is this: how can I combine moment-based statistics that have been computed from disjoint subsets of my data?
The simplest application concerns data that have been split into two groups. You know the group sizes and the group means. In terms of these four quantities alone, what is the overall mean of the data?
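For concreteness, if the two groups have sizes $n_1$ and $n_2$ with means $\bar x_1$ and $\bar x_2$, the overall mean is the size-weighted average
$$\bar x = \frac{n_1 \bar x_1 + n_2 \bar x_2}{n_1 + n_2};$$
the rest of this answer develops the general machinery behind such formulas.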
Other applications generalize from means to variances, standard deviations, covariance matrices, skewnesses, and multivariate statistics; and might involve multiple subgroups of data. Notice that many of these quantities are somewhat complicated combinations of moments: the standard deviation, for instance, is the square root of a quadratic combination of the first and second moments (mean and mean square).
All such cases are easily handled by reducing the various moments to sums, because sums are obviously and easily combined: they are added. Mathematically, it comes down to this: you have a batch of data $X = (x_1, x_2, \ldots, x_n)$ that have been separated into disjoint groups of sizes $j_1, j_2, \ldots, j_g$: $(x_1, x_2, \ldots, x_{j_1}; x_{j_1+1}, \ldots, x_{j_1+j_2}; x_{j_1+j_2+1}, \ldots; \ldots; \ldots, x_n)$. Writing $J_i = j_1 + j_2 + \cdots + j_i$ for the cumulative group sizes (with $J_0 = 0$), let's call the $i$th group $X_{(i)} = (x_{J_{i-1}+1}, x_{J_{i-1}+2}, \ldots, x_{J_i})$. By definition, the $k$th moment of any batch of data $y_1, \ldots, y_j$ is the average of $k$th powers,
$$\mu_k(y) = \left(y_1^k + y_2^k + \cdots + y_j^k\right)/j.$$
Obviously $j \mu_k(y)$ is the sum of the $k$th powers. Therefore, referring to our previous decomposition of data into $g$ subgroups, we can break a sum of $n$ powers into groups of sums, obtaining
$$\eqalign{
n \mu_k(X) &= \left(x_1^k + x_2^k + \cdots + x_n^k\right) \\
&= \left(x_1^k + x_2^k + \cdots + x_{j_1}^k\right) + \cdots + \left(x_{j_1+\cdots+j_{g-1}+1}^k + x_{j_1+\cdots+j_{g-1}+2}^k + \cdots + x_n^k\right)\\
&= j_1 \mu_k(X_{(1)}) + j_2 \mu_k(X_{(2)}) + \cdots + j_g \mu_k(X_{(g)}).
}$$
Dividing by $n$ exhibits the $k$th moment of the entire batch in terms of the $k$th moments of its subgroups.
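Here is a minimal sketch of that bookkeeping in Python (the helper name `combine_moments` is my own, not a library routine):

```python
import numpy as np

def combine_moments(sizes, moments):
    """Combine the k-th moments of disjoint groups into the k-th moment
    of the pooled data: each group moment times its group size recovers
    that group's sum of k-th powers; add the sums and divide by n."""
    sizes = np.asarray(sizes, dtype=float)
    return np.dot(sizes, np.asarray(moments, dtype=float)) / sizes.sum()

# Sanity check against a direct computation (k = 2 here).
rng = np.random.default_rng(0)
groups = [rng.normal(size=j) for j in (5, 8, 3)]
combined = combine_moments([len(g) for g in groups],
                           [np.mean(g**2) for g in groups])
assert np.isclose(combined, np.mean(np.concatenate(groups)**2))
```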
In the present application, the entries in the covariance matrix are, of course, covariances, which are expressible in terms of multivariate second moments and first moments. The key part of the calculation comes down to this: at each step you will have focused on two particular components of your multivariate data; let's call them $x$ and $y$. The numbers you are looking at are in the form
$$((x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)),$$
broken up as before into $g$ groups. For each group you know the average of the products $x_iy_i$: this is the $(1,1)$ multivariate moment, $\mu_{(1,1)}$. To combine these group values, multiply each one by its group size, add those results, and divide the total by $n$.
To apply this approach you need to think ahead: it is not possible to combine, say, covariances if you know only the covariances and the subgroup sizes. You also need the means of the subgroups (because means enter in an essential way into every covariance formula), or something algebraically reducible to the means. And you may need to take care with any constants that appear in the formulas; the chief trap for the unwary is to confuse a "sample covariance" (a sum of products divided by $n-1$) with a "population covariance" (where the division is by $n$). This introduces nothing new; you just have to remember to multiply a sample covariance by $n-1$ (or a group's sample covariance by $j_i-1$) to recover the sum, rather than by $n$ (or $j_i$).
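To illustrate the whole pipeline, here is a sketch assuming each group reports its size, mean vector, and sample covariance (divisor $j_i - 1$); the name `pool_covariance` is mine, not a standard routine:

```python
import numpy as np

def pool_covariance(sizes, means, covs, sample=True):
    """Pool per-group sample covariance matrices (ddof=1) and mean
    vectors into the covariance matrix of the combined data.

    Each group's sum of cross-products about its own mean is
    (j_i - 1) * covs[i]; re-centering at the pooled mean adds
    j_i * outer(mean_i - grand_mean, mean_i - grand_mean)."""
    sizes = np.asarray(sizes, dtype=float)
    means = np.asarray(means, dtype=float)   # shape (g, d)
    n = sizes.sum()
    grand_mean = sizes @ means / n
    s = np.zeros((means.shape[1], means.shape[1]))
    for j, m, c in zip(sizes, means, covs):
        d = m - grand_mean
        s += (j - 1) * c + j * np.outer(d, d)
    return s / (n - 1) if sample else s / n

# Check against the covariance of the concatenated data.
rng = np.random.default_rng(1)
groups = [rng.normal(size=(j, 3)) for j in (6, 9, 5)]
pooled = pool_covariance([len(g) for g in groups],
                         [g.mean(axis=0) for g in groups],
                         [np.cov(g, rowvar=False) for g in groups])
assert np.allclose(pooled, np.cov(np.vstack(groups), rowvar=False))
```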
Oh, yes: about the present question. The formula in the Wikipedia article is expressed in terms of the group means (first moments) and the group sums of products. As described above, these get combined by adding them and then adjusting the result with a division to obtain the covariances; the final division by $n$ is not shown there.
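Assuming the article in question is Wikipedia's "Algorithms for calculating variance" page, the merge step it gives works on the co-moment $C = \sum_i (x_i-\bar x)(y_i-\bar y)$; a sketch (the function name is mine):

```python
def merge_comoments(nA, xA, yA, cA, nB, xB, yB, cB):
    """Merge two groups' statistics, where each group supplies its size,
    x-mean, y-mean, and co-moment C = sum((x - xbar) * (y - ybar)).
    Dividing the merged C by n (population) or n - 1 (sample) yields
    the covariance -- the final division the article leaves implicit."""
    n = nA + nB
    c = cA + cB + (xA - xB) * (yA - yB) * nA * nB / n
    return n, (nA * xA + nB * xB) / n, (nA * yA + nB * yB) / n, c
```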
Best Answer
While searching through unanswered questions I noticed this one again and decided, in agreement with whuber, that keeping essentially answered questions off of the unanswered tab is a higher priority than my own personal preferences about what is "worthy" of answer vs. comment status, so I have pasted my comment as an answer.
They are different because ${\bf K}_{X} + {\bf K}_Y$ is the sum of two covariance matrices while ${\bf K}_{X+Y}$ is the covariance matrix of the random variable $X+Y$. To see why the two matrices are different, use the bilinearity of covariance to see that
$$ [{\bf K}_{X+Y}]_{ij}=[{\bf K}_{X}]_{ij} +[{\bf K}_{Y}]_{ij}+ {\rm cov}(X_i,Y_j)+{\rm cov}(X_j,Y_i)$$
i.e. the cross-covariances are missing from ${\bf K}_{X} + {\bf K}_Y$ (note that I assume $X$ and $Y$ have equal dimension so that the question makes sense). So ${\bf K}_{X+Y}$ is the covariance matrix of $X+Y$, while ${\bf K}_{X} + {\bf K}_Y$ agrees with it only in the special case where ${\rm cov}(X_i,Y_j)=-{\rm cov}(X_j,Y_i)$ for each pair $(i,j)$, the most notable example being when every element of $X$ is uncorrelated with every element of $Y$.
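A quick numerical sanity check of this identity (a sketch; the variable names are my own):

```python
import numpy as np

# Verify K_{X+Y} = K_X + K_Y + C + C.T, where C[i, j] = cov(X_i, Y_j).
rng = np.random.default_rng(2)
data = rng.normal(size=(1000, 6)) @ rng.normal(size=(6, 6))  # correlated draws
X, Y = data[:, :3], data[:, 3:]
K = np.cov(np.hstack([X, Y]), rowvar=False)    # joint covariance of (X, Y)
K_X, K_Y, C = K[:3, :3], K[3:, 3:], K[:3, 3:]  # C holds the cross-covariances
assert np.allclose(np.cov(X + Y, rowvar=False), K_X + K_Y + C + C.T)
```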