The idea underlying the k-means algorithm is to try to find clusters that minimize the within-cluster variance (or up to a constant the corresponding sum of squares or SS), which amounts to maximize the between-cluster SS because the total variance is fixed. As mentioned on the wiki, you can directly use the within SS and look at its variation when increasing the number of clusters (like we would do in Factor Analysis with a screeplot): an abrupt change in how SS evolve is suggestive of an optimal solution, although this merely stands from visual appreciation. As the total variance is fixed, it is equivalent to study how the ratio of the between and total SS, also called the percentage of variance explained, evolves, because in this the case it will present a large gap from one k to the next k+1. (Note that the between/within ratio is not distributed as an F-distribution because k is not fixed; so, test are meaningless.)
In sum, you just have to compute the squared distance between each data point and their respective center (or centroid), for each cluster--this gives you the within SS, and the total within SS is just the sum of the cluster-specific WSS (transforming them to variance is just a matter of dividing by the corresponding degrees of freedom); the between SS is obtained by substracting the total WSS from the total SS, the latter being obtained by considering k=1 for example.
By the way, with k=1, WSS=TSS and BSS=0.
If you're after determining the number of clusters or where to stop with the k-means, you might consider the Gap statistic as an alternative to the elbow criteria:
Tibshirani, R., Walther, G., and
Hastie, T. (2001). Estimating the
numbers of clusters in a data set via
the gap statistic. J. R. Statist.
Soc. B, 63(2): 411-423.
k-means++ is not meant to improve the accuracy.
What k-means++ is meant to improve is the starting conditions, making k-means to be more likely to converge to a reasonably good local optimum, and faster than with random initialization; largely by ensuring that the cluster centers are not too close to each other initially.
Still, k-means++ means to preserve randomness, and you are expected to try multiple runs and keep the best result. In which case you cannot expect k-means++ to produce better results than k-means. You can only expect to get them faster (in computation), and with fewer tries.
Best Answer
this is a great paper to start with:
Estimating the number of clusters in a data set via the gap statistics
It's really easy to implement something similary in any language.