Why not first look at other clustering algorithms? DBSCAN, for example, does not need to know the number of clusters in advance, and it can work with arbitrary distance functions (assuming you want to go with something like cosine distance for your sparse vectors).
Given the characteristics of your data, k-means is bound to fail. The problem is that the means will most likely no longer be sparse, so they are effectively outliers, and by no means central objects. Don't use k-means with sparse vectors or non-Euclidean distances! (k-means may not even converge with other distance functions, because the mean might no longer minimize the objective function!)
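To see why the means stop being representative, here is a tiny illustration (pure NumPy, toy data invented for the example): the mean of maximally sparse vectors is completely dense.

```python
import numpy as np

# Three "documents" using disjoint vocabularies: maximally sparse rows.
X = np.eye(3)
centroid = X.mean(axis=0)   # [1/3, 1/3, 1/3] -- fully dense

def cos_dist(a, b):
    # Cosine distance = 1 - cosine similarity.
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The "center" looks like none of the points it is supposed to represent:
# it has a nonzero entry in every dimension, and sits at cosine distance
# 1 - 1/sqrt(3) ~ 0.423 from each of its own cluster members.
print(centroid)
print([round(cos_dist(x, centroid), 3) for x in X])   # 0.423 each
```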
Seriously, there are at least 100 more modern clustering algorithms. Try these first.
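For concreteness, a minimal DBSCAN sketch with scikit-learn (the toy data, `eps`, and `min_samples` are made up purely for illustration):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data: two distinct "directions" plus one point matching neither.
X = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.1, 0.0, 0.1],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
    [0.0, 0.2, 1.1],
    [0.5, 5.0, 0.5],   # close to neither direction -> noise
])

# No number of clusters anywhere; cosine distance is supported directly.
db = DBSCAN(eps=0.1, min_samples=2, metric="cosine").fit(X)
print(db.labels_)   # -1 marks noise points
```

With genuinely sparse data you would pass a `scipy.sparse` CSR matrix instead of a dense array, which DBSCAN also accepts with `metric="cosine"`.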
Only single-linkage is optimal. Complete-linkage fails, so your intuition was wrong there. ;-)
As a simple example, consider the one-dimensional data set 1,2,3,4,5,6.
The (unique) optimum solution for complete linkage with two clusters is (1,2,3) and (4,5,6) (complete linkage height 2). The (unique) optimum solution with three clusters is (1,2), (3,4), (5,6), at linkage height 1.
There obviously exists no agglomerative hierarchy that contains both, because these partitions do not nest. Hence, no hierarchical clustering contains all optimal solutions for complete linkage. Based on that example, I would even expect complete linkage to exhibit one of the worst gaps between the optimum solution and the agglomerative solution at higher levels... And this example is likely a counterexample for most other schemes, too (except single linkage, where this data set degenerates).
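If you want to verify those two optima, a brute-force check is cheap at this size (plain Python, no libraries; `partitions` and `complete_height` are just helper names I made up):

```python
def partitions(items, k):
    """Yield every partition of `items` into exactly k nonempty blocks."""
    if k == 1:
        yield [list(items)]
        return
    if len(items) == k:
        yield [[x] for x in items]
        return
    first, rest = items[0], items[1:]
    # Case 1: the first element forms its own block.
    for p in partitions(rest, k - 1):
        yield [[first]] + p
    # Case 2: the first element joins one of the existing blocks.
    for p in partitions(rest, k):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

def complete_height(partition):
    """Complete-linkage height = largest within-cluster diameter (1-d data)."""
    return max(max(c) - min(c) for c in partition)

data = [1, 2, 3, 4, 5, 6]
best2 = min(partitions(data, 2), key=complete_height)
best3 = min(partitions(data, 3), key=complete_height)
print(best2, complete_height(best2))   # optimum at k=2, height 2
print(best3, complete_height(best3))   # optimum at k=3, height 1
```

One can then also check mechanically that not every block of the 3-cluster optimum is contained in a block of the 2-cluster optimum, i.e. the two optima do not nest.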
You may be interested in studying this article:
Gunnar E. Carlsson, Facundo Mémoli:
Characterization, Stability and Convergence of Hierarchical Clustering Methods. Journal of Machine Learning Research 11: 1425-1470 (2010)
and the references contained therein, in particular also:
Kleinberg, Jon M. "An impossibility theorem for clustering." Advances in neural information processing systems. 2003.
These seem to indicate that under some mild stability requirements (which can be interpreted as uncertainty in our distance measurements), only single-linkage clustering remains. Which, unfortunately, is next to unusable in most cases... so IMHO, for practical purposes, the authors overshoot a bit, neglecting the usefulness of the result in favor of pretty theory.
Best Answer
Yes, it produces non-convex clusters.
You have an example in your screenshot.
But also no: it does not "draw cluster boundaries". There is no concept of a cluster boundary in hierarchical clustering. Clusters are just sets of points.
In particular, single-linkage tends to form "chains", which can be non-convex. Since it uses only nearest neighbors, it ignores the overall shape of the cluster.
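A small sketch with SciPy (the ring-plus-blob data is invented for the example): single linkage happily recovers a ring of points surrounding a central blob, a non-convex cluster that no centroid-based method could produce.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# A non-convex "ring" of 8 points (radius 3) around a tiny central blob.
angles = np.arange(8) * np.pi / 4
ring = np.c_[3 * np.cos(angles), 3 * np.sin(angles)]
blob = np.array([[0.0, 0.0], [0.5, 0.0]])
X = np.vstack([ring, blob])

# Adjacent ring points are ~2.30 apart, the ring-to-blob gap is 2.5,
# so single linkage chains the ring together before bridging the gap.
Z = linkage(X, method="single")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)   # ring points share one label, blob points the other
```

Note there is still no "boundary" anywhere: the result is just a label per point, and the ring cluster geometrically encloses the blob cluster.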