There are a number of options.
k-medoids clustering
First, you could try partitioning around medoids (PAM) instead of k-means clustering. PAM is more robust to outliers and could give better results. Van der Laan reworked the algorithm; if you're going to implement it yourself, his article is worth a read.
There is also a k-medoids algorithm designed specifically for large datasets. It is called CLARA (clara in R), and is described in chapter 3 of Finding Groups in Data: An Introduction to Cluster Analysis by Kaufman, L. and Rousseeuw, P.J. (1990).
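To make the idea concrete, here is a minimal, illustrative Python sketch of k-medoids on a precomputed distance matrix. It uses the simple alternating (Voronoi-iteration) variant rather than the full PAM build/swap procedure of Kaufman & Rousseeuw, and the function name and toy data are made up for the example; for real work, use the pam/clara implementations mentioned below.

```python
import numpy as np

def k_medoids(D, k, max_iter=100, seed=None):
    """Bare-bones k-medoids on a precomputed distance matrix D (n x n).

    This is the simple alternating (Voronoi-iteration) variant, not the
    full PAM build/swap procedure of Kaufman & Rousseeuw.
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        # Within each cluster, pick the point minimising the total
        # distance to all other members as the new medoid.
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Toy usage on random 2-D points with Euclidean distances.
X = np.random.default_rng(0).normal(size=(50, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
medoids, labels = k_medoids(D, k=3, seed=0)
```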
hierarchical clustering
Instead of UPGMA, you could try some other hierarchical clustering options. First of all, when you use hierarchical clustering, be sure you choose the linkage method properly. The linkage method determines how the distances between observations and clusters are calculated. I mostly use Ward's method or complete linkage, but other options might suit your data better.
In phylogenetic applications, single linkage or neighbour joining is often preferred over UPGMA. If you haven't tried them yet, give them a shot as well, as they often give remarkably good results.
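If you want to compare linkage criteria quickly, here is a hedged Python sketch using SciPy's hierarchical clustering (where 'average' linkage corresponds to UPGMA); the toy data and the choice of five clusters are arbitrary and only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).normal(size=(100, 4))
d = pdist(X)  # condensed pairwise Euclidean distances

# 'average' linkage is UPGMA; compare it against other linkage criteria.
for method in ("average", "ward", "complete", "single"):
    Z = linkage(d, method=method)
    labels = fcluster(Z, t=5, criterion="maxclust")  # cut tree into 5 clusters
    print(method, np.bincount(labels)[1:])  # cluster sizes (labels start at 1)
```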
In R, take a look at the cluster package: pam and clara are implemented there, while hclust and kmeans ship with base R (the stats package). See ?pam, ?clara, ?hclust, ... Also check the different algorithm options in ?kmeans. Sometimes choosing another algorithm can improve the clustering substantially.
EDIT: Just thought of something else: if you work with graphs, nodes and the like, you should take a look at the Markov Cluster (MCL) algorithm as well. It is used, for example, to group sequences based on BLAST similarities, and it performs incredibly well. It can do the clustering for you, or at least give you some ideas on how to approach the research problem you're focusing on. Even without knowing anything about your problem, I'd say its results are definitely worth looking at. If I may say so, I still consider this method by Stijn van Dongen one of the nicest results in clustering I've ever encountered.
http://www.micans.org/mcl/
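To give a feel for how MCL works, below is a rough numpy sketch of its expansion/inflation loop on a tiny adjacency matrix. It is a simplification of the real mcl program linked above (the cluster-extraction step in particular is simplified), so treat it as an illustration only.

```python
import numpy as np

def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Minimal Markov Cluster (MCL) sketch: alternate expansion and
    inflation on a column-stochastic matrix until it stops changing."""
    M = adjacency.astype(float) + np.eye(adjacency.shape[0])  # add self-loops
    M /= M.sum(axis=0, keepdims=True)                         # column-normalise
    for _ in range(max_iter):
        prev = M.copy()
        M = np.linalg.matrix_power(M, expansion)  # expansion: random-walk step
        M = M ** inflation                        # inflation: boost strong flows
        M /= M.sum(axis=0, keepdims=True)
        if np.abs(M - prev).max() < tol:
            break
    # Simplified read-out: non-zero rows belong to "attractor" nodes, and
    # the non-zero columns of such a row form one cluster.
    clusters = []
    for row in M:
        if row.max() > 1e-8:
            c = set(map(int, np.nonzero(row > 1e-8)[0]))
            if c not in clusters:
                clusters.append(c)
    return clusters

# Two small cliques connected by a single weak edge.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
print(mcl(A))
```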
Best Answer
Inverted lists work very well for sparse data. This is, for example, what Lucene uses.
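As an illustration of the idea, here is a toy Python sketch of an inverted list over sparse binary data: each feature maps to the set of objects containing it, so a similarity query only ever touches objects that share at least one feature with the query. The data and threshold are made up for the example.

```python
from collections import defaultdict

# Toy documents as sets of feature ids (sparse binary vectors).
docs = {
    0: {"apple", "banana", "cherry"},
    1: {"banana", "cherry", "date"},
    2: {"elderberry", "fig"},
}

# Inverted index: feature -> ids of documents containing it.
index = defaultdict(set)
for doc_id, features in docs.items():
    for f in features:
        index[f].add(doc_id)

def neighbours(query, min_jaccard):
    """Return documents whose Jaccard similarity with `query` is >= min_jaccard.
    Only documents sharing at least one feature with the query are inspected."""
    candidates = set()
    for f in query:
        candidates |= index.get(f, set())
    result = {}
    for doc_id in candidates:
        d = docs[doc_id]
        sim = len(query & d) / len(query | d)
        if sim >= min_jaccard:
            result[doc_id] = sim
    return result

print(neighbours({"banana", "cherry"}, min_jaccard=0.5))
```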
I don't know how extensible scikit-learn is. A lot of the code in it seems to be written in Cython, so it is Python-like code compiled via C. This would make it harder to extend.
ELKI, the data mining tool I contribute a lot to, has a (yet unpublished and undocumented) Lucene addon. This would likely work for you. I hope at some point to also have an inverted index for sparse vectors in ELKI main (because of the Lucene dependency, I plan on keeping this addon separate).
We also have (non-integrated) code for a prefix-tree index to accelerate Levenshtein distance, but it needs some more work to integrate, and maybe some profiling.
Most of the time, indexes only work for a particular distance. There is no general-purpose index that can support arbitrary distances at the same time. There are some indexes (e.g. the M-tree and iDistance, both available in ELKI) that do work with arbitrary distances, but only one distance at a time, and how well they work for your data and distance varies a lot. Usually, you need good numerical contrast in your similarities.
The question you need to ask yourself is: is there a way to find all objects within a radius of $\varepsilon$ (or with a similarity larger than $\varepsilon$) without comparing every object to every other object?
Note that for DBSCAN you can use fake distances. The actual distance values are not used; only a binary decision is needed ($d \leq \varepsilon$). This is formalized as GeneralizedDBSCAN. So if you can implement a "distance function" that returns 0 for "similar" and 1 for "not similar", and plug this into scikit-learn's DBSCAN, you should be fine. Depending on the architecture of scikit-learn, you may even be able to plug in a custom index disguised as a distance function. Inverted lists are a good candidate for binary data.
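Here is a hedged sketch of that trick with scikit-learn: build a binary 0/1 "distance" matrix from a similarity threshold and hand it to DBSCAN with metric='precomputed'. The Jaccard similarity, the threshold of 0.3, and the toy data are all assumptions for illustration; a real inverted index would avoid the all-pairs loop shown here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = (rng.random((30, 20)) < 0.2).astype(float)  # toy sparse binary data

def jaccard_sim(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

# Binary "distance": 0 for "similar enough", 1 for "not similar".
threshold = 0.3  # hypothetical similarity cut-off
n = X.shape[0]
D = np.ones((n, n))
for i in range(n):
    for j in range(i, n):
        d = 0.0 if jaccard_sim(X[i], X[j]) >= threshold else 1.0
        D[i, j] = D[j, i] = d
np.fill_diagonal(D, 0.0)

# Any eps in (0, 1) then means "similarity >= threshold".
labels = DBSCAN(eps=0.5, min_samples=3, metric="precomputed").fit_predict(D)
print(labels)
```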