Have you validated your results in any way?
It seems that you want to do unsupervised classification. That usually doesn't work well, in particular for this kind of data and with this method. K-means is more a vector quantization method than a method for finding how clusters are separated. I.e. it will - always - discretize your data into $k$ groups, even when there is no separating gap in between!
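To see this effect, here is a minimal sketch with scikit-learn (the uniform data is made up purely for illustration): even on data with no cluster structure at all, k-means will return exactly $k$ non-empty groups.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))  # uniform noise: no cluster structure at all

# k-means still happily partitions it into k groups
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # three non-empty groups, despite no separating gap
```

The partition it returns tells you nothing about whether clusters actually exist.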
> A negative value means that the record is more similar to the records of its neighboring cluster than to other members of its own cluster.
This seems to be what is happening here. K-means breaks apart data that should be in the same cluster.
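A quick way to reproduce this effect (a sketch; the elongated blob is synthetic): when k-means is forced to cut one natural cluster in two, silhouette values of records near the cut can drop below zero.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples

rng = np.random.default_rng(0)
# one elongated blob -- a single natural cluster
X = rng.normal(size=(300, 2)) * np.array([5.0, 1.0])

# k=2 forces k-means to cut the blob in half
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sil = silhouette_samples(X, labels)
# records near the artificial cut are about as close to the other
# cluster as to their own, so their silhouette can go negative
print(sil.min())
```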
But my larger concern is that your data may be inappropriate for k-means. K-means minimizes the within-cluster sum-of-squares (WCSS). But given that your axes come from different domains, they do not necessarily have the same scale. K-means implicitly assumes squared Euclidean distance (which is the sum-of-squares), and this may be an inappropriate measure of similarity for your data, in particular without extensive preprocessing. You could try the following approach:
1. define an appropriate measure of similarity for your data. Spend a lot of effort here!
2. use metric learning techniques (e.g. non-metric multidimensional scaling) to obtain a vector space where Euclidean distance is appropriate
3. run k-means on this projected data
4. to assign a new observation to the clusters, apply the same preprocessing as in 1), then the same projection as in 2), and then assign it to the nearest mean from 3)
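The first three steps can be sketched as follows. This is only an illustration: the cosine dissimilarity stands in for whatever domain-specific measure you define in 1), and I use metric MDS as a simpler stand-in for the non-metric variant.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))  # placeholder data

# step 1: a domain-specific dissimilarity (cosine is only a placeholder)
D = squareform(pdist(X, metric="cosine"))

# step 2: embed into a space where Euclidean distance approximates D
emb = MDS(n_components=2, dissimilarity="precomputed",
          random_state=0).fit_transform(D)

# step 3: k-means in the embedded space
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(emb)
```

Note that scikit-learn's MDS has no `transform()` for out-of-sample points, so the assignment of new observations in step 4 needs an out-of-sample extension (e.g. landmark MDS) or a projection method that supports transforming new data.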
A common failure mode with k-means is running it on your data without first checking that this is appropriate, i.e. that the dimensions carry the same amount of relevant information on the same scale. The simplest heuristic is whitening, but more often than not (e.g. when you have discrete or binary attributes) this will not be enough.
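Whitening itself is cheap, e.g. via PCA (a sketch on made-up correlated data):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# two correlated features on wildly different scales
X = rng.normal(size=(500, 2)) @ np.array([[100.0, 30.0], [0.0, 1.0]])

Xw = PCA(whiten=True).fit_transform(X)
# after whitening, the components are decorrelated with unit variance
print(np.round(np.cov(Xw.T), 2))
```

But as said above, with discrete or binary attributes unit variance still does not make squared Euclidean distance a meaningful similarity.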
But even with all these efforts, k-means may still fail badly, because it assumes clusters have the same "diameter". So if one of your users has a very narrow usage profile (always using only the web browser, with a single tab) and another has a very wide usage profile (word processor, browser, email, ... all open at the same time or not), k-means may just be based on the wrong assumptions for your data.
Most clustering algorithms prefer minimizing spread over balancing cluster element counts. I.e. they try to find clusters of small extent that cover everything, not clusters of even size.
I'm pretty sure there are more algorithms, but the only one I have recently come across that tries to keep cluster sizes the same is this tutorial:
http://elki.dbs.ifi.lmu.de/wiki/Tutorial/SameSizeKMeans
In your case, I guess hierarchical clustering would work better than k-means. But ensuring same-sized clusters seems quite hard in hierarchical clustering, too. At some point, you will have to make some really bad cluster assignments if you want to force equal cluster sizes.
This is most obvious if you have a data set with extremely well separated clusters, but different size. Say you have 100 instances that are $N(0;1)$ distributed, and 1000 instances that are $N(10;1)$ distributed. If you enforce the clusters to have the same size, the result will be really, really bad by any measure.
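The $N(0;1)$ / $N(10;1)$ example is easy to verify numerically (a sketch; sizes and seeds are arbitrary): unconstrained k-means recovers the very uneven natural split, so an equal-size constraint of 550/550 would have to push roughly 450 points across a gap of about $10\sigma$.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0, 1, 100),
                    rng.normal(10, 1, 1000)]).reshape(-1, 1)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x)
# unconstrained k-means finds the natural, very uneven ~100/1000 split;
# forcing 550/550 would move ~450 points across the huge gap
print(sorted(np.bincount(labels)))
```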
Best Answer
You used a single metric to classify the users into clusters? I'll assume you have additional, descriptive information about these events. One heuristic is to compute a summary of cluster central tendencies (e.g., means, medians, etc.) across the descriptive information, based on the cluster assignments. So, if you have k=3 and x=20 (both values are arbitrary here; x is the number of descriptors or features), the output would be a 20 (rows) by 3 (columns) summary matrix for analysis. Next, to determine how the clusters differ on each descriptor, create an index: the cluster value divided by the global value across all users for that descriptor. This index behaves like an IQ score, where 100 is "normal," and values of 120+ or 80 and below flag descriptors where behavior diverges from the norm. Treat 120+ and 80-and-below as "quick and dirty" significance tests for between-group (cluster) differences.
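This index is straightforward to compute with pandas (a sketch; the descriptor names, cluster count, and data are all made up):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# hypothetical: 200 users described by 5 positive-valued descriptors
X = pd.DataFrame(rng.uniform(10, 100, size=(200, 5)),
                 columns=[f"descriptor_{i}" for i in range(5)])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# summary matrix: descriptors (rows) x clusters (columns)
summary = X.groupby(labels).mean().T

# IQ-style index: 100 = "normal", relative to the global mean per descriptor
index = 100 * summary.div(X.mean(), axis=0)

# "quick and dirty" flags for descriptors diverging from the norm
flagged = (index >= 120) | (index <= 80)
```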