A common solution to this common problem (i.e., some variables being over-weighted simply because of their scale) is to standardize your data.
To do this, you perform two successive column-wise operations on your data: first, subtract each column's mean from every value in that column; second, divide each column by its standard deviation. In symbols, each entry $x_{ij}$ becomes $(x_{ij} - \mu_j)/\sigma_j$, where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of column $j$.
The rationale for these two operations is to give the values zero mean (by subtracting the mean in the numerator) and unit variance (by dividing by the standard deviation in the denominator).
For instance, in NumPy:
>>> import numpy as NP
>>> # first create a small data matrix comprised of three variables
>>> # having three different 'scales' (means and variances)
>>> a = 10*NP.random.rand(6)
>>> b = 50*NP.random.rand(6)
>>> c = 2*NP.random.rand(6)
>>> A = NP.column_stack((a, b, c))
>>> A # the pre-standardized data
array([[ 1.753, 37.809, 1.181],
       [ 1.386,  8.333, 0.235],
       [ 2.827, 40.5  , 0.625],
       [ 5.516, 47.202, 0.183],
       [ 0.599, 27.017, 1.054],
       [ 8.918, 35.398, 1.602]])
>>> # mean center the data (columnwise)
>>> A -= NP.mean(A, axis=0)
>>> A
array([[ -1.747,   5.099,  0.368],
       [ -2.114, -24.377, -0.578],
       [ -0.673,   7.79 , -0.189],
       [  2.016,  14.493, -0.631],
       [ -2.901,  -5.693,  0.24 ],
       [  5.418,   2.688,  0.789]])
>>> # divide by the standard deviation
>>> A /= NP.std(A, axis=0)
>>> A
array([[-0.606,  0.409,  0.716],
       [-0.734, -1.957, -1.125],
       [-0.233,  0.626, -0.367],
       [ 0.7  ,  1.164, -1.228],
       [-1.007, -0.457,  0.468],
       [ 1.881,  0.216,  1.536]])
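As a quick sanity check (continuing the session above), each standardized column should now have zero mean and unit variance:
>>> # verify the standardization (NP.std uses the population std, ddof=0)
>>> zero_mean = NP.allclose(NP.mean(A, axis=0), 0)   # True
>>> unit_var = NP.allclose(NP.std(A, axis=0), 1)     # True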
Do not compare the raw numbers.
A Euclidean distance of $x$, a cosine similarity of $y$, and a Euclidean distance of $z$ on normalized vectors live on different scales and simply cannot be compared. It is like comparing "shoe size" with "mass in lb": there probably is some relationship, but none that you can see by looking at a small sample.
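To see the scale mismatch concretely, here is a minimal NumPy sketch (the vectors u and v are made up for illustration); the three numbers answer three different questions:
>>> u = NP.array([3.0, 4.0])
>>> v = NP.array([6.0, 8.0])   # same direction as u, twice the length
>>> d_raw = NP.linalg.norm(u - v)                          # 5.0: 'far apart'
>>> cos = u @ v / (NP.linalg.norm(u) * NP.linalg.norm(v))  # 1.0: identical direction
>>> d_unit = NP.linalg.norm(u/NP.linalg.norm(u) - v/NP.linalg.norm(v))   # 0.0
On unit vectors, though, the two measures are directly linked: $\|\hat{u} - \hat{v}\|^2 = 2(1 - \cos\theta)$, which is why normalizing first makes Euclidean and cosine rankings agree (see the sketch below).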
k-means optimizes Euclidean distances (and it shouldn't really be used with other distances, as the mean may not be a least-squares estimator of the central object under them). So the clusters you computed already favor Euclidean neighbors; if you then refine your search using cosine distance, you will likely stay within a subset that was already Euclidean-similar.
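If cosine similarity is what you actually want, a common workaround is to L2-normalize the rows before clustering, so that Euclidean k-means behaves like spherical k-means (by the identity above, squared Euclidean distance on unit vectors is a monotone function of cosine similarity). A minimal sketch, assuming scikit-learn is available and using random data as a stand-in for your vectors:
>>> from sklearn.cluster import KMeans
>>> from sklearn.preprocessing import normalize
>>> X = NP.random.rand(100, 20)    # placeholder for your data matrix
>>> X_unit = normalize(X)          # L2-normalize each row
>>> labels = KMeans(n_clusters=5, n_init=10).fit_predict(X_unit)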
So many of the effects you observe are probably artifacts of the clustering preprocessing you did and of similar aspects of your code.
Note that when using non-normalized vectors, you essentially put a high weight on document length. This is why TF-IDF vectors are commonly normalized: a document and the concatenation of two copies of that same document then map to the same vector.
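A toy illustration of that point (the four-term count vector below is invented): the raw vector of a document and of that document concatenated with itself are clearly different, but identical once L2-normalized:
>>> doc = NP.array([2.0, 0.0, 1.0, 3.0])     # toy term counts for one document
>>> doubled = 2 * doc                        # the document concatenated with itself
>>> gap = NP.linalg.norm(doc - doubled)      # sqrt(14): the raw vectors differ
>>> same = NP.allclose(doc / NP.linalg.norm(doc),
...                    doubled / NP.linalg.norm(doubled))   # True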