You will first need to get a working similarity measure. You can't just throw these attributes together and hope that Euclidean distance on the vector will work. It won't.
K-means is only appropriate for Euclidean distance: it minimizes within-cluster variance via the cluster means, and with other distances it may not converge. It also does not work well with many attributes (dimensions). You might additionally want to look at methods more modern than hierarchical clustering and k-means. In any case, choose an algorithm/implementation that can work with arbitrary distance functions, because you will probably need to spend a lot of time fine-tuning your similarity measure.
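For example, here is a minimal sketch of average-linkage clustering over an arbitrary distance function, using SciPy; `habitat_distance` is a hypothetical placeholder you would replace with your tuned measure, and the data are toy values:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def habitat_distance(u, v):
    # hypothetical domain-specific dissimilarity; plain Euclidean as a stand-in
    return np.sqrt(np.sum((u - v) ** 2))

X = np.random.rand(50, 4)               # toy data: 50 sites, 4 attributes
D = pdist(X, metric=habitat_distance)   # condensed pairwise distance matrix
Z = linkage(D, method="average")        # average linkage accepts any dissimilarity
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 clusters
```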
A common approach (for numerical data) is to take the z-scores of all attributes and then use Euclidean distance. But there are many situations where this is nothing but a crude heuristic. You really need to consider how to measure "habitat similarity". The clustering algorithm needs this as input; it does not infer it automagically, because it cannot.
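A minimal sketch of that z-score approach, assuming rows are observations and columns are numeric attributes:

```python
import numpy as np
from scipy.spatial.distance import pdist

X = np.random.rand(100, 5) * [1, 10, 100, 1000, 10000]  # attributes on very different scales
Z = (X - X.mean(axis=0)) / X.std(axis=0)                 # z-score each attribute
D = pdist(Z, metric="euclidean")                         # distances on an equal footing
```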
An even simpler approach is to rescale each attribute by $\frac{a - a_\min}{a_\max - a_\min}$ to get it into the unit interval $[0, 1]$, and then again use Euclidean distance. Gower's similarity coefficient is along these lines (but uses Manhattan distance).
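A sketch of the rescaling variant; for purely numeric data with no missing values, a Gower-style dissimilarity reduces to Manhattan distance on the rescaled values, averaged over the attributes:

```python
import numpy as np
from scipy.spatial.distance import pdist

X = np.random.rand(100, 5) * [1, 10, 100, 1000, 10000]
X01 = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # each attribute in [0, 1]
d_euclidean = pdist(X01, metric="euclidean")
d_gower = pdist(X01, metric="cityblock") / X.shape[1]        # Gower-style, numeric-only
```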
Essentially, both of these methods try to weight the attributes equally (with different notions of what "equal" means). That is a reasonable heuristic if you do not know what the attributes denote or how they scale. But if you have attributes that scale exponentially or logarithmically relative to one another (say, "volume" vs. "length"), this heuristic will perform badly.
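For instance, here is a sketch of putting such an attribute on a comparable scale before standardizing; the cube root (or a log) is an illustrative choice under the assumption that volume grows roughly like length cubed, not a universal rule:

```python
import numpy as np

rng = np.random.default_rng(0)
length = rng.uniform(1, 10, 100)
volume = length ** 3 * rng.lognormal(0.0, 0.1, 100)  # scales like length cubed

# z-scoring the raw volume would leave it skewed and dominated by large objects;
# transforming first makes the two attributes vary on comparable terms
features = np.column_stack([length, np.cbrt(volume)])
features = (features - features.mean(axis=0)) / features.std(axis=0)
```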
One way to assign a weight to a variable is by changing its scale. The trick works for the clustering algorithms you mention, viz. k-means, weighted-average linkage and average-linkage.
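A minimal sketch of this trick with k-means: multiplying attribute $j$ by $\sqrt{w_j}$ makes the squared Euclidean distance weight that attribute by $w_j$. The weights here are hypothetical; in practice they would come from subject-matter knowledge:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 3)
w = np.array([4.0, 1.0, 0.25])   # hypothetical attribute weights
Xw = X * np.sqrt(w)              # rescale columns so distances encode the weights
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xw)
```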
Kaufman, Leonard, and Peter J. Rousseeuw (2005), *Finding Groups in Data: An Introduction to Cluster Analysis*, Wiley, p. 11:
> The choice of measurement units gives rise to relative weights of the variables. Expressing a variable in smaller units will lead to a larger range for that variable, which will then have a large effect on the resulting structure. On the other hand, by standardizing one attempts to give all variables an equal weight, in the hope of achieving objectivity. As such, it may be used by a practitioner who possesses no prior knowledge. However, it may well be that some variables are intrinsically more important than others in a particular application, and then the assignment of weights should be based on subject-matter knowledge (see, e.g., Abrahamowicz, 1985).
>
> On the other hand, there have been attempts to devise clustering techniques that are independent of the scale of the variables (Friedman and Rubin, 1967). The proposal of Hardy and Rasson (1982) is to search for a partition that minimizes the total volume of the convex hulls of the clusters. In principle such a method is invariant with respect to linear transformations of the data, but unfortunately no algorithm exists for its implementation (except for an approximation that is restricted to two dimensions). Therefore, the dilemma of standardization appears unavoidable at present and the programs described in this book leave the choice up to the user.
Abrahamowicz, M. (1985), "The use of non-numerical a priori information for measuring dissimilarities," paper presented at the Fourth European Meeting of the Psychometric Society and the Classification Societies, 2-5 July, Cambridge (UK).
Friedman, H. P., and Rubin, J. (1967), "On some invariant criteria for grouping data," J. Amer. Statist. Assoc., 62, 1159-1178.
Hardy, A., and Rasson, J. P. (1982), "Une nouvelle approche des problèmes de classification automatique," Statist. Anal. Données, 7, 41-56.
Some simple and obvious, universal considerations for multivariate analysis, including clustering.
Case 1. Incomparable units: height vs. weight. You cannot compare them, so the default decision is to standardize (equalize variances). It is the "default" on the grounds of parsimony of thought: "every unique aspect of nature is assumed to have the same, unit variability of observations".
Case 2. Same units, irrelative features: height vs. circumference. These are clearly independent (conceptually, not statistically) phenomena of reality; that they share a unit seems a coincidence. It would be silly to compare the two values, so the default decision is to standardize the features.
Case 3. Same units, juxtaposed features: length of the right arm vs. the left arm. We could naturally compare the two lengths if we needed to; the two are interchangeable, in a sense. The default decision is to leave the variances as they are (no matter how much they differ), because we "leave the nature under study be as it is".
Case 4. Undecided between 2 and 3: length of arm vs. length of leg. We could compare these, but we are not interested in that; rather, we prefer to see the lengths as separate dimensions (albeit not irrelative phenomena). A feature-conceptual decision (whether to standardize or to leave as is) is impossible here. Other, method-driven or goal-driven or criterial-driven$^1$ considerations would dictate the choice in a concrete situation (a sketch contrasting Cases 1 and 3 follows the footnote below). There is no default solution, and the decision can be difficult to make. Some considerations might resolve the problem by providing an insight that the case is actually 2 or 3.
$^1$ By criterial-driven considerations I mean those engaged with a criterion, a meta-valuer (which or who) defining what value is "big" enough to be treated as the opposite of a "small" one. For example, in psychiatry the criterion is clinical populations, and it is quite natural to standardize "psychopathological" features; in psychology the criterion is often a leading feature or a set of them, so standardizing, when not necessary, will just ruin inferences.
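As a rough illustration of how Cases 1 and 3 translate into preprocessing (the column layout and numbers are made up for the example): standardize the incomparable pair, and leave the juxtaposed pair on its shared natural scale:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
# columns: height (cm), weight (kg)   -> Case 1: standardize
#          right arm, left arm (cm)   -> Case 3: leave variances as they are
X = np.column_stack([
    rng.normal(170, 10, 30),   # height
    rng.normal(70, 12, 30),    # weight
    rng.normal(62, 3, 30),     # right arm length
    rng.normal(62, 3, 30),     # left arm length
])

Xp = X.copy()
Xp[:, :2] = (Xp[:, :2] - Xp[:, :2].mean(axis=0)) / Xp[:, :2].std(axis=0)
D = pdist(Xp, metric="euclidean")  # arm lengths keep their common natural scale
```

If the two groups then end up on very different scales, the juxtaposed pair can still be multiplied by one common factor; what Case 3 forbids is equalizing the two variances separately.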