It's not about not being able to compute something.
Distances must measure something meaningful. This fails much earlier with categorical data, if it ever works with more than one variable at all...
If you have the attributes shoe size and body mass, Euclidean distance doesn't make much sense either. It works well when x, y, and z are spatial distances: then Euclidean distance is the line-of-sight distance between the points.
Now if you dummy-encode variables, what meaning does this yield?
Plus, Euclidean distance doesn't make sense when your data is discrete.
If only integer x and y values exist, Euclidean distance will still yield non-integer distances, which don't map back to the data. Similarly, for dummy-encoded variables, the distance will not map back to a quantity of dummy variables...
When you then plan to use, e.g., k-means clustering, it isn't just about distances but about computing the mean, and there is no reasonable mean of dummy-encoded variables, is there?
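As a quick illustration of this point (with made-up data), here is what a k-means-style mean does to a one-hot-encoded categorical variable: the resulting "centroid" is not a valid category at all.

```python
# Hypothetical example: three objects with a one-hot-encoded "color"
# variable (red, green, blue). Averaging the encodings, as k-means would,
# produces a centroid that corresponds to no actual color.
points = [
    [1, 0, 0],  # red
    [1, 0, 0],  # red
    [0, 0, 1],  # blue
]

n = len(points)
centroid = [sum(coords) / n for coords in zip(*points)]
print(centroid)  # roughly "two thirds red, one third blue" -- not a category
```

The centroid comes out as fractions like 0.67 and 0.33, which cannot be decoded back into any level of the original variable.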
Finally, there is the curse of dimensionality: Euclidean distance is known to degrade as the number of variables increases. Adding dummy-encoded variables means you lose distance contrast quite fast. Everything becomes about as similar as everything else, because a single dummy variable can make all the difference.
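The loss of distance contrast is easy to demonstrate with a small simulation (random binary vectors standing in for dummy variables; the numbers are illustrative, not from the question's data): the spread of distances, relative to their mean, collapses as dimensionality grows.

```python
import random

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def contrast(dim, n_points=200, seed=0):
    # Random binary (dummy-like) vectors. We measure the spread of
    # distances from one reference point, relative to the mean distance;
    # a small value means "everything is about equally far away".
    rng = random.Random(seed)
    pts = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_points)]
    ref = pts[0]
    dists = [euclidean(ref, p) for p in pts[1:]]
    return (max(dists) - min(dists)) / (sum(dists) / len(dists))

for d in (5, 50, 500):
    print(d, round(contrast(d), 3))
```

With 5 dummy variables the relative contrast is large; with 500 it is a small fraction of the mean distance, so nearest and farthest neighbors are nearly indistinguishable.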
As with the previous answers, most of my answer below is not specific to SAS, since I use R; there is one exception, though - please see below. There seem to be substantial research efforts toward developing clustering algorithms for mixed data. More specifically, some algorithms have been developed or adapted with a focus on categorical data.
In particular, adaptations of the traditional k-means clustering approach include k-modes, fuzzy k-modes, k-histograms and k-populations (for example, see this paper). Other solutions to the problem include hierarchical clustering algorithms such as ROCK and CACTUS, among others. Probability-based clustering approaches for categorical data include the already mentioned Two-Step cluster analysis procedure (which appears to be SPSS-specific).
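To make the k-modes idea concrete, here is a minimal sketch (my own toy implementation with invented example data, not any particular library's API): it mirrors k-means, but distances are mismatch counts and centroids are per-attribute modes, so the centroids remain valid categories.

```python
from collections import Counter
import random

def mismatch(a, b):
    # Simple matching distance: count of attributes on which two rows differ.
    return sum(x != y for x, y in zip(a, b))

def mode_centroid(rows):
    # Per-attribute mode: the centroid stays a valid categorical record.
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(data, k, iters=10, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(data, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for row in data:
            idx = min(range(k), key=lambda i: mismatch(row, centroids[i]))
            clusters[idx].append(row)
        centroids = [mode_centroid(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical purely categorical data: (color, size, material)
data = [
    ("red", "small", "metal"),
    ("red", "small", "wood"),
    ("blue", "large", "wood"),
    ("blue", "large", "metal"),
]
centroids, clusters = k_modes(data, k=2)
print(centroids)
```

Note that unlike a mean over dummy encodings, each centroid here is itself a legal combination of category levels.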
Recently, some other streams of research related to the topic have appeared. They include approaches such as neural networks and genetic algorithms (for examples, comparisons and references, see this paper and this paper) and information theory (for example, see this paper and this paper). Interest in model-based clustering, specifically based on latent class analysis, is also growing (for example, see this paper and this paper; latent tree models seem to be a mix of latent-variable and hierarchical approaches).
Speaking of latent class analysis (LCA), finally, I would like to share the promised SAS-specific information. This paper describes an LCA-based approach, called latent class clustering, and its implementation as a free SAS add-in, which is available for download on this page.
Best Answer
Transforming your data by subtracting the minimum from every value and dividing the differences by the range is often called normalizing. The transformed data will lie within the interval $[0, 1]$.
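The transformation described above can be sketched in a few lines (variable names and values are made up for illustration):

```python
# Min-max normalization: subtract each variable's minimum and divide by
# its range. Assumes the variable is not constant (range > 0).
def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

var1 = [3.1, 234.0, 89.0, 15.5]  # hypothetical example values
result = normalize(var1)
print(result)
```

The minimum maps to 0, the maximum to 1, and everything else falls in between, so all variables end up on the same $[0, 1]$ scale.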
It is common to normalize all your variables before clustering. The fact that you are using complete linkage vs. any other linkage, or hierarchical clustering vs. a different algorithm (e.g., k-means), isn't relevant. The reason is that clustering algorithms all use a distance measure of some sort to determine if object $i$ is more likely to belong to the same cluster as object $j$ than the same cluster as object $k$. These distance measures are affected by the scale of the variables. That is, when computing the distance between two objects, each with a length and a mass, the distance will change dramatically if you change the units from, say, millimeters to kilometers. By putting all variables into the same range, you weight the variables equally.
You don't have to normalize your variables though. It just means that how close objects are will be more reflective of their values on one variable than another. For instance, using your example data, the ranges are:
Thus, without normalizing, almost all of the computed distance between two objects will be due to their values on Var1.