Similar to the previous answers, most of my answer below is not specific to SAS, since I use R; there is one exception, which I note at the end. There appear to be substantial research efforts toward developing clustering algorithms for mixed data. More specifically, some algorithms have been developed or adapted with a focus on categorical data.
In particular, some adaptations of the traditional k-means clustering approach include k-modes, fuzzy k-modes, k-histograms and k-populations (for example, see this paper). Other solutions to the problem include hierarchical clustering algorithms, such as ROCK and CACTUS. Probability-based clustering approaches for categorical data include the already-mentioned Two-Step cluster analysis procedure (which appears to be SPSS-specific).
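For intuition, here is a minimal k-modes sketch in Python (purely illustrative and my own simplification, assuming integer-coded categories; real implementations, such as the `kmodes` package on PyPI, handle initialization and ties more carefully):

```python
import numpy as np

# Illustrative k-modes sketch: like k-means, but the "centroid" is the
# column-wise mode and the distance is the number of mismatched attributes.
def kmodes(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    modes = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each row to the nearest mode (fewest mismatches).
        dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each mode as the most frequent value per column.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                modes[j] = [np.bincount(c).argmax() for c in members.T]
    return labels, modes
```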
More recently, some other streams of research on the topic have appeared. They include approaches based on neural networks and genetic algorithms (for examples, comparisons and references, see this paper and this paper) and on information theory (for example, see this paper and this paper). Interest in model-based clustering, specifically clustering based on latent class analysis, is also growing (for example, see this paper and this paper; the latter, on latent tree models, appears to be a mix of latent-variable and hierarchical approaches).
Finally, speaking of latent class analysis (LCA), I would like to share the promised SAS-specific information. This paper describes an LCA-based approach, called latent class clustering, and its implementation via a free SAS add-in, which is available for download on this page.
You can think of this as a sparse signal approximation problem. First, let's just consider the case for a single signal.
Say we want to approximate a signal $x$. We have a signal dictionary $D$, which is a matrix where each column is a basis function. In your case, the basis functions would be sinusoids. We can approximate $x$ as a linear combination of basis functions: $\hat{x} = D w$, where $w$ is a weight vector. We can additionally impose the constraint that $w$ be sparse. That is, it should contain few non-zero entries. We can measure the sparsity using the $l_0$ norm, $\|w\|_0$, which just counts the number of non-zero entries. Given the signal dictionary, we want to find the weights that give a good approximation of the signal:
$$\min_{w} \|x - Dw\|_2^2 \quad \text{s.t.} \quad \|w\|_0 \le k$$
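To make the setup concrete, here is a minimal numpy sketch: a dictionary $D$ whose columns are sinusoid basis functions, a toy signal $x$, and the two quantities in the objective above. All the names here are illustrative, not from any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
freqs = np.arange(1, 65)  # candidate frequency bins

# One sine and one cosine column per frequency, normalized to unit norm.
D = np.column_stack(
    [np.sin(2 * np.pi * f * t / n) for f in freqs]
    + [np.cos(2 * np.pi * f * t / n) for f in freqs]
)
D /= np.linalg.norm(D, axis=0)

# Toy signal: a combination of two atoms plus noise.
x = D[:, 3] + 0.5 * D[:, 70] + 0.05 * rng.standard_normal(n)

def reconstruction_error(w):
    """||x - Dw||_2^2, the least-squares term in the objective."""
    return float(np.sum((x - D @ w) ** 2))

def l0_norm(w):
    """Number of non-zero entries of w (the sparsity measure)."""
    return int(np.count_nonzero(w))
```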
This tries to find a good least-squares approximation to $x$, subject to the constraint that we use no more than $k$ basis functions. There's a big literature on solving this problem. It turns out that solving it is NP-hard, but people have come up with many good approximations. One solution is to relax the $l_0$ norm to the $l_1$ norm. This is called basis pursuit denoising in the signal processing literature and LASSO in the statistics literature, and has proven very successful. Another class of solutions is to keep the $l_0$ norm, but use a heuristic search strategy that finds a locally optimal solution instead of the global minimum (or that finds the global minimum under certain conditions). One such method is called Orthogonal Matching Pursuit.
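Both of these are easy to try off the shelf; for example, scikit-learn implements LASSO and Orthogonal Matching Pursuit (continuing with the `D` and `x` from the sketch above):

```python
from sklearn.linear_model import Lasso, OrthogonalMatchingPursuit

# l1 relaxation (basis pursuit denoising / LASSO); alpha is arbitrary here.
w_l1 = Lasso(alpha=0.01, fit_intercept=False).fit(D, x).coef_

# Greedy l0 heuristic: Orthogonal Matching Pursuit with at most k atoms.
w_omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5).fit(D, x).coef_

print(l0_norm(w_l1), reconstruction_error(w_l1))
print(l0_norm(w_omp), reconstruction_error(w_omp))
```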
Your case has an interesting twist, in that there are multiple signals to approximate, and you want to impose the condition that the sparsity pattern is the same for all signals (i.e. they all use the same set of basis functions). Also, the sparsity pattern is not known ahead of time.
Writing out the problem, let's replace the signal vector with a signal matrix $X$, where each column is a signal. Replace the weight vector with a weight matrix $W$, where column $i$ gives the weights for signal $i$. Replace the $l_2$ norm with the Frobenius norm, which is just the analog for matrices.
$$\min_{W} \|X - DW\|_F^2$$
subject to the constraint that $W$ contains no more than $k$ non-zero rows.
This finds a least-squares approximation to the entire set of signals, using a small set of basis functions (no more than $k$ of them) that is shared across all signals. Of course, the caveat about NP-hardness still applies, so we'll have to settle for approximation algorithms.
One method that solves this problem is called Simultaneous Orthogonal Matching Pursuit:
Tropp, Gilbert, and Strauss (2005). Simultaneous sparse approximation via greedy pursuit. ICASSP 2005. http://authors.library.caltech.edu/9043/1/TROicassp05.pdf
There are probably others too.
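For concreteness, here is a rough numpy sketch of the greedy idea (a simplified reading of the approach, not a faithful reimplementation of the paper's algorithm): at each step, pick the atom with the largest total correlation across all residuals, then refit every signal on the selected atoms.

```python
import numpy as np

def somp(X, D, k):
    """Simplified Simultaneous OMP sketch: greedily select atoms whose
    summed correlation with all residuals is largest, then refit all
    signals on the selected atoms by least squares."""
    n_atoms = D.shape[1]
    support = []
    R = X.copy()  # residuals, one column per signal
    for _ in range(k):
        # Score each atom by total correlation across all signals.
        scores = np.sum(np.abs(D.T @ R), axis=1)
        scores[support] = -np.inf  # don't re-pick selected atoms
        support.append(int(np.argmax(scores)))
        # Least-squares fit of all signals on the current support.
        W_sub, *_ = np.linalg.lstsq(D[:, support], X, rcond=None)
        R = X - D[:, support] @ W_sub
    W = np.zeros((n_atoms, X.shape[1]))
    W[support, :] = W_sub
    return W, support
```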
Interesting. I've been working on a similar question recently. First, as is well known, you don't need the full periodogram, since frequencies above the Nyquist limit (N/2) are redundant. Then, if you rank the frequencies by decreasing amplitude, you can select the subset that best represents the periodicities. I'm not aware of any theoretical motivation for selecting a cutoff, but a judgement-based heuristic would work.
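As a minimal illustration of that heuristic (the cutoff `k = 5` is arbitrary, as I said):

```python
import numpy as np

x = np.random.randn(512)                   # stand-in for a real series
amps = np.abs(np.fft.rfft(x))              # rfft already stops at Nyquist
freqs = np.fft.rfftfreq(len(x))
k = 5                                      # judgement-based cutoff
top = np.argsort(amps[1:])[::-1][:k] + 1   # rank by amplitude, skip DC bin
print(list(zip(freqs[top], amps[top])))
```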
My approach was a little different, expanding on the FFT. Obviously, there is likely to be some autocorrelation in the time series as well, so why not fold in a measure to capture it, e.g., the Durbin-Watson statistic or related metrics? There can also be issues with trend, cointegration and unit roots; why not use tests such as the Augmented Dickey-Fuller or Phillips-Perron to express those relationships? Finally, Rob Hyndman put out a paper on clustering time series a couple of years ago that decomposed each series into summary features (e.g., the moments of its distribution) and clustered on those metrics.
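In code, the feature-based idea might look something like this sketch (the feature set here is my own illustrative choice, not Hyndman's exact decomposition):

```python
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.cluster import KMeans
from statsmodels.stats.stattools import durbin_watson
from statsmodels.tsa.stattools import adfuller

def features(x):
    """Summary features for one series: moments, unit-root evidence,
    and a Durbin-Watson autocorrelation measure on first differences."""
    return [
        np.mean(x), np.std(x), skew(x), kurtosis(x),
        adfuller(x)[0],             # ADF test statistic
        durbin_watson(np.diff(x)),  # DW on first differences
    ]

series = [np.cumsum(np.random.randn(200)) for _ in range(10)]  # toy data
F = np.array([features(s) for s in series])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(F)
```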
Anyway, those are some thoughts...