Solved – How to plot a 5D data set in “star coordinates”

Tags: data visualization, self-study

I am reading the paper "Star Coordinates: A Multidimensional Visualization Technique with Uniform Treatment of Dimensions" and trying to plot my data.

Let's say I have a five-dimensional data point, $A = (2,5,3,1,8)$, whose plotted position is calculated by the formula given in the paper.

The basic idea of Star Coordinates is to arrange the
coordinate axes on a circle on a two-dimensional plane
with equal (initially) angles between the axes with an
origin at the center of the circle (Figure 1). Initially, all
axes have the same length. Data points are scaled to the
length of the axis, with the minimum mapping to the
origin and the maximum to the other end of the axis.
Unit vectors are calculated accordingly. …

This is simply an extension of typical 2d and 3d
scatter-plots to higher dimensions with normalization.

I have a hard time grasping the idea. How do I plot it? My main problem is that I do not understand the formula in the paper.

Best Answer

The "star coordinates" are intended to be modified interactively, beginning with a default. This answer shows how to create the default; the interactive modifications are a programming detail.

The data are considered a collection of vectors $x_j = (x_{j1}, x_{j2}, \ldots, x_{jd})$ in $\mathbb{R}^d$. These are first normalized separately within each coordinate, linearly transforming the data $\{x_{ji}, j=1, 2, \ldots\}$ into the interval $[0,1]$. This is done, of course, by first subtracting their minimum from each element and dividing by the range. Call the normalized data $z_j$.
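Written out, and assuming no coordinate is constant (so that the range is nonzero), the normalization of coordinate $i$ reads as follows. The index $k$, running over all observations, is notation introduced here for illustration:

$$z_{ji} = \frac{x_{ji} - \min_k x_{ki}}{\max_k x_{ki} - \min_k x_{ki}}.$$

The smallest value in each coordinate maps to $0$ and the largest to $1$.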

The usual basis of $\mathbb{R}^d$ is the set of vectors $e_i = (0, 0, \ldots, 0, 1, 0, 0, \ldots, 0)$ having a single $1$ in the $i^\text{th}$ place. In terms of this basis, $z_j = z_{j1}e_1 + z_{j2}e_2 + \cdots + z_{jd}e_d$. A "star coordinates projection" chooses a set of distinct unit vectors $\{u_i, i=1, 2, \ldots, d\}$ in $\mathbb{R}^2$ and maps $e_i$ to $u_i$. This defines a linear transformation from $\mathbb{R}^d$ to $\mathbb{R}^2$. This map is applied to the $z_j$--it is just a matrix multiplication--to create a two-dimensional point cloud, depicted as a scatterplot. The unit vectors $u_i$ are drawn and labeled for reference.
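In matrix form, the map can be sketched as follows. The symbol $U$ is introduced here for illustration and denotes the $d \times 2$ matrix whose $i^\text{th}$ row is $u_i$; each $z_j$ is treated as a row vector:

$$z_j \mapsto z_j U = \sum_{i=1}^{d} z_{ji}\, u_i \in \mathbb{R}^2.$$

Each normalized coordinate therefore pushes the plotted point along its own axis vector, and the point lands at the sum of those pushes.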

(An interactive version will allow the user to rotate each of the $u_i$ individually.)
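As a rough sketch of what such a rotation could look like in R, the hypothetical helper `rotate_axis` below (a name invented here, not taken from the paper) turns one row of a projection matrix by a given angle. It assumes the axes are stored as the rows of a $d \times 2$ matrix, equally spaced by default.

```r
# Hypothetical helper (not from the paper): rotate the i-th axis vector of a
# d x 2 projection matrix counterclockwise by `angle` radians.
rotate_axis <- function(prj, i, angle) {
  # matrix() fills by column, so this is [[cos, -sin], [sin, cos]]
  rot <- matrix(c(cos(angle), sin(angle),
                  -sin(angle), cos(angle)), nrow = 2)
  prj[i, ] <- as.vector(rot %*% prj[i, ])
  prj
}

d <- 5
theta <- 2 * pi * (1:d) / d
prj <- cbind(cos(theta), sin(theta))   # assumed default: equally spaced unit vectors
prj2 <- rotate_axis(prj, 2, pi / 6)    # turn the second axis by 30 degrees
```

Recomputing the point cloud with `prj2` in place of `prj` and redrawing would then show the effect of moving that one axis.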


To illustrate this, here is an R implementation applied to a dataset of automobile performance characteristics. First let's obtain the data:

library(MASS)
x <- subset(Cars93, 
       select=c(Price, MPG.city, Horsepower, Fuel.tank.capacity, Turn.circle))

The initial step is to normalize the data:

x.range <- apply(x, 2, range)
z <- t((t(x) - x.range[1,]) / (x.range[2,] - x.range[1,]))

As a default, let's create $d$ equally spaced unit vectors for the $u_i$. These determine the projection prj which is applied to $z$:

d <- dim(z)[2] # Dimensions
prj <- t(sapply((1:d)/d, function(i) c(cos(2*pi*i), sin(2*pi*i))))
star <- z %*% prj

That's it--we are all ready to plot. The plot is initialized to provide room for the data points, the coordinate axes, and their labels:

plot(rbind(apply(star, 2, range), apply(prj*1.25, 2, range)), 
     type="n", bty="n", xaxt="n", yaxt="n",
     main="Cars 93", xlab="", ylab="")

Here is the plot itself, with one line of code for each element: axes, labels, and points:

tmp <- apply(prj, 1, function(v) lines(rbind(c(0,0), v)))
text(prj * 1.1, labels=colnames(z), cex=0.8, col="Gray")
points(star, pch=19, col="Red"); points(star, col="#200000")

Star plot


To understand this plot, it might help to compare it to a traditional method, the scatterplot matrix:

pairs(x)

Scatterplot matrix


A correlation-based principal components analysis (PCA) creates almost the same result.

(pca <- princomp(x, cor=TRUE))
pca$loadings[,1]
biplot(pca, choices=2:3)

The output for the first command is

Standard deviations:
   Comp.1    Comp.2    Comp.3    Comp.4    Comp.5 
1.8999932 0.8304711 0.5750447 0.4399687 0.4196363 
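Because these are standard deviations, they have to be squared before being compared as variances. A minimal sketch, with the five values copied by hand from the output above:

```r
# Standard deviations copied from the princomp output above.
sdev <- c(1.8999932, 0.8304711, 0.5750447, 0.4399687, 0.4196363)

# Share of the total variance carried by each component.
prop.var <- sdev^2 / sum(sdev^2)
round(prop.var, 3)
```

With a correlation-based PCA of five variables the squared standard deviations sum to 5 (up to rounding in the printed values), so each share is simply the squared value divided by 5.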

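Most of the variance is accounted for by the first component: its standard deviation is 1.9, versus 0.83 and less for the others, which amounts to about 72% of the total variance. The loadings onto this component are almost equal in absolute size (only MPG.city enters with a negative sign), as shown by the output to the second command: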

             Price           MPG.city         Horsepower Fuel.tank.capacity        Turn.circle
         0.4202798         -0.4668682          0.4640081          0.4758205          0.4045867

This suggests--in this case--that the default star coordinates plot is projecting along the first principal component and therefore is showing, essentially, some two-dimensional combination of the second through fifth PCs. Its value compared to the PCA results (or a related factor analysis) is therefore questionable; the principal merit may be in the proposed interactivity.
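One way to probe this claim numerically is to push a coefficient vector through the default projection and see how long its image is. The sketch below assumes the same equally spaced axes as above; `proj.length` is a hypothetical helper name, and the loadings are copied by hand from the printed output.

```r
d <- 5
theta <- 2 * pi * (1:d) / d
prj <- cbind(cos(theta), sin(theta))   # default axes, one unit vector per row

# Hypothetical helper: length of the image of a coefficient vector v.
proj.length <- function(v) sqrt(sum((v %*% prj)^2))

# A direction with equal coefficients on all five variables.
proj.length(rep(1, d) / sqrt(d))

# First-component loadings copied from the output above.
pc1 <- c(0.4202798, -0.4668682, 0.4640081, 0.4758205, 0.4045867)
proj.length(pc1)
```

The first call returns zero up to rounding, because equally spaced unit vectors sum to the zero vector. The second depends on the signs of the loadings as well as their sizes, so it is worth inspecting rather than assuming.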

Although R's default biplot looks awful, here it is for comparison. To make it match the star coordinates plot better, you would need to permute the $u_i$ to agree with the sequence of axes shown in this biplot.

Biplot
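To connect this back to the single point $A$ in the question: one point alone cannot be normalized, because the minimum and maximum of each coordinate come from the whole data set. The sketch below therefore adds two invented rows, `B` and `C`, purely so that each coordinate has a range; they are not from the question or the paper. It then applies the same two steps as above, normalization followed by projection onto equally spaced axes.

```r
# Toy data: row A is the point from the question; rows B and C are invented
# here only so that every coordinate has a minimum and a maximum.
x <- rbind(A = c(2, 5, 3, 1, 8),
           B = c(0, 9, 1, 4, 2),
           C = c(6, 1, 7, 3, 5))

# Normalize each column to [0, 1].
x.min <- apply(x, 2, min)
x.max <- apply(x, 2, max)
z <- sweep(sweep(x, 2, x.min, "-"), 2, x.max - x.min, "/")

# Default axes: d equally spaced unit vectors, one per row.
d <- ncol(z)
theta <- 2 * pi * (1:d) / d
prj <- cbind(cos(theta), sin(theta))

# Each point lands at the sum of its normalized coordinates times the axes.
star <- z %*% prj
star["A", ]   # the two plotting coordinates of A
```

With these invented companions, the normalized coordinates of $A$ are $(1/3, 1/2, 1/3, 0, 1)$, so $A$ is drawn at $\tfrac13 u_1 + \tfrac12 u_2 + \tfrac13 u_3 + u_5$. A different data set would give $A$ different normalized values and hence a different position.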