Your data seems to be of the form $u=f(x,y,z,t)$, i.e.,
a time series for each point in space, where the space coordinates
are window size, number of windows and offset.
This can either be seen as a 4-dimensional array (the function $f$)
or a set of points $(x,y,z,t,u)$ in a 5-dimensional space
(the graph of $f$).
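The two views can be sketched with NumPy (the grid sizes below are made up for illustration):

```python
import numpy as np

# Hypothetical grid sizes for the four coordinates (assumptions, not from the data).
nx, ny, nz, nt = 5, 4, 3, 100
rng = np.random.default_rng(0)

# View 1: the function f as a 4-dimensional array u[x, y, z, t].
u = rng.normal(size=(nx, ny, nz, nt))

# View 2: the graph of f as points (x, y, z, t, u) in 5-dimensional space.
x, y, z, t = np.meshgrid(np.arange(nx), np.arange(ny),
                         np.arange(nz), np.arange(nt), indexing="ij")
points = np.column_stack([x.ravel(), y.ravel(), z.ravel(),
                          t.ravel(), u.ravel()])
print(points.shape)  # one row per grid point: (nx*ny*nz*nt, 5)
```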
Here are a few ideas to visualize high-dimensional datasets.
The "grand tour" (available in applications such as ggobi)
is an animation that shows the cloud of points rotating in space,
i.e., a sequence of more or less random projections of the 5-dimensional space onto the plane.
Since, for this dataset, the first four coordinates $(x,y,z,t)$ are arranged in a grid, you would just see that grid.
Parallel coordinate plots and general dimensions reduction methods (PCA, MDS) are likely to present the same problems, because of the presence of the grid: the data really is 4-dimensional.
You may be able to adapt some of the plots
described in J. Klemela's book, Smoothing of Multivariate Data
(they are designed for densities,
but should also work for functions defined on a grid, as here),
but they are not very standard,
and understanding what they actually mean takes a long, long time.
You could slice the data: take points $(x,y,z)$ at random and plot the corresponding time series:
you may be able to group them into different patterns (some could be increasing,
others decreasing, others present a bump, some could be noisy, some could be smooth,
etc.), either manually, or using some clustering algorithm.
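A minimal sketch of this slicing idea, on toy data (the grid sizes and the "increasing/decreasing" grouping rule are assumptions, chosen only to illustrate; a proper clustering algorithm could replace the slope test):

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny, nz, nt = 5, 4, 3, 50          # hypothetical grid sizes
time = np.arange(nt)
u = rng.normal(size=(nx, ny, nz, nt)).cumsum(axis=-1)  # toy random-walk series

# Take a few points (x, y, z) at random and extract their time series.
n_samples = 6
idx = rng.integers([nx, ny, nz], size=(n_samples, 3))
series = np.array([u[i, j, k] for i, j, k in idx])

# Crude automatic grouping: classify each series by the sign of its fitted slope.
slopes = np.polyfit(time, series.T, deg=1)[0]
labels = np.where(slopes > 0, "increasing", "decreasing")
print(list(zip(idx.tolist(), labels.tolist())))
```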
You could aggregate the data in the time dimension:
for each point $(x,y,z)$, you could compute some "metrics" of the corresponding time series,
e.g., maximum, minimum, average, range, absolute variation, etc. Each of those could be visualized
as a 3-dimensional contour plot.
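For instance, collapsing the time axis into per-point summaries might look like this (toy data, hypothetical grid sizes):

```python
import numpy as np

rng = np.random.default_rng(2)
nx, ny, nz, nt = 5, 4, 3, 50          # hypothetical grid sizes
u = rng.normal(size=(nx, ny, nz, nt))

# Collapse the time axis into a few summary "metrics" per point (x, y, z).
metrics = {
    "max": u.max(axis=-1),
    "min": u.min(axis=-1),
    "mean": u.mean(axis=-1),
    "range": u.max(axis=-1) - u.min(axis=-1),
    "abs_variation": np.abs(np.diff(u, axis=-1)).sum(axis=-1),
}
# Each metric is now a 3-dimensional field; e.g. with matplotlib you could
# contour one z-slice of it: plt.contourf(metrics["range"][:, :, 0])
print({k: v.shape for k, v in metrics.items()})
```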
You could aggregate the data in the space dimensions:
plot $\sum_{x,y,z} f(x,y,z,t)$ versus $t$ (a single curve)
or $\sum_{x,y} f(x,y,z,t)$ versus $t$ for all values of $z$
(many curves, either on the same plot or on different plots). You could replace the sum with the average, the median, the standard deviation, etc.
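The spatial aggregations reduce to sums over axes of the 4-dimensional array (again with made-up grid sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
nx, ny, nz, nt = 5, 4, 3, 50          # hypothetical grid sizes
u = rng.normal(size=(nx, ny, nz, nt))

# One curve: sum over all of space, as a function of t.
total = u.sum(axis=(0, 1, 2))         # shape (nt,)

# Many curves: sum over x and y only, one curve per value of z.
per_z = u.sum(axis=(0, 1))            # shape (nz, nt)

# The sum is easily swapped for the average, median, standard deviation, etc.
per_z_median = np.median(u, axis=(0, 1))
print(total.shape, per_z.shape, per_z_median.shape)
```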
3D scatterplots are sometimes a bit confusing, especially if you can't rotate the plot. Here, however, the scatterplot matrix supports the interpretation rather nicely, even though it is missing the colors.
As a.desantos already pointed out, the individual scatterplots in the second image are projections onto different planes. If you think about how the points in the 3D scatterplot have to be located in order to produce these projections, it may become clearer.
The projection to the plane x1, x3 would look roughly like this (can't get the image to load up, colors are marked with letters as follows: r=red, g=green, y=yellow, b=blue):
```
X3 |y y y y y b b b b b
   |y y y y y b b b b b
   |y y y y y b b b b b
   |y y y y y b b b b b
   |y y y y y b b b b b
   |r r r r r g g g g g
   |r r r r r g g g g g
   |r r r r r g g g g g
   |r r r r r g g g g g
   |r r r r r g g g g g
   +-------------------
                     X1
```
Best Answer
I think what primarily needs to be added to your list is coplots, but let's work our way up to that. The starting point for visualizing two continuous variables should always be a scatterplot. With more than two variables, that generalizes naturally to a scatterplot matrix (although if you have lots of variables, you may need to break that up into multiple matrices, see: How to extract information from a scatterplot matrix when you have large N, discrete data, & many variables?).

The thing to recognize is that a scatterplot matrix is a set of 2D marginal projections from a higher-dimensional space. But those margins may not be the most interesting or informative. Exactly which margins you might want to look at is a tricky question (cf., projection pursuit), but the simplest possible next set to examine is the set that makes the variables orthogonal, i.e., scatterplots of the variables that result from a principal components analysis. You mention using this for data reduction and looking at the scatterplot of the first two principal components. The thinking behind that is reasonable, but you don't have to look at only the first two; others might be worth exploring (cf., Examples of PCA where PCs with low variance are "useful"), so you can (and should) make a scatterplot matrix of those, too.

Another possibility with the output of a PCA is to make a biplot, which overlays the way the original variables are related to the principal components (as arrows) on top of the scatterplot. You could also combine a scatterplot matrix of the principal components with biplots.
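A minimal sketch of the PCA step, on toy correlated data (the data and dimensions are made up; the resulting score columns are what you would feed to a scatterplot matrix, e.g. `pandas.plotting.scatter_matrix`):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # toy correlated data

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T          # principal component scores, one column per PC

# All PCs can be worth a look, not just the first two; a scatterplot matrix
# of `scores` shows every pair. Sanity check: the score columns are orthogonal.
cov = scores.T @ scores
print(np.allclose(cov, np.diag(np.diag(cov)), atol=1e-6))
```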
All of the above are marginal, as I mentioned. A coplot is conditional (the top part of my answer here contrasts conditional vs. marginal). Literally, 'coplot' is a blended word from 'conditional plot'. In a coplot, you are taking slices (or subsets) of the data on the other dimensions and plotting the data in those subsets in a series of scatterplots. Once you learn how to read them, they are a nice addition to your set of options for exploring patterns in higher-dimensional data.
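The conditioning behind a coplot can be sketched as follows (toy data with a hypothetical x-y relationship that changes with the conditioning variable z; each slice would be one panel of the coplot, cf. R's `coplot()`):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
z = rng.uniform(0, 1, n)                    # conditioning variable
x = rng.normal(size=n)
y = x * z + rng.normal(scale=0.1, size=n)   # x-y relation that changes with z

# A coplot slices the data on z: one x-vs-y scatterplot per interval of z.
edges = np.quantile(z, np.linspace(0, 1, 5))   # 4 conditioning slices
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (z >= lo) & (z <= hi)
    slope = np.polyfit(x[mask], y[mask], 1)[0]  # relation within this slice
    print(f"z in [{lo:.2f}, {hi:.2f}]: n={mask.sum()}, slope ~ {slope:.2f}")
# The slope drifting upward with z is exactly the conditional structure
# that the marginal x-vs-y scatterplot would hide and a coplot reveals.
```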
To illustrate these ideas, here is an example with the RandU dataset (pseudorandom data generated by an algorithm that was popular in the 1970s):