This may be only a partial answer, because I don't think the plot you expect is really what is in the data, especially regarding the "parallelity and continuity" of the intermediate signals. I will speculate on reasons for that below.
But I think I was able to get what you are looking for in terms of the four basal signals A1, A5, E1, E5: they lie on the edge of the embedded manifold, opposite signals sit more or less diametrically across from each other (A1/E5 and E1/A5, respectively), and neighbouring signals stay neighbours (A1/A5 and E1/E5, respectively).
Generally, I don't think standard MDS (i.e. with an input weight matrix consisting of only 1's) gives you what you want, because you are actually looking for a nonlinear dimension reduction with a localization feature: large "distances" need not be preserved, but local distances should be. There are a number of algorithms that do this. A rather popular one that is often used for cases like yours is t-SNE, short for t-Distributed Stochastic Neighbor Embedding. The linked homepage provides plenty of information.
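For contrast, this is what the classical MDS baseline looks like in code. Since I don't have your data, `M` and `res` below are simulated stand-ins for your binary signal matrix and distance object:

```r
# Classical (metric) MDS for contrast: it tries to preserve ALL pairwise
# distances, not just local ones. M and res are simulated stand-ins.
set.seed(1)
M <- matrix(rbinom(25 * 40, 1, 0.5), nrow = 25)  # 25 signals x 40 binary features
res <- dist(M, method = "binary")                # any distance object works here
mds.coor <- cmdscale(res, k = 2)                 # 25 x 2 matrix of coordinates
```

Plotting `mds.coor` the same way as the t-SNE coordinates below makes the difference between global and local distance preservation easy to see.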
I calculated the t-SNE embedding for your kappa distance. Note that I use a high perplexity to force a shape resembling the one you aim for. It can be obtained by
set.seed(1)
library(tsne)

# 'res' is your kappa distance object for the 25 signals
tsne.coor1 <- tsne(res, perplexity = 25)
rownames(tsne.coor1) <- c("A1","A2","A3","A4","A5",
                          "B1","B2","B3","B4","B5",
                          "C1","C2","C3","C4","C5",
                          "D1","D2","D3","D4","D5",
                          "E1","E2","E3","E4","E5")
plot(tsne.coor1[,1], tsne.coor1[,2], type = "n", xlab = "", ylab = "")
text(tsne.coor1[,1], tsne.coor1[,2],
     labels = rownames(tsne.coor1), cex = 0.8)
abline(h = 0, v = 0, col = "gray75")
and looks like this:
The basal signals are labeled in red. As you can see, the similarity structure as captured by your preprocessing and distance measure does not suggest that the intermediate signals behave as you expect. For example, the path from A1 to E1 passes through E4, D5, A3 rather than B1 through D1. But this is what your data tell you! So the similarity between the signals lying within the convex hull of the embedded manifold suggests that the clear-cut pattern is not preserved.
There are two obvious explanations:
- Some information gets lost in the mapping to low dimensions; the lost information might be exactly the information you were looking for.
- The distance measure might not capture what you actually care about. I agree with @ttnphns on this one.
[Edit] (thanks @ttnphns!)
To investigate the second point, I tried other measures of similarity for binary matrices (a gravity-model similarity and the asymmetric binary similarity, the one returned by dist(x, method="binary")). At this high perplexity they led to no discernibly different shape, but the arrangement of the signals did change. For lower perplexity, the effect on the visualisation is small for the kappa distance but not for the other distances. To illustrate, for the asymmetric binary distance the results are:
res <- dist(M, method = "binary")  # asymmetric binary distance on the raw matrix M
set.seed(1)
tsne.coor2 <- tsne(res, perplexity = 25)  # high perplexity
tsne.coor3 <- tsne(res, perplexity = 3)   # low perplexity
rownames(tsne.coor2) <- rownames(tsne.coor3) <- c("A1","A2","A3","A4","A5",
                                                  "B1","B2","B3","B4","B5",
                                                  "C1","C2","C3","C4","C5",
                                                  "D1","D2","D3","D4","D5",
                                                  "E1","E2","E3","E4","E5")
par(mfrow = c(1, 2))
plot(tsne.coor2[,1], tsne.coor2[,2], type = "n", xlab = "", ylab = "")
text(tsne.coor2[,1], tsne.coor2[,2],
     labels = rownames(tsne.coor2), cex = 0.8)
abline(h = 0, v = 0, col = "gray75")
plot(tsne.coor3[,1], tsne.coor3[,2], type = "n", xlab = "", ylab = "")
text(tsne.coor3[,1], tsne.coor3[,2],
     labels = rownames(tsne.coor3), cex = 0.8)
abline(h = 0, v = 0, col = "gray75")
and here are the results:
So, for high perplexity the shape of the projection of the signals is rather similar for both distance measures; for lower perplexity, the results change. Note that although the asymmetric binary distance gives something that looks different from what you originally intended, the transition from a basal signal through the other states seems better preserved in the low-perplexity plot, particularly column-wise. This makes speculation 2, that kappa is not a very suitable distance measure, more likely. You may try the distances dist(x, method = "binary") or cluster::daisy(x, metric = "gower") (which is the Dice coefficient, I think).
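As a sketch, both alternatives can be computed like this; since I don't have your matrix, `M` below is a simulated stand-in for your 25-row binary signal matrix:

```r
# Two alternative (dis)similarities for binary data, on a simulated stand-in M.
library(cluster)                     # for daisy()
set.seed(1)
M <- matrix(rbinom(25 * 40, 1, 0.5), nrow = 25)       # 25 signals x 40 binary features
d.asym  <- dist(M, method = "binary")                 # asymmetric binary distance
d.gower <- daisy(as.data.frame(M), metric = "gower")  # Gower distance; note that
# daisy() treats numeric 0/1 columns as interval-scaled unless told otherwise
```

Either object can be passed to tsne() in place of the kappa distance.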
[end Edit]
Note that t-SNE has random initialization, so the result might look a bit different at your end; I am not sure whether set.seed has an effect in the R implementation. The random initialization is actually something that might get you closer to what you need anyway. As the authors put it:
> In contrast to, e.g., PCA, t-SNE has a non-convex objective function. The objective function is minimized using a gradient descent optimization that is initiated randomly. As a result, it is possible that different runs give you different solutions. Notice that it is perfectly fine to run t-SNE a number of times (with the same data and parameters), and to select the visualization with the lowest value of the objective function as your final visualization.
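The restart-and-keep-best pattern from the quote can be sketched generically. The toy objective below is made up just to have something non-convex to minimize; with t-SNE you would substitute the embedding run and its final KL cost (which the tsne package does not return directly, so you would have to evaluate it yourself):

```r
# Run a randomly initialized optimizer several times, keep the lowest objective.
f <- function(x) sin(5 * x) + 0.1 * x^2          # toy non-convex objective
runs <- lapply(1:10, function(s) {
  set.seed(s)                                    # reproducible random restarts
  optim(runif(1, -5, 5), f, method = "BFGS")     # random initialization
})
best <- runs[[which.min(sapply(runs, `[[`, "value"))]]
best$value                                       # lowest objective over 10 restarts
```

The same loop structure applies to any stochastic embedding: collect the runs, score each one, keep the winner.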
You may play around with the distances mentioned above and some of the parameters, especially perplexity, to get closer to what you need in one direction or the other. Hope this helps as a start.
Best Answer
A mosaic plot helps to visualize the statistical association between two categorical variables, in your case between having AC (0, 1) and high satisfaction (0, 1).
The frequency distribution of the first variable (AC) is represented on the horizontal axis. The relative frequency of each factor level is proportional to the width of the corresponding segment on the x-axis. In your case, most observations are 0, so no AC was available. In pseudo-code, you would obtain the corresponding values by
prop.table(table(AC))
The joint frequency distribution (the relative frequencies of each combination of the two factors) is represented by the areas of the corresponding rectangles. In your case, there are many dissatisfied persons without AC. You would get these relative frequencies by
prop.table(table(AC, satisfaction))
Within each level of the first variable, the frequency distribution of the second variable (satisfaction) is shown vertically. If these conditional distributions look similar, there is no or only a weak association between the two factors. If they look very different, as in your case, you would speak of a clear association. In the no-AC group (the left vertical bar), a clear minority (about 10%) is satisfied. In the AC group (the right vertical bar), more than half of the persons are satisfied. In pseudo-code, you would get the corresponding proportions by
prop.table(table(AC, satisfaction), margin = 1)
This third piece of information is typically the reason to look at a mosaic plot in the first place. The order of the variables is essential.
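To tie this together, here is a self-contained sketch on simulated data (the numbers are made up; substitute your own AC and satisfaction vectors):

```r
# Simulated example of the mosaic plot described above.
set.seed(1)
AC <- rbinom(200, 1, 0.3)                                  # most observations have no AC
satisfaction <- rbinom(200, 1, ifelse(AC == 1, 0.6, 0.1))  # satisfaction depends on AC
tab <- table(AC, satisfaction)
prop.table(tab, margin = 1)          # conditional distributions within each AC level
mosaicplot(tab, main = "Satisfaction by AC")
```

Because AC is given first, its levels form the horizontal axis; swapping the order in `table()` flips the roles of the two variables.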