How to interpret a pairs plot in R

correlation, paired-data, r

I am a beginner in plotting/graphing. Kindly explain how to interpret the pairwise scatter plots generated by the pairs() function in R.
The data contains 323 columns of different indicators of a disease. Many of these columns are summaries (mean, std, slope, min, max and so on) of a single parameter. For example, for an attribute like 'walking', there are other attributes such as sum.slope.walking, meansquares.slope.walking, sd.slope.walking and so on.
Is it okay to select just one of these summaries in such a case (such as meansquares.slope..)?

What patterns should I look for to identify relationships between attributes?

Best Answer

If you have a number of different measurements in your data.frame, then pairs will show scatterplots of all pairs of these measures.

Example data:

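set.seed(1)                                    # added so the example is reproducible
x <- rnorm(100)
obs <- data.frame(a = x,                       # baseline measure
                  b = rnorm(100),              # drawn independently of a
                  c = x + runif(100, .5, 1),   # linear in a, plus uniform noise
                  d = jitter(x^2))             # quadratic in a, plus slight jitter

pairs(obs)                                     # scatterplot matrix of all four columns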

This is a data.frame with four different measures called a, b, c and d on 100 individuals. pairs draws this plot:

[Figure: pairs plot for example data]

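In the first row you see a scatter plot of a against b, then one of a against c and then one of a against d. In the second row you see b against a (the mirror image of the a and b panel in the first row, with the axes swapped), then b against c and b against d, and so on. The cells on the diagonal only carry the variable names.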
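To make the layout concrete, here is a small sketch that redraws one panel by hand and then drops the mirrored half of the matrix. The seed and the choice of panel are my own assumptions for illustration. In each panel, pairs puts the column variable on the x-axis and the row variable on the y-axis.

```r
set.seed(1)  # assumption: fixed seed so the picture is reproducible
x <- rnorm(100)
obs <- data.frame(a = x,
                  b = rnorm(100),
                  c = x + runif(100, .5, 1),
                  d = jitter(x^2))

# Row 1, column 2 of pairs(obs): b on the x-axis, a on the y-axis
plot(obs$b, obs$a, xlab = "b", ylab = "a")

# The lower triangle repeats the upper one with swapped axes,
# so it can be suppressed to make the matrix easier to read
pairs(obs, lower.panel = NULL)
```

With the lower panels switched off, each pair of variables appears exactly once.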

pairs does not compute sums, mean squares or anything else. If you find such quantities in your pairs plot, then they are columns of your data.frame.
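With several hundred columns a full pairs plot is unreadable, so it can help to look at the summaries of one attribute at a time. The following is only a sketch: the data are simulated, the data.frame name disease and the column other.indicator are made up, and the remaining column names are borrowed from the question. It shows how to pick columns by name and plot just those, which lets you see whether the summaries carry similar information.

```r
set.seed(1)  # assumption: fixed seed, simulated data
slope <- rnorm(50)

# Hypothetical stand-in for the real data; only the names mimic the question
disease <- data.frame(
  sum.slope.walking         = slope * 10 + rnorm(50, sd = 0.5),
  meansquares.slope.walking = slope^2 + rnorm(50, sd = 0.1),
  sd.slope.walking          = abs(slope) + rnorm(50, sd = 0.1),
  other.indicator           = rnorm(50)
)

# Keep only the columns whose names end in "walking"
walking <- disease[, grepl("walking$", names(disease)), drop = FALSE]

# Scatterplot matrix restricted to the walking summaries
pairs(walking)
```

Whether a single summary is enough depends on your statistical question. The plot only shows how strongly the candidates overlap.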

What patterns to look for? In my example you find no pattern between a and b, a linear pattern between a and c, and a curved, non-linear pattern between a and d. Look for patterns that might be of interest to your statistical questions.
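A numeric summary can back up what the eye sees, with a caveat. The sketch below (the seed is my assumption) prints the correlation matrix for the example data and adds a smoother to every panel. Correlation coefficients mainly pick up linear or monotone trends, so the curved relation between a and d can produce a much smaller coefficient than the one between a and c, although both are obvious in the plot.

```r
set.seed(1)  # assumption: fixed seed so the numbers are reproducible
x <- rnorm(100)
obs <- data.frame(a = x,
                  b = rnorm(100),
                  c = x + runif(100, .5, 1),
                  d = jitter(x^2))

# Pearson correlation: sensitive to linear association
print(round(cor(obs), 2))

# Spearman rank correlation: sensitive to monotone association
print(round(cor(obs, method = "spearman"), 2))

# Add a lowess smoother to each panel to make curved trends easier to spot
pairs(obs, panel = panel.smooth)
```

The smoothed panels make the U-shape between a and d visible even where the coefficients do not flag it.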

Please note that, while asking for the interpretation of a plot is a statistical question, questions that are only about how to use R are off topic on Cross Validated. You should ask questions on R programming on Stack Overflow.