R Data Visualization – How to Extract Information from a Scatterplot Matrix with Large N, Discrete Data, & Many Variables?

Tags: data-visualization, interpretation, r, scatterplot

I'm playing around with the breast cancer dataset and created a scatterplot matrix of all attributes to get an idea of which ones have the most effect on predicting whether the class is malignant (blue) or benign (red).

I understand that in each panel the row variable is on the y axis and the column variable is on the x axis, but I can't see what observations I can make about the data or the attributes from this plot.
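To check which variable ends up on which axis, here is a minimal sketch on a made-up two-column data frame. It passes pairs() a panel function (record_panel, a hypothetical helper) that draws nothing and only records the means of the vectors it receives. If the column variable is on the x axis, the top-right panel (row a, column b) should report the mean of b as x and the mean of a as y.

```r
# Made-up toy data: column a holds 1:3, column b holds 11:13
df <- data.frame(a = 1:3, b = 11:13)

panels <- list()  # one entry per off-diagonal panel, in drawing order

# Hypothetical panel function: draws nothing, just records the means of
# the vectors pairs() passes as x (horizontal) and y (vertical)
record_panel <- function(x, y, ...) {
  panels[[length(panels) + 1]] <<- c(x_mean = mean(x), y_mean = mean(y))
}

pdf(NULL)  # null graphics device: nothing is written to disk
pairs(df, panel = record_panel)
invisible(dev.off())

# Top-right panel (row 1 = a, column 2 = b): b is on x, a is on y
print(panels[[1]])
```

The same rule holds for every panel of the breast cancer plot: read the y variable off the diagonal label in the panel's row and the x variable off the diagonal label in its column.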

I'd like some help interpreting the data from this scatterplot matrix, or advice on whether I should use a different visualization for this data.

[Image: scatterplot matrix of all attributes, colored by class]

R code I used:

library(foreign)  # provides read.arff()

link   <- "http://www.cs.iastate.edu/~cs573x/labs/lab1/breast-cancer-wisconsin.arff"
breast <- read.arff(link)

# One colour per row: benign (class 2) in red, malignant (class 4) in blue
cols <- rep("black", nrow(breast))
cols[breast$class == 2] <- "red"
cols[breast$class == 4] <- "blue"

pairs(breast, col = cols)
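Because every attribute here takes only a few integer values, many points share exactly the same position and only the last one drawn stays visible, so the colours in a panel can hide how many cases of each class sit underneath. One common remedy, sketched below under stated assumptions, is to jitter the points slightly and draw them semi-transparent. The helpers jitter_xy and jitter_panel, the jitter amount of 0.3 and the toy data are illustrative choices, not part of the original code; jitter(), adjustcolor() and the panel argument of pairs() are standard R.

```r
# Hypothetical helper: shift both coordinates by up to `amount` in each
# direction so points sharing the same integer values no longer coincide
jitter_xy <- function(x, y, amount = 0.3) {
  list(x = jitter(x, amount = amount), y = jitter(y, amount = amount))
}

# Panel function for pairs(): jitter first, then draw the points
jitter_panel <- function(x, y, ...) {
  p <- jitter_xy(x, y)
  points(p$x, p$y, ...)
}

# Made-up stand-in data: two attributes scored 1 to 10 plus a class label
set.seed(1)
toy <- data.frame(u     = sample(1:10, 300, replace = TRUE),
                  v     = sample(1:10, 300, replace = TRUE),
                  class = sample(c(2, 4), 300, replace = TRUE))
cols <- ifelse(toy$class == 2, "red", "blue")

pdf(NULL)  # null device so the sketch runs without a screen
# Semi-transparent colours make dense regions show up as darker patches
pairs(toy[c("u", "v")], col = adjustcolor(cols, alpha.f = 0.3),
      pch = 20, panel = jitter_panel)
invisible(dev.off())
```

With the question's objects the equivalent call would be something like pairs(breast, col = adjustcolor(cols, alpha.f = 0.3), pch = 20, panel = jitter_panel). Keeping the jitter amount below 0.5 means points from neighbouring integer values never overlap.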

Best Answer

I'm not sure whether this helps, but for initial exploratory data analysis (EDA) I really like the tabplot package. It gives you a good sense of which correlations may exist within your data.

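install.packages("tabplot")           # one-time install
library(tabplot)                      # attach the package so tableplot() can be called

tableplot(breast)                     # default sorting (by the first column, not by class): first image below
tableplot(breast, sortCol = "class")  # rows sorted by class: second image below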

[Images: unordered plot; ordered plot]
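The sorted tableplot is useful because, once the rows are ordered by class, any attribute whose bars change where the class changes is a candidate predictor. As a rough base-R sketch of that idea (an approximation written for illustration, not tabplot's own algorithm or scaling), you can sort the rows, cut them into equal-size bins and average each numeric column within each bin. The function bin_means and the toy data are hypothetical.

```r
# Hypothetical helper, loosely mimicking a sorted tableplot:
# sort rows by one column, cut them into n_bins equal-size groups,
# and average every other numeric column within each group.
bin_means <- function(df, sort_col, n_bins = 10) {
  df  <- df[order(df[[sort_col]]), , drop = FALSE]        # sort rows
  bin <- ceiling(seq_len(nrow(df)) * n_bins / nrow(df))   # bin index 1..n_bins
  num <- vapply(df, is.numeric, logical(1))                # numeric columns...
  num[sort_col] <- FALSE                                   # ...except the sort key
  aggregate(df[num], by = list(bin = bin), FUN = mean, na.rm = TRUE)
}

# Made-up example: class "2" rows have small x, class "4" rows large x
toy <- data.frame(class = factor(c("4", "2", "4", "2", "2", "4", "2", "4")),
                  x     = c(9, 1, 8, 2, 3, 7, 4, 6))
print(bin_means(toy, "class", n_bins = 2))
```

On the question's data, bin_means(breast, "class") would return one row per bin; attributes whose means jump where the class changes are the ones worth a closer look. Columns that read.arff() returns as factors are skipped by this sketch.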