I have a large data frame in the following form (I apologize for this formatting):
Site Season T     SC      pH   Chl DO.S  DO   BGA Tur  fDOM Flow Rainfall Solar  Rain
300N Winter 14.05 1692.77 7.93 NA  82.26 8.42 NA  9.25 NA   NA   0.00     219.18 no
If you can't understand the formatting, there are 12 numeric variables and 3 categorical factors (Site, Season, Rain [yes/no]). Each row holds the daily averages that I calculated from 15-minute time series. I have spent a good amount of time on data exploration (linear regression analysis, looking at time series plots for patterns), but haven't found a method that works for me yet. I have also worked with corrplot, correlation matrices, and covariance functions in an arduous way, subsetting each categorical combination and producing a corrplot for each (I also tried ddply, but the result is not in a correlation-matrix format that is easy to plot). I have also attempted PCA on the data, to little avail.
My first and foremost question: does anyone have an idea for visualizing this kind of dataset? The main question I am after is, "What are the factors that influence DO (dissolved oxygen)?" How does this change by location (Site), by Season, and with the influence of Rain? I would really like a quick method for producing correlation matrices (or heat maps; I have tried both) for each categorical subset. I tried this with ggplot and facet_wrap, but couldn't get it to work. I also tried ggpairs from the GGally package, but honestly didn't spend much time on that method.
I was starting to look into star graphs (on polar coordinates), which can be used to visualize periodicity in time series, but I am running out of time and decided to seek the advice of Stack Overflow. I would really appreciate any advice or thoughts on visualizing this data. I feel like some combination of ddply and graphing is what I need, but I haven't gotten there yet.
Thank you for your time.
EDIT: dput of the data frame in question:
structure(list(Site = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2100S",
"300N", "3300S", "800S", "Burnham", "Center"), class = "factor"),
Season = structure(c(4L, 4L, 4L, 4L, 2L, 2L), .Label = c("Fall",
"Spring", "Summer", "Winter"), class = "factor"), T = c(14.05,
14.18, 14.5, 14.58, 14.07, 11.91), SC = c(1692.77, 1671.31,
1680.71, 1661.79, 1549.56, 1039.63), pH = c(7.93, 7.92, 7.96,
7.95, 7.93, 7.79), Chl = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), DO.S = c(82.26, 78.79, 82.05,
80.92, 74.33, 73.96), DO = c(8.42, 8.04, 8.31, 8.18, 7.61,
7.97), BGA = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), Tur = c(9.25, 9.77, 9.41, 10.6, 40.38, 50.25),
fDOM = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), Flow = c(NA, 178.08, 178.53, 188.13, 306.15, 382.22
), Rainfall = c(0, 0, 0, 0, 0.01, 0.81), Solar = c(219.18,
228.33, 244.3, 247.69, 105.15, 220.73), Rain = structure(c(1L,
1L, 1L, 1L, 2L, 2L), .Label = c("no", "yes"), class = "factor")), .Names = c("Site",
"Season", "T", "SC", "pH", "Chl", "DO.S", "DO", "BGA", "Tur",
"fDOM", "Flow", "Rainfall", "Solar", "Rain"), row.names = c(NA,
6L), class = "data.frame")
Best Answer
Seems like kind of a tall order, but here's a whirlwind tour of R.
You may be interested in checking out the rattle package/GUI: it can get you off to a quick start with a lot of these general questions.
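For the per-subset correlation matrices asked about, here is a rough sketch rather than a definitive recipe. It uses base R's split() and cor() on a synthetic stand-in for the data: the values, the two-level factors, and the names df, groups, cor_by_group, long, and do_cor are all made up for illustration. Columns with fewer than two non-missing values in a subset (such as Chl, BGA, and fDOM in the dput sample) are dropped before correlating. The plotting step is guarded because it assumes ggplot2 is installed.

```r
# Synthetic stand-in for the real data frame (all values are made up)
set.seed(1)
n <- 200
df <- data.frame(
  Site   = factor(sample(c("300N", "800S"), n, replace = TRUE)),
  Season = factor(sample(c("Winter", "Spring"), n, replace = TRUE)),
  Rain   = factor(sample(c("no", "yes"), n, replace = TRUE)),
  T      = rnorm(n, 14, 2),
  Tur    = rexp(n, 1 / 10),
  Solar  = rnorm(n, 200, 40)
)
df$DO <- 12 - 0.3 * df$T + rnorm(n, sd = 0.3)  # DO tied to T on purpose

num_cols <- names(df)[sapply(df, is.numeric)]

# One data frame per Site x Season x Rain combination; drop = TRUE skips empty combos
groups <- split(df[num_cols], df[c("Site", "Season", "Rain")], drop = TRUE)

# One correlation matrix per subset, ignoring columns that are (almost) all NA
cor_by_group <- lapply(groups, function(g) {
  keep <- colSums(!is.na(g)) > 1
  cor(g[, keep, drop = FALSE], use = "pairwise.complete.obs")
})

# Stack the matrices into long format: one row per (Var1, Var2, group)
long <- do.call(rbind, lapply(names(cor_by_group), function(nm) {
  d <- as.data.frame(as.table(cor_by_group[[nm]]), responseName = "r")
  d$group <- nm
  d
}))

# Correlation of DO with every other variable, per subset
do_cor <- subset(long, Var1 == "DO" & Var2 != "DO")

# Faceted heat map, one panel per subset (only if ggplot2 is available)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(Var1, Var2, fill = r)) +
    geom_tile() +
    scale_fill_gradient2(limits = c(-1, 1)) +
    facet_wrap(~ group)
  # print(p) draws the plot
}
```

On the real data, df would be the full data frame and the split would cover all six sites and four seasons. The do_cor table is the part that speaks most directly to "what influences DO", since it can be sorted or plotted by group. Small subsets will give noisy correlations, so it is worth checking the row count of each element of groups first.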