Solved – How to plot 20 years of daily data in time series

data visualization, r

I have the following dataset: https://dl.dropbox.com/u/22681355/ORACLE.csv
and would like to plot the daily changes in 'Open' by 'Date', so I did the following:

oracle <- read.csv(file="http://dl.dropbox.com/u/22681355/ORACLE.csv", header=TRUE)
plot(oracle$Date, oracle$Open, type="l")

and I get the following:

[image: the resulting plot]

Now this is obviously not the nicest plot ever, so I'm wondering what is the right method to use when plotting such detailed data?

Best Answer

The problem with your data is not that it is extremely detailed. You have no values for weekends, and that is why the plot shows gaps. There are two ways to deal with this:

  1. Estimate approximate values for the weekends, either with a smoothing method (smooth.spline, loess, etc.) or with simple interpolation, as in the code below. This introduces something "unnatural" and artificial into the data, which is why I prefer the second option.
# The loop assumes the rows run from newest to oldest,
# so the last row (5045 in this file) holds the earliest date.
n <- nrow(oracle)
currentDate <- min(as.Date(oracle$Date))
dates <- c(currentDate)
openValues <- c(oracle$Open[n])
i <- n - 1
while (i > 0) {
  currentDate <- currentDate + 1
  dates <- c(dates, currentDate)
  if (currentDate == as.Date(oracle$Date[i])) {
    # a value exists for this day: copy it and move to the next row
    openValues <- c(openValues, oracle$Open[i])
    i <- i - 1
  } else {
    # no value for this day: use the mean of the two neighbouring rows.
    # Note the parentheses: i:i+1 would parse as (i:i)+1.
    openValues <- c(openValues, mean(oracle$Open[i:(i + 1)]))
  }
}
plot(dates, openValues, type="l")
  2. Move from a daily to a weekly basis by averaging, for example, each run of five consecutive points that belong to one week (in this case you are "killing" some information). A quick example of how to do that:
n <- nrow(oracle)
openValues <- c(mean(oracle$Open[1:5]))
dates <- c(as.Date(oracle$Date[1]))
for (i in seq(6, n, 5)) {
  # average rows i to i+4 (clipped at the last row).
  # Note the parentheses: i:i+4 would parse as (i:i)+4.
  openValues <- c(openValues, mean(oracle$Open[i:min(i + 4, n)]))
  dates <- c(dates, as.Date(oracle$Date[i]))
}
plot(dates, openValues, type="l")
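Both options can also be written without explicit loops. The sketch below is not the code from the answer: it swaps in base R's `approx` for the interpolation and `cut` with `tapply` for the weekly means, and it runs on a small synthetic data frame (an assumption, standing in for ORACLE.csv) with `Date` and `Open` columns, weekdays only, newest row first. Grouping by calendar week rather than by blocks of five rows should also cope with weeks that have fewer than five trading days.

```r
# Synthetic stand-in for the real file (assumption): four weeks of weekdays,
# newest row first, with Open rising by 1 per trading day.
all_days <- seq(as.Date("2012-01-02"), as.Date("2012-01-27"), by = "day")
is_weekend <- as.POSIXlt(all_days)$wday %in% c(0, 6)  # 0 = Sunday, 6 = Saturday
trading_days <- all_days[!is_weekend]
oracle <- data.frame(Date = rev(trading_days),
                     Open = rev(seq_along(trading_days)) + 25)

# Put the rows in chronological order so nothing depends on the file's row order.
oracle$Date <- as.Date(oracle$Date)
oracle <- oracle[order(oracle$Date), ]

# Option 1: linear interpolation onto every calendar day, weekends included.
full_dates <- seq(min(oracle$Date), max(oracle$Date), by = "day")
filled <- approx(x = as.numeric(oracle$Date), y = oracle$Open,
                 xout = as.numeric(full_dates))
daily <- data.frame(Date = full_dates, Open = filled$y)

# Option 2: one mean per calendar week; cut() labels each row with the
# Monday that starts its week.
week_label <- as.character(cut(oracle$Date, breaks = "week"))
weekly_mean <- tapply(oracle$Open, week_label, mean)
weekly <- data.frame(Date = as.Date(names(weekly_mean)),
                     Open = as.numeric(weekly_mean))

# Draw the interpolated daily series, then overlay the weekly means.
plot(daily$Date, daily$Open, type = "l")
lines(weekly$Date, weekly$Open, col = "red")
```

To try this on the real data, replace the synthetic `oracle` with the result of `read.csv` and check that `as.Date` parses the `Date` column in the format the file uses.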

Hope this helps.