Solved – Moving average of irregular time series data using R

Tags: r, time-series, unevenly-spaced-time-series

I'm attempting to dig out some metrics that look at how reliably clients connect to a service.

The raw data is in the form of "client A, came online|offline at time X". The connection is highly unreliable, and I want some type of moving average to show whether the connection is improving or not over time. Clients are not always connected, so simply going offline does not mean it's a fault.

So far, I've taken the data and applied some assumptions to simplify it: I assume that if a client reconnects within a minute of disconnecting, then that is a fault. I've modelled these as simple impulses, i.e. "client A had a fault at time X".

The part I'm struggling with is how to turn this plot into a moving average (I'm playing with R to crunch the numbers).

I believe I should be able to do this with a low pass filter, or use the zoo package and rollmean. However, I don't know how to handle the cases where the client simply didn't want to be online.

Any suggestions?

Best Answer

If the problem is just plotting the data (i.e., if you have already identified the faults), you can simply write a function that counts the events in the past hour, as follows.

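# Sample data: fault times over one day, with two peaks in the fault rate
library(zoo)
seconds <- 60*60*24 # One day
lambda <- function(x) ( 1 + sin(2*pi*x/seconds) ^ 2 ) / 2  # keep probability, between 0.5 and 1
n <- 100
x <- seconds*sort(runif(n))    # candidate fault times, uniform over the day
x <- x[runif(n) < lambda(x)]   # thinning: keep each candidate with probability lambda(x)
x <- zoo( 0*x+1, x )           # one impulse of height 1 per fault, indexed by time
plot( index(x), x, type="h" )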

# Moving count: number of faults in the past hour
f <- function(u) sum(u - 3600 < index(x) & index(x) <= u )
f <- Vectorize(f)
curve(f(x), xlim=c(0,seconds))

# Exponentially-weighted count, time constant tau = 1 hour
# (the half-life is tau*log(2), about 42 minutes)
tau <- 3600
f <- function(u) sum( ifelse(index(x)<=u,exp((index(x)-u)/tau),0) )
f <- Vectorize(f)
curve(f(x), xlim=c(0,seconds))

But this will be inefficient if there is a lot of data, because every evaluation scans all the faults. Instead, you can consider two types of events: "a fault occurred" (i.e., it enters our window of observation) and "a fault occurred one hour ago" (i.e., it leaves our window of observation). Associate the values +1 and -1 with those events, sort them by time, and compute the cumulative sum: that is the number of faults in the previous hour.
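
To make the +1/-1 idea concrete, here is a minimal base-R sketch on a handful of made-up fault times (the names fault_times, window and events are only for illustration, and the example assumes no two events share the same timestamp). It builds the two kinds of events, sorts them, takes the cumulative sum, and checks the result against the direct count.

```r
# Hypothetical fault times in seconds (illustrative values, not real data)
fault_times <- c(100, 200, 3000, 3900, 9000)
window <- 3600  # one hour

# Each fault enters the window when it occurs (+1)
# and leaves it one window length later (-1)
events <- data.frame(
  time  = c(fault_times, fault_times + window),
  delta = c(rep(1, length(fault_times)), rep(-1, length(fault_times)))
)

# Sort by time; the running sum is the number of faults in (time - window, time]
events <- events[order(events$time), ]
events$count <- cumsum(events$delta)

# Direct count at the same instants, for comparison
brute <- sapply(events$time,
                function(u) sum(u - window < fault_times & fault_times <= u))
stopifnot(all(events$count == brute))

print(events)
```

If several events can fall on exactly the same timestamp, the intermediate rows for that timestamp are partial sums; only the last row for each timestamp gives the count.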

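# More efficient moving count
# Column 1: +1 at each fault; column 2: -1 one hour after each fault
y <- merge(x, zoo(-coredata(x), index(x) + 3600), fill=0)
# The running sum of the merged events is the number of faults in the past hour
plot( cumsum(y[,1] + y[,2]) )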
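
The exponentially weighted version can probably be sped up in a similar spirit. The sketch below is not part of the code above; it relies on the fact that, just after the i-th fault, the weighted sum equals the previous value decayed over the gap between faults, plus 1. The fault times are made up, and the names s, ew and direct are hypothetical; with the sample data you would use index(x) in place of fault_times.

```r
tau <- 3600                                   # time constant, in seconds
fault_times <- c(100, 200, 3000, 3900, 9000)  # hypothetical, sorted fault times

# s[i] = weighted sum just after fault i:
# previous value decayed over the gap, plus 1 for the new fault
s <- numeric(length(fault_times))
s[1] <- 1
for (i in seq_along(fault_times)[-1]) {
  gap <- fault_times[i] - fault_times[i - 1]
  s[i] <- 1 + s[i - 1] * exp(-gap / tau)
}

# Value at arbitrary times u: decay the value stored at the last fault at or before u
ew <- function(u) {
  i <- findInterval(u, fault_times)  # 0 when u precedes the first fault
  out <- numeric(length(u))
  ok <- i > 0
  out[ok] <- s[i[ok]] * exp(-(u[ok] - fault_times[i[ok]]) / tau)
  out
}

# Direct summation over all faults, for comparison
direct <- function(u) {
  sapply(u, function(v) sum(ifelse(fault_times <= v,
                                   exp((fault_times - v) / tau), 0)))
}

u <- seq(0, 12000, by = 500)
stopifnot(isTRUE(all.equal(ew(u), direct(u))))
```

Each evaluation then costs one lookup with findInterval instead of a sum over every fault, at the price of one pass over the sorted fault times up front.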