The following attempt can surely be improved upon:
zucchini <- function(st, en, mingap = 1)
{
    # Sort intervals by start position, breaking ties by length
    i <- order(st, en - st)
    st <- st[i]
    en <- en[i]
    # Greedily fill one row: repeatedly take the first interval that
    # starts at least 'mingap' after the end of the last one placed
    last <- r <- 1
    while (any(ok <- st > (en[last] + mingap)))
    {
        last <- which(ok)[1]
        r <- append(r, last)
    }
    if (length(r) == length(st))
        return(list(c = list(st[r], en[r]), n = 1))
    # Pack the leftover intervals into further rows (mingap must be
    # passed down to the recursive call)
    ne <- zucchini(st[-r], en[-r], mingap)
    list(c = c(list(st[r], en[r]), ne$c), n = ne$n + 1)
}
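A quick, deterministic check of the packing (the function is restated here, with mingap passed through the recursion, so the snippet runs on its own; the interval values are made up):

```r
# Greedy interval row-packing, as above
zucchini <- function(st, en, mingap = 1) {
    i <- order(st, en - st)
    st <- st[i]
    en <- en[i]
    last <- r <- 1
    while (any(ok <- st > (en[last] + mingap))) {
        last <- which(ok)[1]
        r <- append(r, last)
    }
    if (length(r) == length(st))
        return(list(c = list(st[r], en[r]), n = 1))
    ne <- zucchini(st[-r], en[-r], mingap)
    list(c = c(list(st[r], en[r]), ne$c), n = ne$n + 1)
}

# Four intervals: the second overlaps the first, so two rows are needed
st <- c(0, 2, 12, 30)
en <- c(10, 20, 25, 40)
z <- zucchini(st, en)
z$n        # 2 rows
z$c[[1]]   # row 1 starts: 0 12 30
z$c[[2]]   # row 1 ends:   10 25 40
```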
coliflore <- function(st, en, mingap = 1)
{
    zu <- zucchini(st, en, mingap)
    plot.new()
    plot.window(xlim = c(min(st), max(en)), ylim = c(0, zu$n + 1))
    box()
    axis(1)
    # zu$c holds pairs of vectors (starts, ends), one pair per row
    for (i in seq(1, 2 * zu$n, 2))
    {
        x1 <- zu$c[[i]]      # starts for this row
        x2 <- zu$c[[i + 1]]  # ends for this row
        for (j in seq_along(x1))
            rect(x1[j], (i + 1) / 2, x2[j], (i + 1) / 2 + 0.5,
                 col = "gray", border = NA)
    }
}
Application:
> st <- runif(20,0,50)
> en <- st + runif(20, 5,20)
> st
[1] 25.571385 17.074676 4.564936 27.247745 23.832638 11.045469 2.845222
[8] 2.824046 23.319625 19.684993 42.610242 48.185618 47.748637 39.813871
[15] 9.235512 40.299425 13.797027 21.079956 31.638772 24.152991
> en
[1] 35.43667 32.20029 19.37133 44.30378 35.73845 16.63794 11.52551 16.06469
[9] 32.22477 26.05563 49.51284 67.77664 67.27914 49.35472 28.27657 50.49421
[17] 27.29273 37.87611 48.76251 39.89335
> coliflore(st, en)
Happy new year!
After you've fit the model, why not use the predicted number of defects as a variable to compare with the others, using whatever standard techniques are meaningful for them? It has the advantage of being continuous, so you can see even small differences. For example, people will understand the difference between an expected defect count of 1.4 and one of 0.6, even though both round to one.
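As a sketch of that idea (toy data and made-up coefficients, not the asker's), rank units by their expected defect count from a fitted Poisson GLM:

```r
set.seed(1)
n <- 50
d <- data.frame(time = rexp(n, 0.01),
                complexity = sample(1:5, n, replace = TRUE))
# Simulate defect counts from a Poisson model (coefficients are arbitrary)
d$defects <- rpois(n, exp(-1 + 0.005 * d$time + 0.15 * d$complexity))
m <- glm(defects ~ time + complexity, family = poisson, data = d)
# Expected defects on the response scale: a continuous score,
# so 1.4 vs 0.6 stays visible even though both round to 1
d$expected <- predict(m, type = "response")
head(d[order(-d$expected), ])
```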
For an example of how the predicted value depends on two variables, you could draw a contour plot with time and complexity on the two axes, using colour and contours to show the predicted defects, and superimpose the actual data points on top.
The plot below needs some polishing and a legend but might be a starting point.
An alternative is the added-variable plot (or partial-regression plot), more familiar from traditional Gaussian-response regression. These are implemented in the car package. Effectively, they show the relationship between what is left of the response and what is left of one explanatory variable, after the remaining explanatory variables have had their contribution to both removed. In my experience most non-statistical audiences find these a bit difficult to appreciate (which could be my poor explanations, of course).
#--------------------------------------------------------------------
# Simulate some data
n<-200
time <- rexp(n,.01)
complexity <- sample(1:5, n, prob=c(.1,.25,.35,.2,.1), replace=TRUE)
trueMod <- exp(-1 + time*.005 + complexity*.1 + complexity^2*.05)
defects <- rpois(n, trueMod)
cbind(trueMod, defects)
#----------------------------------------------------------------------
# Fit model
model <- glm(defects~time + poly(complexity,2), family=poisson)
# all sorts of diagnostic checks should be done here - not shown
#---------------------------------------------------------------------
# Two variables at once in a contour plot
# create grid
gridded <- data.frame(
    time = seq(from = 0, to = max(time) * 1.1, length.out = 100),
    complexity = seq(from = 0, to = max(complexity) * 1.1, length.out = 100))
# create predicted values (on the original scale)
yhat <- predict(model, newdata=expand.grid(gridded), type="response")
# draw plot
image(gridded$time, gridded$complexity,
      matrix(yhat, nrow = 100, byrow = FALSE),
      xlab = "Time", ylab = "Complexity",
      main = "Predicted average number of defects shown as colour and contours\n(actual data shown as circles)")
contour(gridded$time, gridded$complexity,
        matrix(yhat, nrow = 100, byrow = FALSE), add = TRUE,
        levels = c(1, 2, 4, 8, 15, 20, 30, 40, 50, 60, 70, 80, 100))
# Add the original data
symbols(time, complexity, circles = sqrt(defects), add = TRUE, inches = .5)
#--------------------------------------------------------------------
# added variable plots
library(car)
avPlots(model, layout=c(1,3))
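On the "diagnostic checks not shown" comment above: one quick screen (a sketch, not a substitute for proper diagnostics) is comparing the residual deviance to its degrees of freedom as a rough overdispersion check. The simulation is restated here so the snippet runs standalone:

```r
set.seed(42)
n <- 200
time <- rexp(n, .01)
complexity <- sample(1:5, n, prob = c(.1, .25, .35, .2, .1), replace = TRUE)
defects <- rpois(n, exp(-1 + time * .005 + complexity * .1 + complexity^2 * .05))
model <- glm(defects ~ time + poly(complexity, 2), family = poisson)
# For a well-specified Poisson model this ratio should be near 1;
# values much larger than 1 suggest overdispersion (consider quasipoisson)
dispersion <- deviance(model) / df.residual(model)
dispersion
```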
Best Answer
Sheesh, this is a good one.
I think that you're on the right track with an "infographic" approach; however, I would suggest that data-visualization infographics would serve you better.
Something like http://www.dipity.com/ could let you track time and also provide content tied to the subject being referenced.
Take a look at http://many-eyes.com/#/visualizations; maybe something there could help you evaluate your data better. I only suggest these sites to help you find a better way of approaching such a fun and exciting project.