Automatically detecting sudden change of mean

change point, python, time series

Take a look at this plot:

[Image: box plots of the measured values for i = 1 to 19]

It shows box plots of a series of identical runs for successive values of i. (AFAIK these show the standard min/max and the 1st, 2nd, and 3rd quartiles.) So the box at x = 1 represents 1000 runs with i = 1, the second box represents 1000 runs with i = 2, and so on.
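As a minimal sketch, assuming the raw runs are stored in a dict keyed by i (the name `runs`, the helper `five_number_summary`, and the numbers are all hypothetical), the min/quartiles/max summary that each box represents can be computed with the standard library. Plotting libraries may draw whiskers differently (matplotlib's default, for example, places them at 1.5 × IQR rather than at the min/max), and `statistics.quantiles` uses its default "exclusive" interpolation, so values can differ slightly from a given plot.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for one group of runs."""
    # quantiles(n=4) returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

# Hypothetical data: runs[i] holds the measurements for one value of i.
runs = {
    1: [10.2, 10.5, 10.1, 10.4, 10.3],
    2: [10.6, 10.4, 10.5, 10.7, 10.3],
    3: [9.1, 9.3, 9.0, 9.2, 9.4],
}

for i, values in sorted(runs.items()):
    print(i, five_number_summary(values))
```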

It's easy to eyeball that there's a split between i = 1, 2 and i = 3 to 19. The values for i = 2 are, on 'average', slightly larger.

What I aim to do is, given the input that produced this graph, programmatically find the split (between 2 and 3) where the sudden, consistent change occurs (Step 1). It would also be awesome if there were some sort of confidence score to go along with it, just for user feedback. The change may be up or down, but I know that on both sides of the split the values will be consistent (just as, for i > 2, the box plots stay fairly even and don't return to the i ≤ 2 values).
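One possible way to do Step 1, sketched here under stated assumptions, is a least-squares single change-point search followed by a permutation test. The assumptions: runs are independent, there is exactly one shift in the mean, and under "no change" the runs are exchangeable across i. For each candidate boundary the code measures how much the total squared error drops when two constant means are fitted instead of one, and picks the largest drop. The permutation p-value (how often shuffled data produce an equally large drop) serves as the confidence score. The function names `best_split` and `split_p_value` and the synthetic data are hypothetical.

```python
import random

def best_split(groups):
    """Least-squares single change point between consecutive groups.

    groups[k] holds the runs for the k-th value of i, in order of i.
    Returns (k, stat): the split lies between groups[k - 1] and groups[k], and
    stat = n1 * n2 / n * (mean1 - mean2) ** 2 is the drop in total squared
    error from fitting two constant means instead of one.
    """
    sums = [sum(g) for g in groups]
    counts = [len(g) for g in groups]
    total_sum, total_n = sum(sums), sum(counts)
    best_k, best_stat = None, float("-inf")
    left_sum, left_n = 0.0, 0
    for k in range(1, len(groups)):
        left_sum += sums[k - 1]
        left_n += counts[k - 1]
        right_sum, right_n = total_sum - left_sum, total_n - left_n
        diff = left_sum / left_n - right_sum / right_n
        stat = left_n * right_n / total_n * diff * diff
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


def split_p_value(groups, n_perm=200, seed=0):
    """Permutation p-value for H0: the mean never changes.

    Under H0 the runs are exchangeable, so the pooled runs are shuffled,
    re-dealt into groups of the original sizes, and the best-split statistic
    is recomputed. A small p-value means the observed split is unlikely noise.
    """
    rng = random.Random(seed)
    _, observed = best_split(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if best_split(shuffled)[1] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Synthetic example: i = 1, 2 have mean 10.5; i = 3..19 have mean 10.0.
rng = random.Random(1)
groups = [[rng.gauss(10.5, 0.3) for _ in range(100)] for _ in range(2)]
groups += [[rng.gauss(10.0, 0.3) for _ in range(100)] for _ in range(17)]
k, stat = best_split(groups)
print("split between i =", k, "and i =", k + 1,
      "p =", split_p_value(groups, n_perm=100))
```

Because the statistic is the maximum over all boundaries, the permutation test already accounts for having searched every candidate split. The smallest achievable p-value is 1 / (n_perm + 1), so more permutations give finer resolution at the cost of runtime; with 19,000 runs a vectorized version (e.g. with NumPy) would be faster.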

Then I want to take measurements for an unknown i and decide which 'side' of the split it falls on. I know I can never determine that conclusively from a single measurement, so I plan to take several (5? 50? 100?) measurements for this unknown-but-unchanging i value and use them to decide which side of the split i falls on (Step 2). Again, it would be awesome if there were a confidence value associated with this decision.
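For Step 2, one simple option is a likelihood-ratio classifier. This is a minimal sketch that assumes every run is normally distributed with a side-specific mean and a common standard deviation, that runs are independent, and that both sides are equally likely a priori. Under those assumptions the posterior probability of the "left" side is the logistic function of the summed log-likelihood ratio. The name `side_probability` and all data are hypothetical.

```python
import math
import random
import statistics

def side_probability(measurements, left_runs, right_runs):
    """Posterior probability that the unknown i lies on the left side.

    Assumes Normal runs with side-specific means, a common (pooled) variance,
    independence, and equal prior probability for the two sides.
    """
    mu_left = statistics.fmean(left_runs)
    mu_right = statistics.fmean(right_runs)
    # Pooled within-side variance estimated from the reference runs.
    ss = sum((x - mu_left) ** 2 for x in left_runs)
    ss += sum((x - mu_right) ** 2 for x in right_runs)
    var = ss / (len(left_runs) + len(right_runs) - 2)
    # Log-likelihood ratio: log p(data | left) - log p(data | right).
    llr = sum((x - mu_right) ** 2 - (x - mu_left) ** 2
              for x in measurements) / (2 * var)
    # Numerically stable logistic function of the LLR.
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    z = math.exp(llr)
    return z / (1.0 + z)


# Hypothetical reference runs for the two sides and 5 new measurements.
rng = random.Random(2)
left_runs = [rng.gauss(10.5, 0.3) for _ in range(2000)]
right_runs = [rng.gauss(10.0, 0.3) for _ in range(17000)]
new_runs = [rng.gauss(10.0, 0.3) for _ in range(5)]
print("P(left side) =", side_probability(new_runs, left_runs, right_runs))
```

Each extra measurement adds its own term to the log-likelihood ratio, so the probability tends to become more decisive as measurements accumulate. One could keep measuring until it crosses a chosen threshold (the idea behind Wald's sequential probability ratio test) instead of fixing 5, 50, or 100 in advance.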

I'm working in Python, so a library would be awesome, but I'm happy to implement an algorithm/equation myself. What techniques/equations/papers should I read up on to learn how to do this?

Best Answer

If I understand you correctly, you may need to learn about multiple comparisons:

http://en.wikipedia.org/wiki/Multiple_comparisons

Choosing a particular procedure is a separate question, e.g., Scheffé vs. Tukey vs. Bonferroni.
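To make the multiple-comparisons idea concrete, here is a minimal sketch of the Bonferroni option applied to adjacent values of i. Each neighbouring pair is compared with a two-sample z-test, and each test is run at level alpha / m, where m is the number of comparisons. It assumes independent runs and uses a normal approximation, which is reasonable for groups of about 1000 runs. Bonferroni is chosen here only because it needs nothing beyond the standard library; Tukey or Scheffé would need their own critical values. The name `adjacent_bonferroni` and the data are hypothetical.

```python
import math
import statistics

def adjacent_bonferroni(groups, alpha=0.05):
    """Bonferroni-corrected z-tests between adjacent groups.

    groups[k] holds the runs for the k-th value of i. Returns a list of
    (k, z, p_value, significant) for the pair (groups[k], groups[k + 1]).
    """
    m = len(groups) - 1           # number of comparisons
    threshold = alpha / m         # Bonferroni-adjusted per-test level
    results = []
    for k in range(m):
        a, b = groups[k], groups[k + 1]
        se = math.sqrt(statistics.variance(a) / len(a)
                       + statistics.variance(b) / len(b))
        z = (statistics.fmean(b) - statistics.fmean(a)) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under N(0, 1)
        results.append((k, z, p, p < threshold))
    return results


# Hypothetical data: 19 groups, mean 10.5 for the first two, 10.0 after.
pattern = [-0.3, -0.1, 0.1, 0.3] * 25
groups = [[10.5 + d for d in pattern] for _ in range(2)]
groups += [[10.0 + d for d in pattern] for _ in range(17)]
results = adjacent_bonferroni(groups)
for k, z, p, significant in results:
    if significant:
        print(f"change between i = {k + 1} and i = {k + 2}: z = {z:.2f}")
```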

At least within this framework, there is a clear and straightforward way to do both hypothesis testing and confidence-interval estimation.
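On the confidence-interval side, a minimal sketch (again assuming independent runs and a normal approximation) builds Bonferroni simultaneous intervals for the differences between adjacent means. Each interval is constructed at level 1 - (1 - confidence) / m, so all m intervals hold jointly with probability at least `confidence`. An interval that excludes zero flags an adjacent pair whose means differ at that joint level, which is the test/interval duality the answer refers to. The name `simultaneous_intervals` and the data are hypothetical.

```python
import math
import statistics

def simultaneous_intervals(groups, confidence=0.95):
    """Bonferroni simultaneous CIs for mean(groups[k + 1]) - mean(groups[k])."""
    m = len(groups) - 1
    alpha_each = (1 - confidence) / m
    # Two-sided normal critical value at the adjusted level.
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha_each / 2)
    intervals = []
    for k in range(m):
        a, b = groups[k], groups[k + 1]
        diff = statistics.fmean(b) - statistics.fmean(a)
        se = math.sqrt(statistics.variance(a) / len(a)
                       + statistics.variance(b) / len(b))
        intervals.append((k, diff - z_crit * se, diff + z_crit * se))
    return intervals


# Hypothetical data: 19 groups, mean 10.5 for the first two, 10.0 after.
pattern = [-0.3, -0.1, 0.1, 0.3] * 25
groups = [[10.5 + d for d in pattern] for _ in range(2)]
groups += [[10.0 + d for d in pattern] for _ in range(17)]
intervals = simultaneous_intervals(groups)
for k, lo, hi in intervals:
    if lo > 0 or hi < 0:
        print(f"i = {k + 1} -> {k + 2}: difference in [{lo:.3f}, {hi:.3f}]")
```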