Hypothesis Testing – How to Test if Two Exponentially Distributed Datasets Are Different

exponential distribution, hypothesis testing

I have two exponentially distributed datasets and I want to test whether they come from different distributions.
Unfortunately, an unavoidable error in the detection of the data forces me to discard all data below a certain threshold.
Each set has about 3000 data points, and plotting the data suggests that the lambda values differ. Fitting also yields different values for lambda.

How can I test whether the two datasets originate from different distributions?

Here is a plot of what the sets look like (note that all values below lifetime = 3 s have to be discarded):

UPDATE: Both distributions above are normalised by N purely to make them easier to compare in a graph, because the total number of data points N differs between the two sets.

UPDATE2: After truncation I have about 150 lifetime values for the red dataset and 350 for the blue dataset. Turns out that 3000 was exaggerated (I am sorry).

UPDATE3: Thank you for bearing with me. Here is the raw data:

http://pastebin.com/raw.php?i=UaGZS0im

http://pastebin.com/raw.php?i=enjyW1uC

So far I have fitted an exponential function to both datasets and compared the slopes. Since normalisation should not change the slope of the data, different slopes should imply different underlying exponential distributions (my experience with statistical analysis is very limited).

The values below the threshold are discarded because the measurement registers too many events in that regime.

UPDATE4: I just realised that my problem is much more complicated than I thought. I actually have left-censored data (I do not know the beginning of some events) and right-censored data (I do not know the end of some events), AND I have to discard all lifetimes under 3 s (truncation). Is there any way to incorporate all of that into one analysis? So far I have found some help on how to work with censored data (survival analysis), but what should I do about the truncation?

Best Answer

Exponentially distributed lifetimes are an especially simple case for survival analysis. Analyzing them is often the first example worked to get students started before moving to more complicated situations. In addition, survival analysis is naturally suited to censored data. In short, I suggest you use survival analysis with a grouping indicator for the two distributions as a treatment effect. You could use a parametric model (e.g., the Weibull model, as the exponential is a special case of the Weibull), or you could use non-parametric methods, such as the log rank test, if you prefer.
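As a rough illustration of the parametric route, here is a minimal sketch of a likelihood-ratio test for two exponential rates, using only the Python standard library. It relies on the memoryless property: a lifetime that is only observed when it exceeds the threshold contributes its excess over the threshold to the likelihood, so the truncation is handled by subtracting 3 s. Right-censored lifetimes add exposure time but no event. The function names, the `(times, observed)` input layout, and the example numbers are all hypothetical, and the sketch does not handle left-censoring. The p-value uses the asymptotic chi-square approximation with one degree of freedom.

```python
import math


def exposure_and_events(times, observed, threshold):
    """Return (observed event count, total exposure beyond the threshold).

    times     -- lifetimes in seconds
    observed  -- True if the end of the event was seen, False if right-censored
    threshold -- left-truncation point; lifetimes at or below it are dropped
    """
    events = 0
    exposure = 0.0
    for t, seen in zip(times, observed):
        if t <= threshold:
            continue  # would have been discarded by the measurement anyway
        exposure += t - threshold  # memorylessness: only the excess counts
        if seen:
            events += 1
    return events, exposure


def max_loglik(events, exposure):
    """Exponential log-likelihood evaluated at its MLE, rate = events / exposure."""
    return events * math.log(events / exposure) - events


def compare_exponential_rates(group_a, group_b, threshold=3.0):
    """Likelihood-ratio test of H0: both groups share one exponential rate.

    Each group is a (times, observed) pair. Assumes every group has at
    least one observed (uncensored) event above the threshold.
    """
    d_a, e_a = exposure_and_events(*group_a, threshold)
    d_b, e_b = exposure_and_events(*group_b, threshold)

    separate = max_loglik(d_a, e_a) + max_loglik(d_b, e_b)
    pooled = max_loglik(d_a + d_b, e_a + e_b)
    statistic = max(2.0 * (separate - pooled), 0.0)  # guard against rounding

    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return {
        "rate_a": d_a / e_a,
        "rate_b": d_b / e_b,
        "statistic": statistic,
        "p_value": p_value,
    }


if __name__ == "__main__":
    # Made-up lifetimes purely to show the call; not the asker's data.
    red = ([3.4, 4.1, 5.0, 3.9, 6.2], [True, True, False, True, True])
    blue = ([7.5, 9.0, 4.8, 12.3, 6.1], [True, True, True, False, True])
    print(compare_exponential_rates(red, blue, threshold=3.0))
```

A small p-value suggests the two rates differ, provided the exponential assumption holds. If that assumption is in doubt, the non-parametric log-rank test is the safer choice, and established survival-analysis software will handle censoring and delayed entry more carefully than this sketch.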
