MATLAB: General statistics problem: how best to characterize non-normal distributions


Even though it is not directly MATLAB-related, I figured I would pose this question to the MATLAB community because there are a lot of smart and helpful people here 😀

I have looked and looked, but I cannot find a straightforward test or method to characterize a distribution that fails a normality test. I have read several peer-reviewed journal articles whose authors report a mean and standard deviation for such distributions anyway (!), but I think that is bad practice.

My current approach is to compute a kernel smoothing density estimate of the distribution, using a function I wrote around the built-in ksdensity() function, and to adjust the smoothing window width until the estimate portrays the data well (not too spiky, not too round). I then report the location of the peak of the kernel estimate as my "mean" (i.e. the one number people will look at and prematurely judge everything by). The only way I know to then characterize the width or spread of the distribution is to give the full width at half maximum (FWHM). Of course this is not ideal, because the distribution tends not to be symmetric around the peak, and the FWHM is often comparable in magnitude to the peak location itself.
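The kernel-estimate approach described above can be sketched as follows. This is a language-neutral illustration in Python (standard library only), not the poster's ksdensity() wrapper: the Gaussian kernel, the fixed bandwidth of 0.4, the 512-point grid and the gamma-distributed test sample are all assumptions chosen for the example. Because the distribution is asymmetric, the sketch reports the left and right half-widths at half maximum separately instead of a single FWHM.

```python
import math
import random


def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


def peak_and_half_widths(data, bandwidth, num_points=512):
    """Return (peak location, left half-width, right half-width) at half maximum."""
    lo = min(data) - 3 * bandwidth
    hi = max(data) + 3 * bandwidth
    step = (hi - lo) / (num_points - 1)
    xs = [lo + i * step for i in range(num_points)]
    density = gaussian_kde(data, bandwidth)
    ys = [density(x) for x in xs]
    i_peak = max(range(num_points), key=ys.__getitem__)
    half = ys[i_peak] / 2.0
    # Walk outwards from the peak until the estimate first drops below half maximum.
    left = i_peak
    while left > 0 and ys[left] >= half:
        left -= 1
    right = i_peak
    while right < num_points - 1 and ys[right] >= half:
        right += 1
    return xs[i_peak], xs[i_peak] - xs[left], xs[right] - xs[i_peak]


# Illustrative right-skewed sample (gamma, shape 2, scale 1.5); not real data.
rng = random.Random(0)
sample = [rng.gammavariate(2.0, 1.5) for _ in range(1000)]
peak, left_hw, right_hw = peak_and_half_widths(sample, bandwidth=0.4)
print(f"peak {peak:.2f}, half-widths -{left_hw:.2f} / +{right_hw:.2f}")
```

For a right-skewed sample like this one, the right half-width comes out noticeably larger than the left, which is exactly the asymmetry a single FWHM hides.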

The people I am working with want to see some kind of error bars, and I have no idea what to give them to make them happy.

This is a recurring theme in my current work and I am desperate to find a good solution, so any pointers would be greatly appreciated. I am sure I am not the only one who has to deal with non-Gaussian distributions.

If you want to see an example of one of these distributions, there are a couple in Figure 3 of the paper here:

http://iopscience.iop.org/1478-3975/8/1/015007/

Thanks in advance, Rory

Best Answer

You should NOT use the peak of your distribution to estimate the mean, because it is not the mean. It is the mode.
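To see how far apart the mode and the mean can be for a skewed distribution, take the gamma distribution with shape $k$ and scale $\theta$ (chosen here only as an illustration). Setting the derivative of its density, proportional to $x^{k-1} e^{-x/\theta}$, to zero gives the mode:

```latex
\text{mode} = (k-1)\,\theta \quad (k \ge 1), \qquad
\text{mean} = k\,\theta, \qquad
\text{mean} - \text{mode} = \theta
```

For $k = 2$ and $\theta = 1.5$, the peak sits at 1.5 while the mean is 3, so reporting the peak as the mean would understate it by half.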

Since your distribution is skewed, it might be better to use the geometric mean or harmonic mean (see Measures of central tendency). You could also estimate a measure of dispersion (e.g. the interquartile range) and of shape (e.g. the skewness).
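In MATLAB these summaries are available as geomean, harmmean, median, iqr and skewness. As a language-neutral sketch, the Python (3.8+, standard library) function below computes the same kinds of quantities; the median is included alongside the means for comparison, the skewness is the simple moment estimator m3 / m2^1.5, and the gamma-distributed sample is made up for illustration. The geometric and harmonic means only make sense for strictly positive data.

```python
import random
import statistics


def summarize(data):
    """Location, dispersion and shape summaries that do not assume normality.

    The geometric and harmonic means are only meaningful for positive data.
    """
    n = len(data)
    mean = statistics.fmean(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
    # Moment-based sample skewness, m3 / m2**1.5 (the biased estimator).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return {
        "arithmetic_mean": mean,
        "geometric_mean": statistics.geometric_mean(data),
        "harmonic_mean": statistics.harmonic_mean(data),
        "median": statistics.median(data),
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5,
    }


rng = random.Random(0)
sample = [rng.gammavariate(2.0, 1.5) for _ in range(1000)]  # illustrative skewed data
s = summarize(sample)
for name, value in s.items():
    print(f"{name:>15}: {value:.3f}")
```

For positive, non-constant data the arithmetic mean always exceeds the geometric mean, which exceeds the harmonic mean; a positive skewness confirms the long right tail.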

For estimating the errors in these statistics, you could use the bootstrap or the jackknife (see Resampling Statistics).
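MATLAB's bootci and jackknife functions do this directly. The Python sketch below (3.8+, standard library) shows the underlying idea under some assumptions of its own: a percentile bootstrap interval with 2000 resamples, a made-up gamma sample, and the jackknife applied to the geometric mean rather than the median, because the jackknife is known to be unreliable for non-smooth statistics such as the median. The bootstrap interval is generally asymmetric around the estimate, which gives natural asymmetric error bars.

```python
import math
import random
import statistics


def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, rng=None):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = rng or random.Random()
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    k_lo = round((alpha / 2) * (n_boot - 1))
    k_hi = round((1 - alpha / 2) * (n_boot - 1))
    return reps[k_lo], reps[k_hi]


def jackknife_se(data, stat):
    """Jackknife standard error: recompute stat with each point left out in turn."""
    n = len(data)
    reps = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    mean_rep = statistics.fmean(reps)
    return math.sqrt((n - 1) / n * sum((r - mean_rep) ** 2 for r in reps))


rng = random.Random(1)
sample = [rng.gammavariate(2.0, 1.5) for _ in range(200)]  # illustrative skewed data

med = statistics.median(sample)
lo, hi = bootstrap_ci(sample, statistics.median, rng=rng)
print(f"median {med:.2f}, 95% bootstrap CI [{lo:.2f}, {hi:.2f}]")

gm = statistics.geometric_mean(sample)
se = jackknife_se(sample, statistics.geometric_mean)
print(f"geometric mean {gm:.2f} +/- {se:.2f} (jackknife SE)")
```

The error bars for the median would then be drawn from med - lo below to hi - med above.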

You could also explore MATLAB's collection of distributions to see if any look like your data (see Distribution Reference). For example, some of the curves look like the Gamma distribution. However, each distribution is a model of a particular kind of statistical process, so ideally you should understand what a distribution represents before using it.
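In MATLAB, fitdist(x, 'Gamma') or gamfit would fit a gamma model by maximum likelihood. As a lighter, language-neutral sketch, the Python snippet below swaps in the simpler method of moments (shape k = mean^2 / variance, scale theta = variance / mean), which is not what those functions do; the sample is generated from a known gamma distribution purely so the estimates can be checked.

```python
import random
import statistics


def gamma_moments_fit(data):
    """Method-of-moments estimates (shape k, scale theta) for a gamma model."""
    mean = statistics.fmean(data)
    var = statistics.variance(data)  # sample variance (n - 1 denominator)
    return mean ** 2 / var, var / mean


rng = random.Random(2)
sample = [rng.gammavariate(2.0, 1.5) for _ in range(5000)]  # true k = 2, theta = 1.5
k, theta = gamma_moments_fit(sample)
print(f"shape k ~ {k:.2f}, scale theta ~ {theta:.2f}")
print(f"implied mean {k * theta:.2f}")
if k >= 1:
    # The gamma mode (k - 1) * theta is only defined this way for k >= 1.
    print(f"implied mode {(k - 1) * theta:.2f}")
```

If a fitted model describes the data well, its parameters (and the mean, mode and spread they imply) can be reported instead of ad hoc summaries, subject to the caveat above about understanding what the distribution represents.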
