Log of Ratio Results in Log-Normal Distribution

data transformation, logarithm, lognormal distribution, measurement

I am trying to understand some chemical concentration data I have measured. The data are oscillating time series, and I form the ratio (local maximum in concentration):(local minimum in concentration) and then take the log of that ratio. When I plot the distribution of these log ratios, it appears to my eye to be roughly log-normal, with the bulk of the probability at around 0.75 (using log base 2) and a long right-skewed tail. Despite what the plot suggests, there is no mass below zero, because the local max > local min.

[Figure: kernel density plot of the log ratios]

I would like to find outliers in this distribution and also find mean + SD cutoffs to threshold it. I am comparing it to a similar dataset from a treatment condition, and I want to characterize the common variation in this distribution as a control. Is it best to transform this to a log-normal distribution to do something like this? Should I check other distributions as well? Any relatively straightforward suggestions would be appreciated.

Here is the actual sorted data that was used to plot the kernel density seen above. I have many other realizations of distributions of log ratios similar to this one:

{0.34, 0.35, 0.38, 0.42, 0.45, 0.47, 0.47, 0.53, 0.54, 0.56, 0.59,
0.6, 0.61, 0.61, 0.62, 0.65, 0.71, 0.72, 0.8, 0.84, 0.9, 0.92, 0.95,
0.96, 0.96, 1.68, 1.81, 2.03, 2.03, 3.19, 3.19, 3.37, 3.79, 4.65,
4.75}

Best Answer

Thanks for posting the data. Here is a slightly tougher check on whether the data are lognormal: a normal quantile plot of the logged ratios.

[Figure: normal quantile plot of the logged ratios]
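For readers who want to reproduce this kind of check, here is a minimal sketch of how the coordinates of a normal quantile plot could be computed. It assumes the natural log is taken of the values posted in the question, and it uses the plotting positions (i - 0.5)/n, which is one common convention and not necessarily the one used for the plot above. The helper `pearson` and all variable names are illustrative only.

```python
import math
from statistics import NormalDist, mean

# Sorted values posted in the question.
values = [
    0.34, 0.35, 0.38, 0.42, 0.45, 0.47, 0.47, 0.53, 0.54, 0.56, 0.59,
    0.6, 0.61, 0.61, 0.62, 0.65, 0.71, 0.72, 0.8, 0.84, 0.9, 0.92, 0.95,
    0.96, 0.96, 1.68, 1.81, 2.03, 2.03, 3.19, 3.19, 3.37, 3.79, 4.65,
    4.75,
]

# Natural log of each value, sorted ascending (the y-axis of the plot).
logged = sorted(math.log(v) for v in values)
n = len(logged)

# Plotting positions (i - 0.5)/n: an assumed convention; others exist.
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# Standard normal quantiles at those positions (the x-axis of the plot).
theoretical = [NormalDist().inv_cdf(p) for p in probs]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Correlation of the plotted points: a rough summary of straightness.
r = pearson(theoretical, logged)

for q, v in zip(theoretical, logged):
    print(f"{q:7.3f}  {v:7.3f}")
print(f"correlation of plotted points: {r:.3f}")
```

If the logged values were close to normal, the printed pairs would fall near a straight line when plotted. The correlation is only a rough summary; with 35 points it should not be read as a formal test.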

Considerable caution is indicated:

  1. The sample size is 35. Easy to say, but that is a small sample for this kind of exercise.

  2. The grouping is suggestive, or may be just a quirk as can be expected in any sample of this size. Certainly you should check whether there is anything distinctive underlying the 10 highest values.

  3. The fit is middling, but I didn't search through other distributions to try to find a better fit.
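The grouping mentioned in point 2 can be located numerically. The following is a minimal sketch, assuming the split of interest is at the largest gap between consecutive sorted values on the natural-log scale. That rule and the variable names are illustrative choices, not something taken from the plot itself.

```python
import math

# Sorted values posted in the question.
values = [
    0.34, 0.35, 0.38, 0.42, 0.45, 0.47, 0.47, 0.53, 0.54, 0.56, 0.59,
    0.6, 0.61, 0.61, 0.62, 0.65, 0.71, 0.72, 0.8, 0.84, 0.9, 0.92, 0.95,
    0.96, 0.96, 1.68, 1.81, 2.03, 2.03, 3.19, 3.19, 3.37, 3.79, 4.65,
    4.75,
]

ordered = sorted(values)
logged = [math.log(v) for v in ordered]

# Gap between each consecutive pair of sorted logged values.
gaps = [b - a for a, b in zip(logged, logged[1:])]

# Index of the largest gap; values after it form the upper group.
split = max(range(len(gaps)), key=gaps.__getitem__)
lower = ordered[: split + 1]
upper = ordered[split + 1 :]

print(f"largest log gap: between {lower[-1]} and {upper[0]}")
print(f"lower group: {len(lower)} values, upper group: {len(upper)} values")
```

With the posted data, the largest gap on the log scale falls between 0.96 and 1.68, leaving 10 values in the upper group. A gap like this does not by itself show that there are two populations; in a sample of 35 it may be a quirk, which is why checking what underlies those values matters.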

I don't see why mixtures should be expected to be mixtures of normals. My impression is that mixtures of normals are the most common kind fitted to data, but that is a different point.

I used natural logarithms, but using log base 2 would clearly just change axis labels, and nothing fundamental.
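The reason the base does not matter is that logarithms in different bases differ only by a constant factor:

$$\log_2 x = \frac{\ln x}{\ln 2}$$

A minimal sketch of the consequence, using the values posted in the question: after standardizing (subtracting the mean and dividing by the SD), the two sets of logged values coincide, so the shape of any plot is unchanged. The helper `zscores` is illustrative only.

```python
import math
from statistics import mean, stdev

# Sorted values posted in the question.
values = [
    0.34, 0.35, 0.38, 0.42, 0.45, 0.47, 0.47, 0.53, 0.54, 0.56, 0.59,
    0.6, 0.61, 0.61, 0.62, 0.65, 0.71, 0.72, 0.8, 0.84, 0.9, 0.92, 0.95,
    0.96, 0.96, 1.68, 1.81, 2.03, 2.03, 3.19, 3.19, 3.37, 3.79, 4.65,
    4.75,
]

ln_vals = [math.log(v) for v in values]
log2_vals = [math.log2(v) for v in values]

# log2(x) = ln(x) / ln(2), so the two scales differ by a constant factor.
scale = 1 / math.log(2)
same_up_to_scale = all(
    math.isclose(a * scale, b, abs_tol=1e-12)
    for a, b in zip(ln_vals, log2_vals)
)


def zscores(x):
    """Standardize a sequence using its sample mean and sample SD."""
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]


# Standardized values agree, so only the axis labels change.
same_zscores = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(zscores(ln_vals), zscores(log2_vals))
)

print(same_up_to_scale, same_zscores)
```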
