Solved – Fitting data to a gamma distribution to find the score which corresponds to p-value < 0.05

density function, gamma distribution, modeling, p-value, r

I have a data set of 116,667 rows, structured as follows:

iD  Signal
chr17.3620  0.5741552
chr1.7341   0.5680284
chr7.3937   0.5479430
chr17.3890  0.5402434
chr12.3200  0.5298978
chr17.7227  0.5298536

Because the full data set is too large to show here, and I do not have enough reputation to upload images, I built this post from the first 200 rows.

Summary:

  • Min. = 0.3693
  • 1st Quartile = 0.3847
  • Median = 0.4039
  • Mean = 0.4199
  • 3rd Quartile = 0.4413
  • Max. = 0.5742

I also did some exploratory analysis to see which kind of distribution this data follows (i.e. plot(density(t)) and qqnorm(t); qqline(t,col=2)). Based on these preliminary results, I would say the data appear to follow a gamma distribution.

Here I put the first 200 signal values:

A <- structure(list(V1 = c(0.5741552,0.5680284,0.5479430,0.5402434,0.5298978,0.5298536,0.5282417,
0.5165426,0.5131503,0.5129329,0.5105448,0.5104201,0.5095860,0.5090263,
0.5061467,0.4972821,0.4959428,0.4953381,0.4920510,0.4915160,0.4868505,
0.4843749,0.4825519,0.4823313,0.4809742,0.4788553,0.4775991,0.4770962,
0.4745947,0.4743952,0.4727112,0.4718017,0.4714738,0.4674141,0.4670385,
0.4648104,0.4633502,0.4616054,0.4615068,0.4614247,0.4613338,0.4597812,
0.4551755,0.4535067,0.4528133,0.4508228,0.4494993,0.4494936,0.4442789,
0.4413460,0.4412279,0.4402557,0.4392294,0.4385639,0.4385187,0.4361337,
0.4344499,0.4342413,0.4342331,0.4338879,0.4337806,0.4336820,0.4329372,
0.4325534,0.4323201,0.4312287,0.4292037,0.4281761,0.4279843,0.4279774,
0.4252035,0.4243487,0.4228516,0.4226953,0.4218263,0.4214821,0.4212546,
0.4210894,0.4206089,0.4204235,0.4193896,0.4168915,0.4164699,0.4152126,
0.4127455,0.4126053,0.4113571,0.4105654,0.4099753,0.4088188,0.4085093,
0.4075957,0.4074018,0.4072499,0.4072114,0.4067329,0.4065400,0.4052757,
0.4044982,0.4040699,0.4036509,0.4033471,0.4031712,0.4026698,0.4017872,
0.4011538,0.4011325,0.4011320,0.4008897,0.4006470,0.4003469,0.3996736,
0.3992583,0.3991979,0.3990366,0.3989118,0.3983172,0.3980860,0.3978592,
0.3977522,0.3965371,0.3963045,0.3957640,0.3954328,0.3950159,0.3935825,
0.3934975,0.3932916,0.3931091,0.3929565,0.3922829,0.3919779,0.3919713,
0.3914740,0.3910446,0.3909540,0.3890607,0.3890550,0.3876478,0.3875172,
0.3873815,0.3872299,0.3870533,0.3858995,0.3858361,0.3855984,0.3854444,
0.3852595,0.3849558,0.3847531,0.3844442,0.3842814,0.3831377,0.3822418,
0.3817666,0.3805661,0.3803090,0.3802035,0.3800845,0.3800580,0.3799694,
0.3795814,0.3794039,0.3792874,0.3788970,0.3787295,0.3785160,0.3782523,
0.3782439,0.3779547,0.3778596,0.3777452,0.3770986,0.3767652,0.3767104,
0.3765786,0.3760886,0.3760124,0.3753271,0.3750943,0.3749116,0.3744146,
0.3743998,0.3730250,0.3729932,0.3727007,0.3726170,0.3722539,0.3721743,
0.3721055,0.3720965,0.3718959,0.3714824,0.3711862,0.3709115,0.3708262,
0.3706647,0.3701728,0.3697490,0.3692868)))

t<-as.matrix(A$V1)

My questions are:

1. How can I fit my data to a gamma distribution?

2. Is there any way to determine, according to the model fitting, which signal value corresponds to p-value < 0.05?

Thanks for your help!

PS 1: Suggestions for editing are welcome, as this is my first post in the community!

PS 2: I have been doing all of this in R.

Best Answer

  1. One simple way to fit a gamma distribution to the data is the method of moments: the gamma distribution with shape $\alpha$ and rate $\beta$ has mean $\frac\alpha\beta$ and variance $\frac\alpha{\beta^2}$. You can plug in the sample estimates of the mean and variance and solve these two equations for the parameters of the model. Naturally, more advanced methods exist, but with this many observations the method of moments should be a good starting point.

  2. I'm uncertain what your second question means. Are you looking for the score which divides the smallest 95% of your data from the largest 5%?
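As a rough sketch of point 1: solving the two moment equations gives $\hat\alpha = \bar{x}^2 / s^2$ and $\hat\beta = \bar{x} / s^2$, where $\bar{x}$ and $s^2$ are the sample mean and variance. The R snippet below applies this to the six Signal values shown at the top of the question, which are only a stand-in for the full column. The names x, alpha_hat, beta_hat and cutoff are illustrative. The last step assumes that question 2 is asking for the upper 5% cutoff of the fitted distribution, which is only one possible reading.

```r
# Illustrative subset: the six Signal values listed in the question.
# In practice x would be the full Signal column.
x <- c(0.5741552, 0.5680284, 0.5479430, 0.5402434, 0.5298978, 0.5298536)

m <- mean(x)  # sample mean
v <- var(x)   # sample variance

# Method of moments for a gamma with shape alpha and rate beta:
#   mean = alpha / beta, variance = alpha / beta^2
beta_hat  <- m / v     # estimated rate
alpha_hat <- m^2 / v   # estimated shape

# Assumed reading of question 2: the value above which only 5% of the
# fitted distribution lies, i.e. the 0.95 quantile.
cutoff <- qgamma(0.95, shape = alpha_hat, rate = beta_hat)

print(c(shape = alpha_hat, rate = beta_hat, cutoff = cutoff))
```

Under that reading, a signal above cutoff would have an upper-tail probability below 0.05 under the fitted gamma. If a maximum-likelihood fit is preferred over the method of moments, MASS::fitdistr(x, "gamma") is one option, and the moment estimates can serve as a sanity check on its output.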
