R – Calculating P-value from an Arbitrary Distribution

distributionsp-valuer

I hope this isn't a silly question. Let's say I have some arbitrary continuous distribution. I also have a statistic, and I'd like to use this arbitrary distribution to get a p-value for this statistic.

I realize that in R it's easy to do this as long as your distribution fits one of the built-in ones, like if it's normal. But is there an easy way to do this with any given distribution, without making that kind of assumption?

Best Answer

If you have a cumulative distribution function $F$, then calculating the $p$-value for given statistic $T$ is simply $1-F(T)$. This is straightforward in R. If you have probability density function on the other hand, then $F(x)=\int_{-\infty}^xp(t)dt$. You can find this integral analytically or numerically. In R this will look like this:

dF <- function(x)dnorm(x)
pF <- function(q)integrate(dF,-Inf,q)$value 

> pF(1)
[1] 0.8413448
> pnorm(1)
[1] 0.8413447

You can tune integrate for better accuracy. This of course may fail for specific cases, when the integral does not behave well, but it should work for majority of the density functions.

You can of course pass parameters into pF, if you have several parameter values to try-out and do not want to redefine dF each time.

dF <- function(x,mean=0,sd=1)dnorm(x,mean=mean,sd=sd)
pF <- function(q,mean=0,sd=1)integrate(dF,-Inf,q,mean=mean,sd=sd)$value 

> pF(1,1,1)
[1] 0.5
> pnorm(1,1,1)
[1] 0.5

Of course you can also use Monte-Carlo methods as detailed by @suncoolsu, this would be just another numerical method for integration.

Related Question