Compute the $((1-\alpha )\times 100)^{th}$ percentile of the null distribution in R

statistical-inferencestatistics

For fixed $n$ and arbitrarily preselected values $\mu$ and $\sigma$,

First, I generated a random sample of size $n$ from $N(\mu ,\sigma)$, say $x_1 ,x_2 ,\ldots ,x_n$;

Second, I arranged the obtained sample from the smallest to the largest.

Next, I estimtated $\mu$ and $\sigma$ by $\bar{x}=\frac{\sum_{i=1}^{n}x_i}{n}$ and $S=\sqrt{\frac{\sum_{i=1}^{n}(x_i -\bar{x})^2}{n-1}}$, respectively.

Then, I computed the statistic $\delta=\frac{\sum_{i=1}^{n}|z_i -\frac{i}{n}|}{\sum_{i=1}^{n}\max (z_i ,\frac{i}{n})}$.

Finally, I repeated previous steps 25 times to get the set $\{ \delta_1, \delta_2 ,\ldots ,\delta_{25}\}$I assume $z_i =\phi (\frac{x_i -\bar{x}}{S})$, the CDF of the standard Normal distribution with $\bar{x}=\frac{\sum_{i=1}^{n}x_i}{n}$ and $S=\sqrt{\frac{\sum_{i=1}^{n}(x_i -\bar{x})^2}{n-1}}$.

My code is as follows:

n=10
mu=100
sigma=10
D=c()
for(j in 1:25){
x=rnorm(n,mu,sigma)
x=x[order(x)]
S=sqrt(sum((x - mean(x))^2) / (n - 1))
z = c()
for(i in 1:n){
 z[i] = dnorm((x[i]-mean(x))/S,0,1)
}
num = 0 
for(i in 1:n){
 num = num + abs(z[i]-i/n)
}
den = 0
for(i in 1:n){
 den = den + max(z[i],i/n)
}
D[j]=delta = num/den
}

Here I'm using the normal test of the sample which is equivalent to test the null hypothesis $H_0 :F_0 =Normal(\mu ,\sigma)$, where the parameters $\mu$ and $\sigma^2$ are unknown.

My question is that: How can I compute the $\delta_{\alpha ,n}$, $((1-\alpha )\times 100)^{th}$ percentile of the null distribution of $\delta$ at $n$ that satisfies $P(\delta >\delta_{\alpha ,n}|H_0 )=\alpha$ in R program.

enter image description here

enter image description here

Best Answer

So what you doing is basically a Monte Carlo Estimate of a specific quantile $\delta_{\alpha ,n}$ so that, $P(\delta >\delta_{\alpha ,n}|H_0)=\alpha.$

Your method would be like this:

  • Let $F$ be the distribution of $\delta$ under the null distribution.
  • Simulate random sample $\delta_1,\delta_2,\delta_3,\cdots,\delta_N$ from $F$; $\delta_i\sim F \ \forall i$
  • Compute the empirical distribution corresponding to them: $$F_N(x)=\frac{1}{N}\sum_{i=1}^N \mathbb{I}_{\delta_i\le x} \left(\stackrel{N\to\infty}{\longrightarrow}F(x)\right)$$
  • So your estimated quantile would be: $$F_N^{-1}(\alpha)$$ for a large enough N
  • This quantile $F_N^{-1}(\alpha)$ can be approximated by the "almost sure left inverse" $$\inf\{x\,|\, F_N(x)\geq \alpha\}$$

Your code is almost okay, you have to finally calculate sort(D)[length(D)*alpha]. Also, take a much larger value of $N$ than just 25.

So, finally, in R, your program would be (taking a larger value of N, not 25):

n=10
mu=100
sigma=10
N=1000
alpha=0.05     # Or your favorite value

D=array(0, dim=N)
for(j in 1:N){
    x=rnorm(n,mu,sigma)
    x=x[order(x)]       ## or use sort(x)

    # S=sd(x)           ## sqrt(sum((x - mean(x))^2) / (n - 1))
    # z = c()
    # for(i in 1:n){
    #     z[i] = dnorm((x[i]-mean(x))/S,0,1)
    # }
    z=dnorm(x, mean(x), sd(x))  # R can do it at once

    # num=0
    # for(i in 1:n){
    #     num = num + abs(z[i]-i/n)
    # }
    num = sum( abs(z - seq(1,n)/n ) ) 

    # den = 0
    # for(i in 1:n){
    #     den = den + max(z[i],i/n)
    # }
    den = sum( pmax(z, seq(1,n)/n) )

    D[j]=num/den
}
sort(D)[N*alpha]

I have commented on some parts of your codes and compacted them.

Related Question