Fisher’s Z-Transform – When to Use and Why

Tags: correlation, fisher-transform, sample-size

I want to test a sample correlation $r$ for significance, using p-values, that is

$H_0: \rho = 0, \; H_1: \rho \neq 0.$

I understand that I can use Fisher's z-transform to compute the test statistic

$z_{obs}= \displaystyle\frac{\sqrt{n-3}}{2}\ln\left(\displaystyle\frac{1+r}{1-r}\right)$

and finding the p-value by

$p = 2P\left(Z>|z_{obs}|\right)$

using the standard normal distribution.
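As a quick check of the arithmetic, here is a minimal Python sketch of the calculation described above. The function name `fisher_z_test` and the inputs `r = 0.7`, `n = 8` are hypothetical and chosen only for illustration. The two-sided p-value uses $|z_{obs}|$ so that negative correlations are handled as well.

```python
import math

def fisher_z_test(r, n):
    """Two-sided p-value for H0: rho = 0 via Fisher's z-transform."""
    if n <= 3:
        raise ValueError("n must be larger than 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    # atanh(r) = 0.5 * ln((1 + r) / (1 - r)); its approximate
    # standard error is 1 / sqrt(n - 3)
    z_obs = math.sqrt(n - 3) * math.atanh(r)
    # Two-sided standard normal tail area: 2 * P(Z > |z_obs|)
    p = math.erfc(abs(z_obs) / math.sqrt(2))
    return z_obs, p

# Hypothetical inputs, for illustration only
z_obs, p = fisher_z_test(r=0.7, n=8)
print(f"z_obs = {z_obs:.3f}, p = {p:.4f}")
```

This only evaluates the formula; it says nothing about whether the normal approximation is adequate for a given $n$.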

My question is: how large should $n$ be for this to be an appropriate transformation? Obviously, $n$ must be larger than 3. My textbook does not mention any restrictions, but slide 29 of this presentation says that $n$ must be larger than 10. For the data I will be considering, I will have something like $5 \leq n \leq 10$.

Best Answer

For questions like these I would just run a simulation and see whether the $p$-values behave as expected. The $p$-value is the probability, if the null hypothesis is true, of randomly drawing a sample that deviates at least as much from the null hypothesis as the data you observed. So if we drew many such samples under the null hypothesis and one of them had a $p$-value of .04, then we would expect 4% of those samples to have a $p$-value of .04 or less. The same is true for all other possible $p$-values.

Below is a simulation in Stata. The graphs check whether the $p$-values measure what they are supposed to measure; that is, they show how much the proportion of samples with $p$-values less than the nominal $p$-value deviates from the nominal $p$-value. As you can see, the test is somewhat problematic with such a small number of observations. Whether it is too problematic for your research is your judgement call.

clear all
set more off

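program define sim, rclass
    // one call returns a p-value for each sample size
    tempname z se
    foreach i of numlist 5/10 20(10)50 {
        drop _all
        set obs `i'
        // x and y are drawn independently, so H0: rho = 0 is true
        gen x = rnormal()
        gen y = rnormal()
        corr x y 
        // Fisher's z and its approximate standard error
        scalar `z'  = atanh(r(rho))
        scalar `se' = 1/sqrt(r(N)-3)
        // two-sided p-value from the standard normal
        return scalar p`i' = 2*normal(-abs(`z'/`se'))
    }
end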

simulate p5 =r(p5)  p6 =r(p6)  p7  =r(p7)     ///
         p8 =r(p8)  p9 =r(p9)  p10 =r(p10)    ///
         p20=r(p20) p30=r(p30) p40 =r(p40)    ///
         p50=r(p50), reps(200000) nodots: sim 

// simpplot is a user-written command: ssc install simpplot
simpplot p5 p6 p7 p8 p9 p10, name(small, replace) ///
    scheme(s2color) ylabel(,angle(horizontal))

[Figure: simpplot output for n = 5, 6, 7, 8, 9, 10]

simpplot p20 p30 p40 p50 , name(less_small, replace) ///
    scheme(s2color) ylabel(,angle(horizontal))

[Figure: simpplot output for n = 20, 30, 40, 50]
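For readers without Stata, the same kind of check can be sketched in Python using only the standard library. This is a rough analogue, not a port of `simpplot`: the helper names `fisher_p` and `deviation`, the seed, the number of replications (far fewer than the 200000 used above), and the three nominal levels are all assumptions made for illustration. It prints, for each sample size, the empirical proportion of $p$-values at or below a nominal level minus that level, which should be close to zero if the test is well calibrated.

```python
import math
import random

def fisher_p(x, y):
    """Two-sided Fisher z p-value for the sample correlation of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.sqrt(n - 3) * math.atanh(r)
    return math.erfc(abs(z) / math.sqrt(2))

def deviation(n, reps, levels, rng):
    """Empirical P(p <= level) minus level, with x and y independent normals."""
    pvals = []
    for _ in range(reps):
        # Independent draws, so the null hypothesis rho = 0 holds
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        pvals.append(fisher_p(x, y))
    return {a: sum(p <= a for p in pvals) / reps - a for a in levels}

rng = random.Random(12345)  # arbitrary seed, for reproducibility only
for n in (5, 10, 20):
    dev = deviation(n, reps=20000, levels=(0.01, 0.05, 0.10), rng=rng)
    print(n, {a: round(d, 4) for a, d in dev.items()})
```

With this few replications the deviations are noisy, so small differences between sample sizes should not be over-interpreted.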
