[Math] Standard error and p-values

probability, standard deviation, statistics

I am a scientist currently doing a BSc. I did some stats a few years ago but haven't touched it since, and I'm not sure how to calculate p-values for my data. For background, I am assessing the replication of a virus when I mutate it. The hypothesis is that the mutant still replicates, but less efficiently. I want to test for statistical significance at the 95% confidence level and then also work out a p-value.

So I measure replication and get this for my completely normal (a.k.a. wild-type) virus, expressed as a percentage of the maximal replication possible:

Wild type: 65%, 67%, 70%

My mutant form gives me these results: 50%, 60%, 53%

For the standard error I use the sample standard error formula, and for 95% confidence intervals I multiply this by 1.96 for each group, so I get intervals for both the normal/wild-type form and for my mutant.
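In code terms, what I'm doing so far is roughly this (a minimal Python sketch; the function name is just for illustration):

```python
import statistics

# Replication percentages from above
wild_type = [65, 67, 70]
mutant = [50, 60, 53]

def normal_ci_95(data):
    """Mean and normal-based 95% interval: mean +/- 1.96 * standard error."""
    mean = statistics.mean(data)
    se = statistics.stdev(data) / len(data) ** 0.5  # sample SD / sqrt(n)
    return mean, (mean - 1.96 * se, mean + 1.96 * se)

for name, data in [("wild type", wild_type), ("mutant", mutant)]:
    mean, (lo, hi) = normal_ci_95(data)
    print(f"{name}: mean = {mean:.1f}%, 95% CI = ({lo:.1f}%, {hi:.1f}%)")
```

(With only three values per group, the 1.96 normal multiplier is on the narrow side; a t multiplier with 2 degrees of freedom, about 4.30, would be the more usual choice, but the sketch keeps 1.96 to match the description above.)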

However, how do I calculate a p-value? I thought I would calculate the mean and standard deviation of the wild type, then compare the mean for my mutant using z-scores, etc., but I don't think that's right.

Best Answer

For testing two proportions, we will use the two-proportion hypothesis test. Since you have three sets of data, three wild-type and three mutant, you can do three different tests of the same type to confirm your results. Your hypotheses would then be: $$H_0:p_{wild}-p_{mutant}=0$$ and $$H_1:p_{wild}-p_{mutant}\neq0.$$
You could also use the one-sided alternative $H_1:p_{wild}\gt p_{mutant}$. This just changes the critical value from $z_{\frac{\alpha}{2}}$ (for $H_1:p_{wild}-p_{mutant}\neq0$) to $z_{\alpha}$ (for $H_1:p_{wild}\gt p_{mutant}$).

You will then need the following information:
1.) $n_{wild}$ and $n_{mutant}$, the number of times you did each experiment.
2.) $\widehat{p_{wild}}$ and $\widehat{p_{mutant}}$. These are your experimental proportions from the data you collected. For example, for the first set, $\widehat{p_{wild}}=.65$.
3.) Finally, you need a pooled estimate $\hat{p}$. Experimentally, this is the number of successes (a success being replication) from your wild group ($Y_{wild}$) and your mutant group ($Y_{mutant}$), divided by $n_{wild}+n_{mutant}$. ($Y$ is capitalized because it is a random variable that changes from experiment to experiment; we denote random variables with capital letters. The actual number of successes observed in a given trial would be denoted $y_{wild}$, for example.)
So that is, $$\hat{p}=\frac{Y_{wild}+Y_{mutant}}{n_{wild}+n_{mutant}}$$ If you don't have those counts, you could assume, say, 100 trials and 63 successes and the calculation would still go through, but you would definitely prefer NEVER to do this, as you don't want to compromise the integrity of your study. Having the exact figures should be your goal (a small numeric sketch of the pooled estimate follows below).
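To make that concrete, here is a small sketch; the counts of 100 trials per group with 65 and 50 successes are assumed purely for illustration and mirror the worked example further down:

```python
# Hypothetical counts, assuming 100 replication trials per group
n_wild, n_mutant = 100, 100
y_wild, y_mutant = 65, 50            # observed successes (replications)

p_hat_wild = y_wild / n_wild         # 0.65
p_hat_mutant = y_mutant / n_mutant   # 0.50
p_hat = (y_wild + y_mutant) / (n_wild + n_mutant)  # pooled estimate, 0.575

print(p_hat_wild, p_hat_mutant, p_hat)
```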

Now, the reason this works is that $$Z=\frac{\widehat{p_{wild}}-\widehat{p_{mutant}}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_{wild}}+\frac{1}{n_{mutant}}\right)}}\sim N(0,1).$$ You already know your critical value $z_{\frac{\alpha}{2}}$ (or $z_{\alpha}$, depending on which alternative hypothesis you use), so you compare your z-statistic with it, in this case $z_{\frac{\alpha}{2}}=1.96$. If your computed value satisfies $z\ge z_{\frac{\alpha}{2}}$, you reject the null hypothesis that the proportions are equal.

Using your information from the first trial (assuming 100 experiments in each group for this illustration), we have: $$n_{wild}=n_{mutant}=100,\quad \widehat{p_{wild}}=.65,\quad \widehat{p_{mutant}}=.5,\quad \hat{p}=.575$$ So $$z=\frac{.65-.5}{\sqrt{(.575)(.425)\left(\frac{2}{100}\right)}}\approx\frac{.15}{.0699}\approx2.1456\ge1.96$$ Now we can reject the null hypothesis that mutated viruses replicate as efficiently as wild-type viruses. You could then repeat this test on the other two data sets to see whether they confirm the result. I'll leave the rest to you.
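Here is the same arithmetic as a short Python sketch (again assuming 100 experiments per group; only the standard library is used):

```python
from math import sqrt

n_wild = n_mutant = 100
p_hat_wild, p_hat_mutant, p_hat = 0.65, 0.50, 0.575

# Pooled standard error and z-statistic for the two-proportion test
se = sqrt(p_hat * (1 - p_hat) * (1 / n_wild + 1 / n_mutant))
z = (p_hat_wild - p_hat_mutant) / se
print(f"z = {z:.4f}")                                  # z = 2.1456
print("reject H0" if z >= 1.96 else "fail to reject H0")
```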

Finally, your question was about the p-value. As I stated in the comments, the p-value is simply $$\text{p-value}=P(Z\ge z).$$ In your case, the one-sided p-value is $P(Z\ge2.146)\approx.016\lt.05=\alpha$ (for the two-sided alternative you would double it to roughly $.032$, still below $.05$). This has the same effect as the comparison above: if your p-value is less than your chosen $\alpha$, which in this case is $1-.95=.05$, you reject the null; if it is greater than $\alpha$, you fail to reject.
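If you have SciPy available (an assumption about your tooling), the tail probability can be computed directly; `norm.sf` is the standard-normal survival function $P(Z\ge z)$:

```python
from scipy.stats import norm

z = 2.1456
p_one_sided = norm.sf(z)      # P(Z >= z), about 0.016
p_two_sided = 2 * norm.sf(z)  # about 0.032, for H1: p_wild != p_mutant
print(p_one_sided, p_two_sided)
```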

On a side note, you should be aware that a failure to reject the null hypothesis does NOT mean that the null hypothesis is true. It simply means that you do not have enough evidence to justify rejecting it. This could be because you didn't run enough experiments: look at the z-statistic we calculated. If our values of $n_{wild}$ and $n_{mutant}$ were smaller, say 50 instead of 100, we would have to fail to reject the null hypothesis, because the z-statistic would be $1.517\le1.96$. You can pretty much see there is a difference between the two data sets you have, but without enough empirical data from experimentation you couldn't reject the null hypothesis.
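A quick sketch of that sensitivity to sample size, keeping the same proportions and varying only the assumed number of trials per group:

```python
from math import sqrt

p_hat_wild, p_hat_mutant, p_hat = 0.65, 0.50, 0.575

for n in (50, 100, 200):
    se = sqrt(p_hat * (1 - p_hat) * (2 / n))   # equal group sizes
    z = (p_hat_wild - p_hat_mutant) / se
    print(f"n = {n:3d} per group -> z = {z:.3f}")
# n =  50 -> z = 1.517 (fail to reject), n = 100 -> z = 2.146, n = 200 -> z = 3.034
```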
