Solved – Analysis of Danish mask study data by Nassim Nicholas Taleb (binomial GLM with complete separation)

Tags: binomial-distribution, contingency-tables, fishers-exact-test, logistic, separation

Recently, Nassim Nicholas Taleb made this post about the recent Danish mask study, a randomized controlled trial which concluded that the proportion of newly diagnosed coronavirus infections was not significantly different between the mask group and the non-mask-wearing control group (42/2392 = 1.8% vs. 53/2470 = 2.1%), based on a 2-tailed p value from a logistic regression.
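
For reference, that headline comparison can be roughly reproduced from the published counts with a binomial GLM (ignoring any adjustments the authors may have made):

d0 = data.frame(treatment = factor(c("masks", "control")),
                pos = c(42, 53),
                neg = c(2392 - 42, 2470 - 53))
summary(glm(cbind(pos, neg) ~ treatment, family = binomial, data = d0))
# treatment coefficient: 2-tailed p ≈ 0.33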

Taleb points out that if you focus just on the cases where infection has been confirmed by a qRT-PCR test, you get 0/2392 vs. 5/2470. He writes:
"Now consider the more obvious error. What are the odds of getting 0 PCRs vs 5 from random?

The probability of having 0 realizations in 2392 if the mean is 5/2470 is 0.0078518, that is 1 in 127. We can reexpress it in p values, which would be well <.05, that is far in excess of the ps in the range .21-.33 that the paper shows. How these researchers missed this point is beyond me."
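
His 0.0078518 is indeed just the binomial point probability of 0 events in 2392 trials at a fixed rate of 5/2470, e.g. in R:

dbinom(0, 2392, 5/2470)  # 0.0078518, i.e. about 1 in 127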

To me, this doesn't seem to make much sense, since that binomial PMF-based calculation ignores the sampling uncertainty in the 5/2470 estimate. To this, Taleb responded "I don’t do P values: https://arxiv.org/pdf/1603.07532.pdf", only to later add the following Monte Carlo double-Bernoulli calculation, in Mathematica notation, in which he obtains a 1-tailed p value of

ta = Table[
   data1 = RandomVariate[BernoulliDistribution[5/2470], 2470] // Total;
   data2 = RandomVariate[BernoulliDistribution[5/2470], 2400] // Total;
   data1 - data2, {10^5}];
(Select[ta, # < -5 &] // Length)/10^5 // N
0.03483
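
For reference, a direct transliteration of this into R (keeping his 2400 and his strict < -5) reproduces his number:

mean(rbinom(1e5, 2470, 5/2470) - rbinom(1e5, 2400, 5/2470) < -5)  # ≈ 0.035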

Could somebody shed some light on what exactly Taleb was calculating here? If he is trying to carry out a 2-sample binomial test via Monte Carlo, I don't quite understand where the 2400 comes from, or why there is no Bernoulli term with a 0/2392 expectation (which would itself be problematic, as one would then have zero variance). For a 2-sample MC binomial test I would rather have expected something like this (in R, using a +1 adjustment of all counts to avoid the p=0 binomial expectation in one group):

# simulate each arm's observed proportion under +1-adjusted estimates of p
p1 = rbinom(1E8, 2470, (5+1)/(2470+1))/(2470+1)  # control arm (5/2470)
p2 = rbinom(1E8, 2392, (0+1)/(2392+1))/(2392+1)  # mask arm (0/2392)
mean(p1 <= p2)    # 1-tailed p = 0.03401019
2*mean(p1 <= p2)  # 2-tailed p = 0.06802038

but it seems he instead tried something like this (I corrected the 2400, which was presumably a typo for 2392, and changed Taleb's strict inequality to a non-strict >=):

mean((rbinom(1E7, 2470, 5/2470)-rbinom(1E7, 2392, 5/2470))>=5) 
# 1-tailed p = 0.0811906

which I believe is just wrong, right? If anything, I would have found this more logical:

# test both observed proportions against the pooled estimate p = 5/(2470+2392)
ppooled = 5/(2470+2392)
p1 = rbinom(1E8, 2470, ppooled)/2470  # control arm
p2 = rbinom(1E8, 2392, ppooled)/2392  # mask arm
mean((p1 - p2) >= (5/2470 - 0/2392))      # 1-tailed p = 0.01446185
mean(abs(p1 - p2) >= (5/2470 - 0/2392))   # 2-tailed p = 0.03479425

though I am not sure if this would be an accepted way to carry out such a 2-sample binomial test (it seems to be a sort of Monte Carlo version of Liddell's 2×2 contingency table test).

Taleb himself was not very helpful, noting that he was just calculating a "double column table joint distribution a la Fisher" and then remarking in his typically blunt style: "You seem to be very very ignorant, repeating formulas like a parrot, not understanding what probability is about. I’ve stopped engaging with you."

I did tell him that, because a priori masks could also have made things worse (e.g. when badly used), it was safer to use 2-tailed tests. And that, because of the complete separation, one can't fit a regular logistic regression (binomial GLM), which is what the authors used in their paper, e.g. in R:

d = data.frame(treatment = factor(c("masks", "control")),
               pcrpos = c(0, 5),
               pcrneg = c(2392, 2470 - 5))
summary(glm(cbind(pcrpos, pcrneg) ~ treatment, family = binomial, data = d))
# 2-tailed p = 1, obviously not correct: under separation the MLE of the
# treatment coefficient is at -Inf and the Wald test breaks down

To solve this, we could add 1/2 to all observed counts as a continuity correction (equivalent to using a Jeffreys prior in a Bayesian binomial GLM, I believe):

summary(glm(cbind(pcrpos + 1/2, pcrneg + 1/2) ~ treatment, family = binomial, data = d))
# 2-tailed p = 0.11

I then pointed out that it would be better to do an exact-like logistic regression, e.g. in R:

library(elrm)
fit = elrm(pcrpos/n ~ treatment, interest = ~ treatment,
           r = 2, iter = 400000, burnIn = 1000,
           dataset = data.frame(treatment = factor(c("masks", "control")),
                                pcrpos = c(0, 5), n = c(2392, 2470)))
fit$p.values # 2-tailed p value = 0.06
fit$p.values.se # standard error on p value = 0.0003

And that this would also be very close to the result of a 2-tailed Fisher exact test, based on the hypergeometric distribution, which also gives a 2-tailed p value of 0.06:

fisher.test(rbind(c(0,2392), c(5,2470-5))) # 2-tailed p value = 0.06

or a 1-tailed p value of 0.03:

fisher.test(rbind(c(0,2392), c(5,2470-5)), alternative="less") # 1-tailed p value = 0.03
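
Since 0 events in the mask group is the most extreme table possible given the margins, that 1-tailed p value is just a single hypergeometric point probability, i.e. the chance that all 5 infections fall in the control arm:

dhyper(0, 2392, 2470, 5)  # ≈ 0.034, the same as the 1-tailed Fisher p value above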

A Fisher exact test does, however, assume that both the row and column margins are fixed, which is not quite correct here: only the row margins (the group sizes) are fixed, which would make a logistic regression / 2-sample binomial test more appropriate.

Another alternative that I pointed out was Firth's logistic regression, which gives a 2-tailed p value of 0.11:

library(brglm)
summary(brglm(cbind(pcrpos, pcrneg) ~ treatment, family = binomial, data = d))
# 2-tailed p = 0.11

To that he responded "Please don’t give me libraries. Please provide derivations." (never mind that there is no closed-form solution even for the maximum likelihood estimate of a binomial GLM).
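
For what it's worth, every one of those libraries ultimately bottoms out in a numerical scheme such as iteratively reweighted least squares, since the score equations have no closed-form solution. A minimal hand-rolled IRLS sketch (my own illustration, not anything from the discussion) also makes the separation problem visible, as the mask coefficient runs off toward -Inf instead of converging:

# minimal IRLS for the binomial GLM on this 2x2 table (illustrative sketch)
X = cbind(1, c(1, 0))               # intercept + mask-group indicator
y = c(0, 5); n = c(2392, 2470)      # PCR-positives and group sizes
beta = c(0, 0)
for (i in 1:25) {
  eta = X %*% beta
  p   = plogis(eta)
  W   = as.vector(n * p * (1 - p))  # working weights
  z   = eta + (y - n * p) / W       # working response
  beta = solve(t(X) %*% (W * X), t(X) %*% (W * z))
  cat(i, beta[2], "\n")             # the mask coefficient decreases without bound
}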

Anyway, would somebody here be able to give some feedback on this whole discussion, preferably from a formal statistical angle, so that it might even satisfy Taleb? And specifically on the issue of complete separation: how best to deal with it in 2-sample binomial tests or logistic regressions, and what the best options would be for obtaining exact p values.

EDIT: Thinking about the possible options a bit more, an exact unconditional test for comparing two independent binomial proportions would probably be most correct, e.g. Boschloo's test (https://en.wikipedia.org/wiki/Boschloo%27s_test):

library(Exact)
exact.test(rbind(c(0,2392), c(5,2470-5)), method="Boschloo", alternative="two.sided", model="Binomial") 
# Boschloo's test, 2-tailed p = 0.06223
exact.test(rbind(c(0,2392), c(5,2470-5)), method="Boschloo", alternative="less", model="Binomial") 
# Boschloo's test, 1-tailed p = 0.03196

That exact.test function seems to have quite a lot of different methods, though, and I am unsure myself which would be best here (in particular for a case with such low counts and one group with a binomial expectation of p=0), as I haven't dug into the details of all of them. E.g. method="Z-pooled" gives more optimistic p values, closer to the p value I get via the Liddell-like MC method above, which tests the proportions against a common p = 5/(2392+2470):

exact.test(rbind(c(0,2392), c(5,2470-5)), method="Z-pooled", alternative="two.sided", model="Binomial") 
# 2-tailed p-value = 0.02809
exact.test(rbind(c(0,2392), c(5,2470-5)), method="Z-pooled", alternative="less", model="Binomial") 
# 1-tailed p-value = 0.01425

Likewise, using library(exact2x2) with method="FisherAdj", I get similarly more optimistic p values:

uncondExact2x2(0, 2392, 5, 2470, alternative="two.sided", method="FisherAdj") 
# 2-tailed p = 0.03417
uncondExact2x2(0, 2392, 5, 2470, alternative="greater", method="FisherAdj") 
# 1-tailed p = 0.01709

Thoughts on which of these tests would be most appropriate here would be welcome.

On another note, if one were to take into account false-negative PCR tests (in the same way that Taleb likes to take into account false positives in antibody tests), this might change the conclusions quite a bit… I am also pretty sure one would need to know which individuals were subjected to each type of test, and what the counts were for all the other respiratory viruses they were tested for.

Best Answer

Your 2-sided test implicitly allots exactly half of your 5% significance level to "masks are harmful" ($M_-$) and the other half to "masks are beneficial" ($M_+$). To a Bayesian like Taleb that might suggest that you aren't properly thinking about your prior, because it implies that the amount of evidence it would take you to accept $M_-$ is exactly equal to the amount it would take you to accept $M_+$, even though $M_+$ is intuitively more likely (to me, at least - if you had asked me a year ago whether wearing a mask would be more likely to increase or decrease the risk of catching a respiratory virus, I would have said decrease).

And it seems to me that since the single independent variable was binary, using Firth's logistic regression over Fisher's exact test doesn't add much value.

But your core point was that most of the doubt that mask wearing reduced PCR infections in the trial comes from the presence of uncertainty in the estimate $p=\frac{5}{2470}$. This seems unarguable.
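
One quick way to make that uncertainty explicit (my own sketch, assuming independent Jeffreys Beta(1/2, 1/2) priors on the two arms, which is an assumption, not anything from the trial or the discussion):

# posterior draws for the PCR-positive probability in each arm
p_mask    = rbeta(1e6, 0 + 0.5, 2392 + 0.5)
p_control = rbeta(1e6, 5 + 0.5, 2465 + 0.5)
mean(p_mask < p_control)  # posterior P(masks reduce PCR-confirmed infection), roughly 0.99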

Taleb says: "You don’t get it. BTW I added a double column table joint distribution a la Fisher." Maybe that's his way of saving face while admitting that you raised a valid criticism.