Solved – the Bayesian justification for privileging analyses conducted earlier than others


Background and Empirical Example

I have two studies; I ran an experiment (Study 1) and then replicated it (Study 2). In Study 1, I found an interaction between two variables; in Study 2, this interaction was in the same direction but not significant. Here is the summary for Study 1's model:

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)              5.75882    0.26368  21.840  < 2e-16 ***
condSuppression         -1.69598    0.34549  -4.909 1.94e-06 ***
prej                    -0.01981    0.08474  -0.234  0.81542    
condSuppression:prej     0.36342    0.11513   3.157  0.00185 ** 

And Study 2's model:

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)           5.24493    0.24459  21.444   <2e-16 ***
prej                  0.13817    0.07984   1.731   0.0851 .  
condSuppression      -0.59510    0.34168  -1.742   0.0831 .  
prej:condSuppression  0.13588    0.11889   1.143   0.2545  

Instead of saying, "I guess I don't have anything, because I 'failed to replicate,'" I combined the two data sets, created a dummy variable for which study the data came from, and then ran the interaction model again while controlling for that study dummy. The interaction was significant even with this control, and the two-way interaction between condition and prej was not qualified by a three-way interaction with the study dummy variable.
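A minimal sketch of that pooled analysis, using hypothetical names for things the post doesn't give (data frames d1 and d2, outcome dv):

# Stack the two studies and add a study dummy (0 = Study 1, 1 = Study 2).
pooled <- rbind(transform(d1, study = 0),
                transform(d2, study = 1))

# Two-way interaction, controlling for study:
fit_pooled <- lm(dv ~ cond * prej + study, data = pooled)
summary(fit_pooled)

# Check that study does not qualify the interaction (no three-way):
fit_threeway <- lm(dv ~ cond * prej * study, data = pooled)
summary(fit_threeway)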

Introducing Bayesian Analysis

I had someone suggest that this is a great opportunity to use Bayesian analysis: in Study 2, I have information from Study 1 that I can use as prior information! In this way, Study 2 performs a Bayesian update of the frequentist, ordinary least squares results from Study 1. So I went back and re-analyzed the Study 2 model, now using informative priors on the coefficients: each coefficient got a normal prior whose mean was its Study 1 estimate and whose standard deviation was its Study 1 standard error.
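The post doesn't name the software, but the summary format shown below (mean_PPD, log-posterior) matches rstanarm's stan_glm, so here is a sketch of such a refit under that assumption, with the same hypothetical names as above:

library(rstanarm)

# Study 1's OLS estimates become prior means; its standard errors become
# prior standard deviations. The location/scale vectors must follow the
# model-matrix order of the coefficients (here condSuppression, prej,
# condSuppression:prej).
fit2_informed <- stan_glm(
  dv ~ cond * prej, data = d2, family = gaussian(),
  prior = normal(location = c(-1.69598, -0.01981, 0.36342),
                 scale    = c( 0.34549,  0.08474,  0.11513)),
  # Caveat: rstanarm applies prior_intercept to the intercept after
  # centering the predictors, so this is only approximately what the
  # text describes.
  prior_intercept = normal(5.75882, 0.26368)
)
summary(fit2_informed)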

This is a summary of the result:

Estimates:
                       mean    sd      2.5%    25%     50%     75%     97.5%
(Intercept)             5.63    0.17    5.30    5.52    5.63    5.74    5.96
condSuppression        -1.20    0.20   -1.60   -1.34   -1.21   -1.07   -0.80
prej                    0.02    0.05   -0.08   -0.01    0.02    0.05    0.11
condSuppression:prej    0.34    0.06    0.21    0.30    0.34    0.38    0.46
sigma                   1.14    0.06    1.03    1.10    1.13    1.17    1.26
mean_PPD                5.49    0.11    5.27    5.41    5.49    5.56    5.72
log-posterior        -316.40    1.63 -320.25 -317.25 -316.03 -315.23 -314.29

It looks like we now have pretty solid evidence for an interaction from the Study 2 analysis. This agrees with what I found when I simply stacked the data sets and ran the model with study as a dummy variable.

Counterfactual: What If I Ran Study 2 First?

That got me thinking: What if I had run Study 2 first and then used the data from Study 1 to update the beliefs formed from Study 2? I did the same thing as above, but in reverse: I re-analyzed the Study 1 data using the frequentist, ordinary least squares coefficient estimates and standard errors from Study 2 as the prior means and standard deviations (sketched in code below the table). The summary results were:

Estimates:
                          mean    sd      2.5%    25%     50%     75%     97.5%
(Intercept)                5.35    0.17    5.01    5.23    5.35    5.46    5.69
condSuppression           -1.09    0.20   -1.47   -1.22   -1.09   -0.96   -0.69
prej                       0.11    0.05    0.01    0.08    0.11    0.14    0.21
condSuppression:prej       0.17    0.06    0.05    0.13    0.17    0.21    0.28
sigma                      1.10    0.06    0.99    1.06    1.09    1.13    1.21
mean_PPD                   5.33    0.11    5.11    5.25    5.33    5.40    5.54
log-posterior           -303.89    1.61 -307.96 -304.67 -303.53 -302.74 -301.83
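For completeness, under the same assumptions as before, this mirrored fit just swaps which study supplies the data and which supplies the priors:

# Study 1's data, with priors taken from Study 2's OLS fit.
fit1_informed <- stan_glm(
  dv ~ cond * prej, data = d1, family = gaussian(),
  prior = normal(location = c(-0.59510, 0.13817, 0.13588),
                 scale    = c( 0.34168, 0.07984, 0.11889)),
  prior_intercept = normal(5.24493, 0.24459)
)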

Again, we see evidence for an interaction, but the two analyses do not agree: the interaction point estimate from each Bayesian analysis falls outside the other's 95% credible interval, and the two credible intervals have more non-overlap than overlap.

What Is The Bayesian Justification For Time Precedence?

My question is thus: What justification do Bayesians have for respecting the chronology of how the data were collected and analyzed? I get results from Study 1 and use them as informative priors in Study 2, so that Study 2 "updates" my beliefs. But if we assume that the results I get are randomly taken from a distribution with a true population effect… then why do I privilege the results from Study 1? What is the justification for using Study 1's results as priors for Study 2, rather than Study 2's results as priors for Study 1? Does the order in which I collected and analyzed the data really matter? It does not seem to me like it should. So what is the Bayesian justification? Why should I believe the point estimate is closer to 0.34 than to 0.17 just because I ran Study 1 first?


Responding to Kodiologist's Answer

Kodiologist remarked:

The second of these points to an important departure you have made from Bayesian convention. You didn't set a prior first and then fit both models in Bayesian fashion. You fit one model in a non-Bayesian fashion and then used that for priors for the other model. If you used the conventional approach, you wouldn't see the dependence on order that you saw here.

To address this, I fit the models for Study 1 and Study 2 in which all regression coefficients had a $\text{N}(0, 5)$ prior. The cond variable was a dummy variable for experimental condition, coded 0 or 1; the prej variable and the outcome were both measured on 7-point scales ranging from 1 to 7. So I think this is a fair choice of prior: given how the data are scaled, it would be very rare to see coefficients much larger than that prior suggests.
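A sketch of those common-prior fits, again assuming rstanarm and the same hypothetical names:

# The same N(0, 5) prior on every coefficient in both studies; a scalar
# location/scale is recycled across all coefficients.
wi_prior <- normal(location = 0, scale = 5)

fit1 <- stan_glm(dv ~ cond * prej, data = d1, family = gaussian(),
                 prior = wi_prior, prior_intercept = wi_prior)
fit2 <- stan_glm(dv ~ cond * prej, data = d2, family = gaussian(),
                 prior = wi_prior, prior_intercept = wi_prior)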

The mean estimates and the standard deviations of those estimates are about the same as in the OLS regressions. Study 1:

Estimates:
                       mean     sd       2.5%     25%      50%      75%      97.5% 
(Intercept)             5.756    0.270    5.236    5.573    5.751    5.940    6.289
condSuppression        -1.694    0.357   -2.403   -1.925   -1.688   -1.452   -0.986
prej                   -0.019    0.087   -0.191   -0.079   -0.017    0.040    0.150
condSuppression:prej    0.363    0.119    0.132    0.282    0.360    0.442    0.601
sigma                   1.091    0.057    0.987    1.054    1.088    1.126    1.213
mean_PPD                5.332    0.108    5.121    5.259    5.332    5.406    5.542
log-posterior        -304.764    1.589 -308.532 -305.551 -304.463 -303.595 -302.625

And Study 2:

Estimates:
                       mean     sd       2.5%     25%      50%      75%      97.5% 
(Intercept)             5.249    0.243    4.783    5.082    5.246    5.417    5.715
condSuppression        -0.599    0.342   -1.272   -0.823   -0.599   -0.374    0.098
prej                    0.137    0.079   -0.021    0.084    0.138    0.192    0.287
condSuppression:prej    0.135    0.120   -0.099    0.055    0.136    0.214    0.366
sigma                   1.132    0.056    1.034    1.092    1.128    1.169    1.253
mean_PPD                5.470    0.114    5.248    5.392    5.471    5.548    5.687
log-posterior        -316.699    1.583 -320.626 -317.454 -316.342 -315.561 -314.651

Since these means and standard deviations are more or less the same as the OLS estimates, the order effect above still occurs: if I plug the posterior summary statistics from Study 1 into the priors when analyzing Study 2, I get a different final posterior than if I analyze Study 2 first and then use its posterior summary statistics as priors for analyzing Study 1.

So even when I use Bayesian means and standard deviations for the regression coefficients as priors, rather than the frequentist estimates, I observe the same order effect. The question remains: What is the Bayesian justification for privileging the study that came first?

Best Answer

Bayes' theorem says the posterior is proportional to the prior times the likelihood (the rescaling constant just makes it integrate to 1). Each observation contributes a likelihood that updates the current prior into a new posterior:

$$\begin{aligned}
\text{posterior}_1 &\propto \text{prior} \times \text{likelihood}_1 \\
\text{posterior}_2 &\propto \text{posterior}_1 \times \text{likelihood}_2 \\
&\;\;\vdots \\
\text{posterior}_n &\propto \text{posterior}_{n-1} \times \text{likelihood}_n
\end{aligned}$$

So that

$$\text{posterior}_n \propto \text{prior} \times \text{likelihood}_1 \times \cdots \times \text{likelihood}_n.$$

The commutativity of multiplication implies that these updates can be done in any order: if you start from a single prior, you can mix the observations from Study 1 and Study 2 in any order, apply Bayes' formula, and arrive at the same final posterior. The order dependence you observed comes from the approximation step, not from Bayes' theorem: carrying forward only each coefficient's marginal mean and standard deviation as independent normal priors discards the posterior correlations among the coefficients (and the posterior of sigma), so the second fit is not an exact update of the first, and the resulting error differs depending on which study is approximated first.
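A toy conjugate example makes this concrete. With a normal likelihood of known standard deviation and a normal prior on the mean, the exact posterior can be carried forward in closed form, and the two update orders land on identical posteriors (simulated data, purely illustrative):

# Exact conjugate update for a normal mean with known data sd.
update_normal <- function(prior_mean, prior_sd, x, sd = 1) {
  post_prec <- 1 / prior_sd^2 + length(x) / sd^2
  list(mean = (prior_mean / prior_sd^2 + sum(x) / sd^2) / post_prec,
       sd   = sqrt(1 / post_prec))
}

set.seed(1)
x1 <- rnorm(20, mean = 0.5)  # "Study 1" data
x2 <- rnorm(20, mean = 0.5)  # "Study 2" data

# Start both chains from the SAME prior, update in opposite orders:
a <- update_normal(0, 5, x1); a <- update_normal(a$mean, a$sd, x2)
b <- update_normal(0, 5, x2); b <- update_normal(b$mean, b$sd, x1)

c(a$mean, b$mean)  # identical posterior means
c(a$sd,   b$sd)    # identical posterior sds

Here the full posterior (mean and sd) is carried between steps, so nothing is lost; in the regression example, only marginal summaries were carried forward, which is where the order dependence crept in.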
