Probability spaces and Kolmogorov's axioms
A probability space $\mathcal{P}$ is by definition a triple $(\Omega, \mathcal{F}, \mathbb{P})$ where $\Omega$ is a set of outcomes, $\mathcal{F}$ is a $\sigma$-algebra of subsets of $\Omega$ and $\mathbb{P}$ is a probability measure that fulfills the axioms of Kolmogorov, i.e. $\mathbb{P}$ is a function from $\mathcal{F}$ to $[0,1]$ such that $\mathbb{P}(\Omega)=1$ and for pairwise disjoint $E_1, E_2, \dots$ in $\mathcal{F}$ it holds that $\mathbb{P} \left( \cup_{j=1}^\infty E_j \right)=\sum_{j=1}^\infty \mathbb{P}(E_j)$.
Within such a probability space one can, for two events $E_1, E_2$ in $\mathcal{F}$ with $\mathbb{P}(E_2)>0$, define the conditional probability as $\mathbb{P}(E_1 \mid E_2)\stackrel{def}{=}\frac{\mathbb{P}(E_1 \cap E_2)}{\mathbb{P}(E_2)}$.
Note that:
- this ''conditional probability'' is only defined when $\mathbb{P}$ is defined on $\mathcal{F}$ (and when $\mathbb{P}(E_2)>0$), so we need a probability space to be able to define conditional probabilities.
- A probability space is defined in very general terms (a set $\Omega$, a $\sigma$-algebra $\mathcal{F}$ and a probability measure $\mathbb{P}$), the only requirement is that certain properties should be fulfilled but apart from that these three elements can be ''anything''.
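As a concrete illustration of the definition above, here is a minimal sketch in Python of a finite probability space (a fair die, with all names chosen for this example) and the conditional probability computed directly from its definition:

```python
from fractions import Fraction

# A finite probability space for one roll of a fair die:
# Omega = outcomes, F = all subsets (implicit), P given by point masses.
omega = {1, 2, 3, 4, 5, 6}
p_single = {w: Fraction(1, 6) for w in omega}

def P(event):
    """Probability measure: sum the point masses of the outcomes in the event."""
    return sum(p_single[w] for w in event)

def P_cond(e1, e2):
    """Conditional probability P(E1 | E2) = P(E1 n E2) / P(E2); needs P(E2) > 0."""
    if P(e2) == 0:
        raise ValueError("conditioning event must have positive probability")
    return P(e1 & e2) / P(e2)

even = {2, 4, 6}
at_most_two = {1, 2}
print(P_cond(at_most_two, even))  # → 1/3, since P({2})/P({2,4,6}) = (1/6)/(1/2)
```

Here exact `Fraction` arithmetic is used so the result is a clean rational number rather than a float.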
More detail can be found in this link
Bayes' rule holds in any (valid) probability space
From the definition of conditional probability it also holds that $\mathbb{P}(E_2 \mid E_1)=\frac{\mathbb{P}(E_2 \cap E_1)}{\mathbb{P}(E_1)}$. From these two equations we find Bayes' rule: derive $\mathbb{P}(E_1 \cap E_2)$ and $\mathbb{P}(E_2 \cap E_1)$ from each equation and equate them (they are equal because intersection is commutative). So Bayes' rule holds (by definition of conditional probability) in any probability space.
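Written out in full, the derivation sketched above is (assuming $\mathbb{P}(E_1)>0$ and $\mathbb{P}(E_2)>0$):

```latex
% Both definitions of conditional probability, rewritten as products:
\mathbb{P}(E_1 \cap E_2) = \mathbb{P}(E_1 \mid E_2)\,\mathbb{P}(E_2),
\qquad
\mathbb{P}(E_2 \cap E_1) = \mathbb{P}(E_2 \mid E_1)\,\mathbb{P}(E_1).
% Since E_1 \cap E_2 = E_2 \cap E_1, equating the right-hand sides
% and dividing by P(E_1) gives Bayes' rule:
\mathbb{P}(E_2 \mid E_1)
  = \frac{\mathbb{P}(E_1 \mid E_2)\,\mathbb{P}(E_2)}{\mathbb{P}(E_1)}.
```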
As Bayes' rule is the basis for Bayesian inference, one can do Bayesian analysis in any valid (i.e. fulfilling all conditions, among others Kolmogorov's axioms) probability space.
Frequentist definition of probability is a ''special case''
The above holds ''in general'', i.e. we have no specific $\Omega$, $\mathcal{F}$, $\mathbb{P}$ in mind as long as $\mathcal{F}$ is a $\sigma$-algebra on subsets of $\Omega$ and $\mathbb{P}$ fulfills Kolmogorov's axioms.
We will now show that a ''frequentist'' definition of $\mathbb{P}$ fulfills Kolmogorov's axioms. If that is the case, then ''frequentist'' probabilities are only a special case of Kolmogorov's general and abstract probability.
Let's take an example and roll a die. Then the set of all possible outcomes is $\Omega=\{1,2,3,4,5,6\}$. We also need a $\sigma$-algebra on this set $\Omega$, and we take for $\mathcal{F}$ the set of all subsets of $\Omega$, i.e. $\mathcal{F}=2^\Omega$.
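The choice $\mathcal{F}=2^\Omega$ can be checked mechanically. Below is a small Python sketch (names are illustrative) that builds the power set of $\Omega$ and spot-checks the $\sigma$-algebra properties; since $\Omega$ is finite, closure under finite unions is all that needs checking:

```python
from itertools import chain, combinations

# Build the power set 2^Omega for Omega = {1,...,6}; it is the largest
# sigma-algebra on Omega.
omega = frozenset({1, 2, 3, 4, 5, 6})
F = [frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

print(len(F))  # 2^6 = 64 events

# Sigma-algebra properties (finite Omega, so finite unions suffice):
assert omega in F                                 # contains Omega
assert all(omega - e in F for e in F)             # closed under complement
assert all((a | b) in F for a in F for b in F)    # closed under union
```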
We still have to define the probability measure $\mathbb{P}$ in a frequentist way. Therefore we define $\mathbb{P}(\{1\})$ as $\mathbb{P}(\{1\}) \stackrel{def}{=} \lim_{n \to +\infty} \frac{n_1}{n}$ where $n_1$ is the number of $1$'s obtained in $n$ rolls of the die. Similarly for $\mathbb{P}(\{2\})$, ... $\mathbb{P}(\{6\})$.
In this way $\mathbb{P}$ is defined for all singletons in $\mathcal{F}$. For any other set in $\mathcal{F}$, e.g. $\{1,2\}$, we define $\mathbb{P}(\{1,2\})$ in a frequentist way, i.e.
$\mathbb{P}(\{1,2\}) \stackrel{def}{=} \lim_{n \to +\infty} \frac{n_1+n_2}{n}$, but by the additivity of limits this is equal to $\mathbb{P}(\{1\})+\mathbb{P}(\{2\})$. Since $\Omega$ is finite, this finite additivity is all that is needed, which implies that Kolmogorov's axioms hold.
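The limit itself cannot be computed, but it can be approximated by simulation. The Python sketch below (all names illustrative) estimates the frequentist probabilities by relative frequencies and shows that additivity holds exactly for the empirical frequencies, by construction:

```python
import random

random.seed(42)

# Simulate n rolls of a fair die and estimate P({1}), P({2}) and P({1,2})
# by relative frequencies (a finite-n stand-in for the limit n -> infinity).
n = 100_000
rolls = [random.randint(1, 6) for _ in range(n)]

freq = {k: rolls.count(k) / n for k in (1, 2)}
freq_union = sum(1 for r in rolls if r in (1, 2)) / n

# Additivity is exact for the empirical frequencies: (n1 + n2)/n = n1/n + n2/n.
print(round(freq[1] + freq[2], 4), round(freq_union, 4))
```

For large $n$ both numbers should be close to $2/6 \approx 0.333$, though the simulation only approximates the limiting value.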
So the frequentist definition of probability is only a special case of Kolmogorov's general and abstract definition of a probability measure.
Note that there are other ways to define a probability measure that fulfills Kolmogorov's axioms, so the frequentist definition is not the only possible one.
Conclusion
The probability in Kolmogorov's axiomatic system is ''abstract'', it has no real meaning, it only has to fulfill conditions called ''axioms''. Using only these axioms Kolmogorov was able to derive a very rich set of theorems.
The frequentist definition of probability fulfills the axioms. Therefore, replacing the abstract, ''meaningless'' $\mathbb{P}$ by a probability defined in a frequentist way, all these theorems remain valid, because the ''frequentist probability'' is only a special case of Kolmogorov's abstract probability (i.e. it fulfills the axioms).
One of the properties that can be derived in Kolmogorov's general framework is Bayes' rule. As it holds in the general and abstract framework, it will also hold (see above) in the specific case where the probabilities are defined in a frequentist way (because the frequentist definition fulfills the axioms, and these axioms were the only thing needed to derive all the theorems). So one can do Bayesian analysis with a frequentist definition of probability.
Defining $\mathbb{P}$ in a frequentist way is not the only possibility, there are other ways to define it such that it fulfills the abstract axioms of Kolmogorov. Bayes' rule will also hold in these ''specific cases''. So one can also do Bayesian analysis with a non-frequentist definition of probability.
EDIT 23/8/2016
@mpiktas, in reaction to your comment:
As I said, the sets $\Omega, \mathcal{F}$ and the probability measure $\mathbb{P}$ have no particular meaning in the axiomatic system, they are abstract.
In order to apply this theory you have to give further definitions (so what you say in your comment, ''no need to muddle it further with some bizarre definitions'', is wrong: you do need additional definitions).
Let's apply it to the case of tossing a fair coin. The set $\Omega$ in Kolmogorov's theory has no particular meaning, it just has to be ''a set''. So we must specify what this set is in case of the fair coin, i.e. we must define the set $\Omega$. If we represent head as H and tail as T, then the set $\Omega$ is by definition $\Omega\stackrel{def}{=}\{H,T\}$.
We also have to define the events, i.e. the $\sigma$-algebra $\mathcal{F}$. We define it as $\mathcal{F} \stackrel{def}{=} \{\emptyset, \{H\},\{T\},\{H,T\} \}$. It is easy to verify that $\mathcal{F}$ is a $\sigma$-algebra.
Next we must define for every event $E \in \mathcal{F}$ its measure. So we need to define a map from $\mathcal{F}$ to $[0,1]$. I will define it in the frequentist way: for a fair coin, if I toss it a huge number of times, then the fraction of heads will be 0.5, so I define $\mathbb{P}(\{H\})\stackrel{def}{=}0.5$. Similarly I define $\mathbb{P}(\{T\})\stackrel{def}{=}0.5$, $\mathbb{P}(\{H,T\})\stackrel{def}{=}1$ and $\mathbb{P}(\emptyset)\stackrel{def}{=}0$. Note that $\mathbb{P}$ is a map from $\mathcal{F}$ to $[0,1]$ and that it fulfills Kolmogorov's axioms.
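The fair-coin space is small enough that Kolmogorov's axioms can be verified exhaustively. A minimal Python sketch (names chosen for this example):

```python
# The fair-coin probability space made explicit: Omega = {H, T},
# F = all four events, P as defined above.
omega = frozenset({"H", "T"})
F = [frozenset(), frozenset({"H"}), frozenset({"T"}), omega]
P = {frozenset(): 0.0, frozenset({"H"}): 0.5, frozenset({"T"}): 0.5, omega: 1.0}

# Axiom: P maps F into [0, 1] and P(Omega) = 1.
assert all(0.0 <= P[e] <= 1.0 for e in F)
assert P[omega] == 1.0

# Axiom: additivity for disjoint events (finite Omega, so checking
# all disjoint pairs covers countable additivity here).
for a in F:
    for b in F:
        if not (a & b):  # disjoint events
            assert P[a | b] == P[a] + P[b]
print("all axioms verified")
```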
For a reference with the frequentist definition of probability see this link (at the end of the section 'definition') and this link.
Best Answer
WARNING I wrote this answer a long time ago with very little idea what I was talking about. I can't delete it because it's been accepted, but I can't stand behind most of the content.
This is a very long answer and I hope it'll be helpful in some way. SPC isn't my area, but I think these comments are general enough that they apply here.
I'd argue that the most-oft-cited advantage -- the ability to incorporate prior beliefs -- is a weak advantage in applied/empirical fields. That's because you need to quantify your prior. Even if I can say "well, level z is definitely implausible," I can't for the life of me tell you what should happen below z. Unless authors start publishing their raw data in droves, my best guesses for priors are conditional moments taken from previous work that may or may not have been fitted under conditions similar to the ones you're facing.
Basically, Bayesian techniques (at least on a conceptual level) are excellent for when you have a strong assumption/idea/model and want to take it to data, then see how wrong or not wrong you turn out to be. But often you are not looking to see whether you're right about one particular model for your business process; more likely you have no model, and are looking to see what your process is going to do. You do not want to push your conclusions around, you want your data to push your conclusions. If you have enough data, that's what will happen anyway, but in that case why bother with the prior? Perhaps that's overly skeptical and risk-averse, but I've never heard of an optimistic businessman that was also successful. There is no way to quantify your uncertainty about your own beliefs, and you would rather not run the risk of being overconfident in the wrong thing. So you set an uninformative prior and the advantage disappears.
This is interesting in the SPC case because unlike in, say, digital marketing, your business processes aren't forever in an unpredictable state of flux. My impression is that business processes tend to change deliberately and incrementally. That is, you have a long time to build up good, safe priors. But recall that priors are all about propagating uncertainty. Subjectivity aside, Bayesianism has the advantage that it objectively propagates uncertainty across deeply-nested data generating processes. That, to me, is really what Bayesian statistics is good for. And if you're looking for reliability of your process well beyond the 1-in-20 "significance" cutoff, it seems like you would want to account for as much uncertainty as possible.
So where are the Bayesian models? First off, they're hard to implement. To put it bluntly, I can teach OLS to a mechanical engineer in 15 minutes and have him cranking out regressions and t-tests in Matlab in another 5. To use Bayes, I first need to decide what kind of model I'm fitting, and then see if there's a ready-made library for it in a language someone at my company knows. If not, I have to use BUGS or Stan. And then I have to run simulations to get even a basic answer, and that takes about 15 minutes on an 8-core i7 machine. So much for rapid prototyping. And second off, by the time you get an answer, you've spent two hours of coding and waiting, only to get the same result as you could have with frequentist random effects with clustered standard errors. Maybe this is all presumptuous and wrongheaded and I don't understand SPC at all. But I see it in academia and in for-profit social science constantly, and I'd be surprised if things were different in other fields.
I liken Bayesianism to a very high-quality chef knife, a stockpot, and a sauté pan; frequentism is like a kitchen full of As-Seen-On-TV tools like banana slicers and pasta pots with holes in the lid for easy draining. If you're a practiced cook with lots of experience in the kitchen--indeed, in your own kitchen of substantive knowledge, which is clean and organized and you know where everything is located--you can do amazing things with your small selection of elegant, high-quality tools. Or, you can use a bunch of different little ad-hoc* tools, which require zero skill to use, to make a meal that's simple, really not half bad, and has a couple basic flavors that get the point across. You just got home from the data mines and you're hungry for results; which cook are you?
*Bayes is just as ad-hoc, but less transparently so. How much wine goes in your coq au vin? No idea, you eyeball it because you're a pro. Or, you can't tell the difference between a Pinot Grigio and a Pinot Noir but the first recipe on Epicurious said to use 2 cups of the red one so that's what you're going to do. Which one is more "ad-hoc?"