I wouldn't consider non-parametric or robust as being sub-categories of statistics in the way that frequentist and Bayesian are, simply because there are both frequentist and Bayesian methods for non-parametric and robust statistics. Frequentist and Bayesian are genuine sub-categories as they are based on fundamentally different definitions of probability. Frequentists and Bayesians will both vary the strength of assumptions made depending on the requirements of the application.
So I would say that particular subdivision into four categories is not widely recognised in statistics. In my opinion, both Bayesian and frequentist methods can be used for most statistical problems, but they are not always equally useful. For example, whether a frequentist confidence interval or a Bayesian credible interval is more appropriate depends on whether you want to ask what to expect if the experiment were replicated, or what we can conclude about the statistic from the particular experiment that we have actually performed (I would suggest that in most cases it is the latter, but scientists generally use frequentist methods anyway).
I would suggest the book Bayesian Data Analysis (in particular chapter 6) as a great source for answering this question; it is also the source for everything I am about to say. One of the usual ways that Bayesians attack this problem is by using posterior predictive p-values (PPPs). Before I jump into how PPPs would solve this problem, let me first define the following notation:
Let $y$ be the observed data and $\theta$ be the vector of parameters. We define $y^{\text{rep}}$ as the replicated data that could have been observed, or, to think predictively, as the data we would see tomorrow if the experiment that produced $y$ today were replicated with the same model and the same value of $\theta$ that produced the observed data.
Note that we define the distribution of $y^{\text{rep}}$, given the current state of knowledge, by the posterior predictive distribution
$$p(y^{\text{rep}}|y)=\int_\Theta p(y^{\text{rep}}|\theta)p(\theta|y)d\theta$$
Now, we can measure the discrepancy between the model and the data by defining test quantities, the aspects of the data we wish to check. A test quantity, or discrepancy measure, $T(y,\theta)$, is a scalar summary of parameters and data that is used as a standard when comparing data to predictive simulations. Test quantities play the role in Bayesian model checking that test statistics play in classical testing. We define the notation $T(y)$ for a test statistic, which is a test quantity that depends only on data; in the Bayesian context, we can generalize test statistics to allow dependence on the model parameters under their posterior distribution.
Classically, the p-value for the test statistic $T(y)$ is
$$p_C=\text{Pr}(T(y^{\text{rep}})\geq T(y)|\theta)$$
where the probability is taken over the distribution of $y^{\text{rep}}$ with $\theta$ fixed.
From a Bayesian perspective, lack of fit of the data with respect to the posterior predictive distribution can be measured by the tail-area probability, or p-value, of the test quantity, and computed using posterior simulations of $(\theta,y^{\text{rep}})$. In the Bayesian approach, test quantities can be functions of the unknown parameters as well as data because the test quantity is evaluated over draws from the posterior distribution of the unknown parameters.
Now, we can define the Bayesian p-value (PPPs) as the probability that the replicated data could be more extreme than the observed data, as measured by the test quantity:
$$p_B=\text{Pr}(T(y^{\text{rep}},\theta)\geq T(y,\theta)|y)$$
where the probability is taken over the posterior distribution of $\theta$ and the posterior predictive distribution of $y^{\text{rep}}$ (that is, the joint distribution, $p(\theta,y^{\text{rep}}|y)$):
$$p_B=\int_\Theta\int I_{T(y^{\text{rep}},\theta)\geq T(y,\theta)}\,p(y^{\text{rep}}|\theta)\,p(\theta|y)\,dy^{\text{rep}}\,d\theta,$$
where $I$ is the indicator function. In practice, though, we usually compute the posterior predictive distribution using simulation.
If we already have, say, $L$ simulations from the posterior distribution of $\theta$, then we can just draw one $y^{\text{rep}}$ from the predictive distribution for each simulated $\theta$; we now have $L$ draws from the joint posterior distribution, $p(y^{\text{rep}},\theta|y)$. The posterior predictive check is the comparison between the realized test quantities $T(y,\theta^l)$ and the predictive test quantities $T(y^{\text{rep}\,l},\theta^l)$. The estimated p-value is just the proportion of these $L$ simulations for which the test quantity equals or exceeds its realized value; that is, for which $T(y^{\text{rep}\,l},\theta^l)\geq T(y,\theta^l)$ for $l=1,\ldots,L$.
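To make that simulation recipe concrete, here is a minimal Python sketch. The modelling choices are my own illustrative assumptions (a normal model with known unit variance, an effectively flat prior so the posterior for $\theta$ is Normal$(\bar{y}, 1/\sqrt{n})$, and the test statistic $T(y)=\max(y)$); they are not prescribed by BDA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data: assume y_i ~ Normal(theta, 1) with a flat prior on theta,
# so the posterior is Normal(ybar, 1/sqrt(n)).  (Illustrative assumptions.)
y = rng.normal(loc=1.0, scale=1.0, size=50)
n, ybar = len(y), y.mean()

L = 5000                                                   # posterior simulations
theta_draws = rng.normal(ybar, 1 / np.sqrt(n), size=L)     # draws from p(theta | y)

# One replicated data set per posterior draw -> draws from p(y_rep, theta | y)
y_rep = rng.normal(loc=theta_draws[:, None], scale=1.0, size=(L, n))

# Test quantity: here a plain test statistic T(y) = max(y);
# in general it may also depend on theta.
T_obs = y.max()
T_rep = y_rep.max(axis=1)

# Posterior predictive p-value: proportion of simulations with T(y_rep) >= T(y)
p_B = np.mean(T_rep >= T_obs)
print(f"estimated posterior predictive p-value: {p_B:.3f}")
```

An extreme $p_B$ (close to 0 or 1) would suggest that the model fails to capture the aspect of the data measured by the chosen test quantity.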
In contrast to the classical approach, Bayesian model checking does not require special methods to handle "nuisance parameters." By using posterior simulations, we implicitly average over all the parameters in the model.
As an additional source, Andrew Gelman also has a very nice paper on PPPs here:
http://www.stat.columbia.edu/~gelman/research/unpublished/ppc_understand2.pdf
Best Answer
Think your statement through as a Frequentist and make it more specific first. A Frequentist could not say that "data set A is different from data set B" without any further clarification.
First, you'd have to state what you mean by "different". Perhaps you mean "have different mean values". Then again, you might mean "have different variances". Or perhaps something else?
Then, you'd have to state what kind of test you would use, which depends on what you believe are valid assumptions about the data. Do you assume that the data sets are both normally-distributed about some means? Or do you believe that they are both Beta-distributed? Or something else?
Now can you see that the second decision is much like the priors in Bayesian statistics? It's not just "my past experience", but is rather what I believe, and what I believe my peers will believe, are reasonable assumptions about my data. (And Bayesians can use uniform priors, which pushes things towards Frequentist calculations.)
EDIT: In response to your comment: the next step is contained in the first decision I mentioned. If you want to decide whether the means of two groups are different, you would look at the distribution of the difference of the means of the two groups to see if this distribution does or does not contain zero, at some level of confidence. Exactly how close to zero you count as zero and exactly which portion of the (posterior) distribution you use are determined by you and the level of confidence you desire.
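As a rough illustration of that step, here is a minimal Python sketch. The flat priors, the plug-in variances (so each group mean has an approximately Normal posterior), and the 95% level are assumptions I am adding for the example; Kruschke's book works through richer models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical data sets (illustrative numbers only)
a = rng.normal(loc=0.3, scale=1.0, size=40)
b = rng.normal(loc=0.0, scale=1.0, size=35)

L = 10_000
# Approximate posterior for each group mean: Normal(sample mean, s / sqrt(n)),
# i.e. a flat prior with the sample variance plugged in.
mu_a = rng.normal(a.mean(), a.std(ddof=1) / np.sqrt(len(a)), size=L)
mu_b = rng.normal(b.mean(), b.std(ddof=1) / np.sqrt(len(b)), size=L)

diff = mu_a - mu_b            # posterior draws of the difference of means

# 95% credible interval for the difference; "different" here means the
# interval excludes zero at the chosen level.
lo, hi = np.percentile(diff, [2.5, 97.5])
print(f"95% credible interval for mu_a - mu_b: ({lo:.3f}, {hi:.3f})")
print("zero excluded" if lo > 0 or hi < 0 else "zero not excluded")
```

The choice of interval (central 95%, a highest-density interval, or something else) and of how close to zero still counts as "no difference" is exactly the judgement call described above.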
A discussion of these ideas can be found in a paper by Kruschke, who also wrote a very readable book Doing Bayesian Data Analysis, which covers an example on pages 307-309, "Are Different Groups Equal?". (Second edition: p. 468-472.) He also has a blog posting on the subject, with some Q&A.
FURTHER EDIT: Your description of the Bayesian process is also not quite correct. Bayesians only care about what the data tells us, in light of what we knew independent of the data. (As Kruschke points out, the prior does not necessarily occur before the data. That's what the phrase implies, but it's really just our knowledge excluding some of the data.) What we knew independently of a particular set of data may be vague or specific and may be based on consensus, a model of the underlying data generation process, or may just be the results of another (not necessarily prior) experiment.