Think your statement through as a Frequentist and make it more specific first. A Frequentist could not say that "data set A is different from data set B" without further clarification.
First, you'd have to state what you mean by "different". Perhaps you mean "have different mean values". Then again, you might mean "have different variances". Or perhaps something else?
Then, you'd have to state what kind of test you would use, which depends on what you believe are valid assumptions about the data. Do you assume that the data sets are both normally distributed about some means? Or do you believe that they are both Beta-distributed? Or something else?
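To make this concrete, here is a minimal Python/scipy sketch (with simulated placeholder data standing in for your two data sets) showing how each reading of "different", together with the distributional assumptions, leads to a different test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder data sets standing in for A and B
a = rng.normal(loc=10.0, scale=2.0, size=50)
b = rng.normal(loc=11.0, scale=3.0, size=60)

# "Different means", assuming (approximate) normality: Welch's t-test
t_stat, p_means = stats.ttest_ind(a, b, equal_var=False)

# "Different variances": Levene's test (less sensitive to non-normality than an F-test)
w_stat, p_vars = stats.levene(a, b)

# "Different distributions", with no parametric assumption: two-sample KS test
ks_stat, p_dist = stats.ks_2samp(a, b)

print(f"means:      p = {p_means:.3f}")
print(f"variances:  p = {p_vars:.3f}")
print(f"whole dist: p = {p_dist:.3f}")
```

Each line answers a different question, which is exactly why "A is different from B" is not yet a testable statement.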
Now can you see that the second decision is much like the priors in Bayesian statistics? It's not just "my past experience", but is rather what I believe, and what I believe my peers will believe, are reasonable assumptions about my data. (And Bayesians can use uniform priors, which pushes things towards Frequentist calculations.)
EDIT: In response to your comment: the next step is contained in the first decision I mentioned. If you want to decide whether the means of two groups are different, you would look at the distribution of the difference of the means of the two groups to see if this distribution does or does not contain zero, at some level of confidence. Exactly how close to zero you count as zero and exactly which portion of the (posterior) distribution you use are determined by you and the level of confidence you desire.
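For illustration, here is a minimal sketch of that step in Python. This is not Kruschke's full model; it assumes a normal likelihood for each group with the standard noninformative prior p(mu, sigma^2) ∝ 1/sigma^2, uses simulated placeholder data, and simply checks whether a 95% credible interval for the difference of means contains zero:

```python
import numpy as np

rng = np.random.default_rng(1)
# Placeholder data for the two groups
a = rng.normal(10.0, 2.0, size=50)
b = rng.normal(11.0, 2.5, size=60)

def posterior_mean_samples(x, n_draws=100_000, rng=rng):
    """Posterior draws of the group mean under a normal likelihood with
    the noninformative prior p(mu, sigma^2) ∝ 1/sigma^2, for which
    mu | data ~ xbar + (s / sqrt(n)) * t_{n-1}."""
    n, xbar, s = len(x), x.mean(), x.std(ddof=1)
    return xbar + (s / np.sqrt(n)) * rng.standard_t(n - 1, size=n_draws)

diff = posterior_mean_samples(b) - posterior_mean_samples(a)
lo, hi = np.percentile(diff, [2.5, 97.5])   # 95% credible interval
print(f"95% credible interval for mu_B - mu_A: ({lo:.2f}, {hi:.2f})")
print("zero inside interval:", lo <= 0.0 <= hi)
```

The choices of interval width (95% here) and of how close to zero counts as "zero" are exactly the decisions mentioned above.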
A discussion of these ideas can be found in a paper by Kruschke, who also wrote a very readable book Doing Bayesian Data Analysis, which covers an example on pages 307-309, "Are Different Groups Equal?". (Second edition: p. 468-472.) He also has a blog posting on the subject, with some Q&A.
FURTHER EDIT: Your description of the Bayesian process is also not quite correct. Bayesians only care about what the data tells us, in light of what we knew independent of the data. (As Kruschke points out, the prior does not necessarily occur before the data. That's what the phrase implies, but it's really just our knowledge excluding some of the data.) What we knew independently of a particular set of data may be vague or specific and may be based on consensus, a model of the underlying data generation process, or may just be the results of another (not necessarily prior) experiment.
Some choices include Weibull, Gamma (including exponential), and lognormal distributions, possibly with a shift-parameter if there's a non-zero minimum possible time. (However, from your diagram it looks like there's also potentially a discreteness issue.)
If the presentation drawing is reasonably accurate, a shift-parameter will probably be required.
If there's a tendency for the times to be highly skewed, then log-logistic, inverse Gaussian or Pareto distributions might be considered. (It doesn't look to be the case here, though.)
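As a sketch of how such fits might look in practice (not part of the original answer; it uses scipy with simulated placeholder times, and scipy's loc parameter plays the role of the shift):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Placeholder for the observed times: shifted, right-skewed positive data
times = 1.5 + rng.gamma(shape=2.0, scale=3.0, size=200)

# In scipy, 'loc' acts as the shift (minimum possible time);
# leave it free, or pin it with floc=... if the minimum is known.
for dist in (stats.weibull_min, stats.gamma, stats.lognorm):
    params = dist.fit(times)             # shape(s), loc (shift), scale
    ll = np.sum(dist.logpdf(times, *params))
    print(f"{dist.name:12s} params={np.round(params, 3)}  loglik={ll:.1f}")
```

Comparing log-likelihoods (or AIC) across the candidates, together with plots, gives a first impression of which family is plausible.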
Best Answer
It is in the nature of statistical tests that non-rejection of a null hypothesis does not mean that the null hypothesis is true. In particular, this means that not rejecting a model assumption with the KS (or any other) test does not ensure that the model is true. If you test several models, you may well fail to reject several of them.
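As a toy illustration (not from the original answer; simulated data and scipy), fitting several candidate models and running a KS test on each will often leave more than one of them un-rejected:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.gamma(shape=5.0, scale=2.0, size=80)   # placeholder data

# KS test against several fitted candidate models. Caveat: with parameters
# estimated from the same data, the standard KS p-values are only approximate
# and tend to be too large; a parametric bootstrap (Lilliefors-type) correction
# would be more appropriate.
for dist in (stats.gamma, stats.lognorm, stats.norm):
    params = dist.fit(x)
    stat, p = stats.kstest(x, dist.name, args=params)
    print(f"{dist.name:8s} KS p-value = {p:.3f}")
# With moderate n, it is common for more than one model not to be rejected.
```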
In fact, whether model assumptions should be tested before using a model is controversial, and to what extent this is useful depends on the specific situation. We have a paper that discusses the issue in some depth: https://arxiv.org/abs/1908.02218
One reason against testing model assumptions is what I call the "misspecification paradox": conditional on not rejecting a model assumption, the data violate that assumption, even if they followed the model before the misspecification/goodness-of-fit testing; see Most interesting statistical paradoxes.
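A small simulation can make this selection effect visible (a toy sketch only, not the construction from the paper; the test level is deliberately large so the effect is easy to see):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps, alpha = 30, 20_000, 0.5   # large alpha just to make the selection visible

kurt_all, kurt_pass = [], []
for _ in range(reps):
    x = rng.normal(size=n)                 # data truly follow the model
    k = stats.kurtosis(x)                  # sample excess kurtosis
    kurt_all.append(k)
    if stats.shapiro(x).pvalue > alpha:    # "not rejected" -> model kept
        kurt_pass.append(k)

print("mean |excess kurtosis|, all samples:    ",
      round(float(np.mean(np.abs(kurt_all))), 3))
print("mean |excess kurtosis|, passing samples:",
      round(float(np.mean(np.abs(kurt_pass))), 3))
# The passing samples look systematically "more normal" than unconditional
# N(0,1) samples do: conditional on non-rejection, the data no longer follow
# the nominal model exactly. With a realistic alpha the effect is subtler.
```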
On the other hand, it is a misconception that model assumptions need to be fulfilled in order to apply a model-based method. In fact, many model-based methods also work quite well in situations in which the model is violated, although this depends on what exactly you do and how the model is violated; see the paper linked above. In many situations the misspecification paradox mentioned above, even though it technically violates the model assumption, doesn't affect a method's performance much.
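As one illustration of this kind of robustness (a toy simulation, not an example from the paper), the Welch t-test keeps roughly its nominal level here even though the data are clearly non-normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, reps, alpha = 40, 20_000, 0.05

# Both groups drawn from the same skewed (exponential) distribution: the null
# of equal means is true, but the normality assumption is violated.
rejections = 0
for _ in range(reps):
    a = rng.exponential(scale=1.0, size=n)
    b = rng.exponential(scale=1.0, size=n)
    if stats.ttest_ind(a, b, equal_var=False).pvalue < alpha:
        rejections += 1

print(f"empirical type I error: {rejections / reps:.3f}  (nominal {alpha})")
# Despite the violation of normality, the level is close to nominal at this
# sample size, largely thanks to the central limit theorem.
```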
In fact, nonparametric methods are not a magic bullet, in the sense that there are situations in which a model-based method can do better than a nonparametric one even if the nominal model is violated. An example comparing the two-sample t-test with the nonparametric Wilcoxon test is also given in the paper.

This of course depends on what your aim is. If you have lots of data and prediction quality is your primary aim, a parametric method will have a very hard time beating a good nonparametric one. However, for decision making you may want to summarise the data using interpretable statistics such as the mean or regression estimators, and nonparametric methods may not give you what you want in such situations. Parametric modelling may also help you think more clearly about what is going on, and more sophisticated models such as mixed-effects/multilevel or time-series models can incorporate detailed information about how the data were collected and what kind of dependence structure they have.
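For a flavour of the t-test vs Wilcoxon comparison (again only a toy simulation, not the example from the paper), one can compare their power under a short-tailed, non-normal distribution, where the t-test tends to remain at least as powerful:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, reps, alpha, shift = 25, 5_000, 0.05, 0.15

# Light-tailed, non-normal errors (Beta(2,2)) plus a location shift: the
# t-test's normality assumption is violated, yet it stays competitive.
power_t = power_w = 0
for _ in range(reps):
    a = rng.beta(2, 2, size=n)
    b = rng.beta(2, 2, size=n) + shift
    power_t += stats.ttest_ind(a, b).pvalue < alpha
    power_w += stats.mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha

print(f"t-test power:   {power_t / reps:.3f}")
print(f"Wilcoxon power: {power_w / reps:.3f}")
```

Which test wins depends on the shape of the distribution and on whether the mean is the quantity you actually care about.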
Regarding removing outliers, I'd only remove outliers if there is strong evidence, usually from background knowledge, that these observations are indeed erroneous. Your data are information; removing outliers means removing potentially meaningful and important information. Being outlying alone is not enough of a reason for removal (unless the value is actually impossible). Many (but not all) parametric methods can be badly affected by outliers, but there are so-called robust statistics that still estimate the parameters, in ways that are less (or not at all) affected by outliers.
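As a small sketch of the robust alternative (illustrative Python only, not from the original answer), compare the ordinary mean with robust location estimates on simulated data containing a couple of gross outliers:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Mostly well-behaved data plus two gross outliers
x = np.concatenate([rng.normal(50, 5, size=48), [120.0, 135.0]])

print(f"mean:             {np.mean(x):.1f}")               # pulled up by the outliers
print(f"median:           {np.median(x):.1f}")              # robust
print(f"10% trimmed mean: {stats.trim_mean(x, 0.1):.1f}")   # robust, still uses most of the data
```

The robust estimates stay close to the bulk of the data without anyone having to decide which observations to delete.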