The word "sample" causes at least two different instances of confusion.
A (what the OP asks about)
The tag "Sample" here on CV begins with "A sample is a subset of a population". Every element that can appear in any possible subset of the population is a possible event; hence the set of all possible events can be called the "Sample Space" (in effect, a "Population Subsets Space"), because it is from that space that the elements of any population subset are drawn.
Where does that leave us regarding the relation with the concept "outcomes"?
The population and its subsets do not consist of the numerical values that the elements of these subsets may take: these numerical values are assigned by the random variable that we have defined according to our needs.
To take a trivial example, a series of coin flips can be thought of as a population of heads and tails. We define a real-valued random variable by, say, linking "Heads" with the number $5$ and "Tails" with the number $17$. The Sample Space will then be $\{\text{Heads}, \text{Tails}\}$, which is the domain of the random variable, while the "outcome space", its range, will be $\{5,17\}$.
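The mapping above can be sketched in code. This is a minimal illustration, not a standard API; the names `sample_space` and `X` are purely illustrative:

```python
# A random variable as a map from a non-numerical sample space to real numbers.
sample_space = {"Heads", "Tails"}

# The random variable X assigns 5 to "Heads" and 17 to "Tails".
X = {"Heads": 5, "Tails": 17}

# The outcome space is the range of X over the sample space.
outcome_space = {X[s] for s in sample_space}
print(outcome_space)  # {5, 17}
```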
In other words, it is not necessary that "the function maps values to values" as the OP states. It can map anything to values.
And strictly speaking, a "sample" of, say, size $3$ will be a set like $\{\text{Heads}, \text{Heads}, \text{Tails}\}$, not the set $\{5,5,17\}$. The latter set is produced by a specific random variable. Obviously, we could use another random variable and obtain a different numerical representation of the same sample.
In sum, the Sample Space can be non-numerical, while the "set of outcomes" of a real-valued random variable must be real-valued. To each realized sample from a population we can map infinitely many numerical sets.
It is no accident that the latter are properly called "a sample of realizations of a random variable", and not just "a sample from a population".
Assume now that we have a coin that reads "$1$" on one side and "$2$" on the other. The Sample Space here has a numerical nature. Still, we can define a random variable by mapping $1$ to $5$ and $2$ to $17$. Here too, the Sample Space $\{1,2\}$ will be different from the "Outcome Space" $\{5,17\}$.
Our sample of size $3$ (understood as a subset of the population) will here be the set $\{1,1,2\}$, while the "sample of realizations of the (specific) random variable" will be $\{5,5,17\}$.
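A short sketch of this last point, with a second random variable thrown in to show that the same sample admits different numerical representations (the mappings are the ones assumed in the text; the names are illustrative):

```python
# The coin faces read 1 and 2; the random variable Y maps 1 -> 5, 2 -> 17.
Y = {1: 5, 2: 17}
# A different random variable Z gives a different numerical representation.
Z = {1: 0, 2: 1}

sample = [1, 1, 2]                        # sample of size 3 from the population
realizations_Y = [Y[s] for s in sample]   # realizations under Y
realizations_Z = [Z[s] for s in sample]   # realizations under Z

print(realizations_Y)  # [5, 5, 17]
print(realizations_Z)  # [0, 0, 1]
```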
B: Sample and Observation
In fields like medicine or biology, when we say "let's take a sample of blood", we mean "let's take blood once". Put in general statistical terminology, we would have one observation, because in general statistical terminology a "sample" is a set usually containing more than one observation (although it can contain only one).
So when somebody from these fields says "I have available $n$ samples", he might mean, in general terminology, "I have available $n$ observations", i.e. "I have available one sample of $n$ observations" -- but someone used to the more standard terminology will understand the expression "I have available $n$ samples" as "I have available $n$ sets, each containing $m$ observations", usually with $m>1$. One can find this sort of confused communication in various posts here on CV.
ADDENDUM
Responding to the OP's edit in the question:
"Why not sample real numbers right away"? Because the world is not made of numbers. Actual data collection that describes the world is in many cases of a qualitative nature. So "separating samples and outcomes" follows the nature of things. Moreover, the act of mapping them to numerical values is a separate step, and, as I have already mentioned, it is not a unique mapping. So it requires decisions to be made. And whenever decisions are involved, they had better be clear and transparent, so that they can be judged, assessed, and criticized. These "decisions" are, to begin with, the choice of the random variable we will use.
"Heads and Tails" exist irrespective of whether we want to study them. The "random variable" is a mathematical concept/tool that we project onto real-world data in order to analyze and study them. So samples exist on their own; random variables transform samples into something we can handle using quantitative methods.
As to whether "samples are deterministic": nobody has ever decisively settled whether there exists anything inherently stochastic in nature, or whether all our stochastic approaches are just a reflection of our ignorance and/or of the limits of our measuring devices.
A probability distribution is a mathematical function that describes a random variable. A little more precisely, it is a function that assigns probabilities to numbers, and its output has to agree with the axioms of probability.
A statistical model is an abstract, idealized description of some phenomenon in mathematical terms, using probability distributions. Quoting Wasserman (2013):
A statistical model $\mathfrak{F}$ is a set of distributions (or
densities or regression functions). A parametric model is a set
$\mathfrak{F}$ that can be parameterized by a finite number of
parameters. [...]
In general, a parametric model takes the form
$$ \mathfrak{F} = \{ f (x; \theta) : \theta \in \Theta \} $$
where $\theta$ is an unknown parameter (or vector of parameters) that
can take values in the parameter space $\Theta$. If $\theta$ is a
vector but we are only interested in one component of $\theta$, we
call the remaining parameters nuisance parameters. A nonparametric
model is a set $\mathfrak{F}$ that cannot be parameterized by a
finite number of parameters.
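Wasserman's parametric form $\mathfrak{F} = \{f(x;\theta) : \theta \in \Theta\}$ can be made concrete with, say, the normal family with unit variance, where each $\theta$ picks out one member of the model. This is an illustrative sketch; the function name `density` is not a standard API:

```python
import math

def density(x, theta):
    """Normal density f(x; theta) with mean theta and variance 1."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)

# Each value of theta selects one distribution from the model F.
f0 = lambda x: density(x, theta=0.0)   # the N(0, 1) member
f1 = lambda x: density(x, theta=1.0)   # the N(1, 1) member

print(round(f0(0.0), 4))  # 0.3989, the standard normal density at 0
```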
In many cases we use distributions as models (you can check this example). You can use the binomial distribution as a model for the count of heads in a series of coin tosses. In such a case we assume that this distribution describes, in a simplified way, the actual outcomes. This does not mean that it is the only way to describe such a phenomenon, nor that the binomial distribution can be used only for this purpose. A model can use one or more distributions, and Bayesian models also specify prior distributions.
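The coin-toss model above can be written down directly. A minimal stdlib-only sketch, assuming a Binomial$(n, p)$ model for the number of heads:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(K = k) under a Binomial(n, p) model for the count of heads."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 5 heads in 10 tosses of a fair coin.
print(round(binom_pmf(5, 10, 0.5), 4))  # 0.2461
```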
More formally, this is discussed by McCullagh (2002):
According to currently accepted theories [Cox and Hinkley (1974),
Chapter 1; Lehmann (1983), Chapter 1; Barndorff-Nielsen and Cox
(1994), Section 1.1; Bernardo and Smith (1994), Chapter 4] a
statistical model is a set of probability distributions on the sample
space $\mathcal{S}$. A parameterized statistical model is a parameter
set $\Theta$ together with a function $P : \Theta \rightarrow
\mathcal{P}(\mathcal{S})$, which assigns to each parameter point
$\theta \in \Theta$ a probability distribution $P_\theta$ on
$\mathcal{S}$. Here $\mathcal{P}(\mathcal{S})$ is the set of all
probability distributions on $\mathcal{S}$. In much of the following, it is
important to distinguish between the model as a function $P : \Theta
\rightarrow \mathcal{P}(\mathcal{S})$, and the associated set of
distributions $P_\Theta \subset \mathcal{P}(\mathcal{S})$.
So statistical models use probability distributions to describe data in their terms. Parametric models are additionally described in terms of a finite set of parameters.
This does not mean that all statistical methods need probability distributions. For example, linear regression is often described in terms of a normality assumption, but in fact it is pretty robust to departures from normality: the assumption of normal errors is needed for confidence intervals and hypothesis testing, not for the regression itself to work. To have a fully specified statistical model, however, we need to describe it in terms of random variables, so we need probability distributions. I mention this because you can often hear people saying that they used a regression model for their data; in most such cases they mean that they described the data in terms of a linear relation between target values and predictors using some parameters, rather than insisting on conditional normality.
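The point that the estimates themselves need no normality assumption can be seen from the closed-form least-squares solution, which is pure arithmetic on the data. A stdlib-only sketch for simple regression (the function name `ols` and the data are illustrative):

```python
def ols(x, y):
    """Closed-form simple linear regression: minimizes squared residuals.
    No distributional assumption enters anywhere in this computation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return intercept, slope

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x; errors need not be normal
b0, b1 = ols(x, y)
print(round(b1, 2))  # 1.99
```

Normality would only come in afterwards, when attaching standard errors, intervals, or tests to `b0` and `b1`.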
McCullagh, P. (2002). What is a statistical model? Annals of statistics, 1225-1267.
Wasserman, L. (2013). All of statistics: a concise course in statistical inference. Springer.
"Inverse probability" is a rather old-fashioned way of referring to Bayesian inference; when it's used nowadays it's usually as a nod to history. De Morgan (1838), An Essay on Probabilities, Ch. 3 "On Inverse Probabilities", explains it nicely:
An example follows using Bayes' Theorem.
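A small worked instance of reasoning "inversely" via Bayes' Theorem, from an observed effect back to its probable cause (the urn setup and all numbers are illustrative, not De Morgan's own example):

```python
# Two urns, one chosen at random; a white ball is drawn.
# "Inverse probability": infer which urn the ball came from.
prior = {"urn1": 0.5, "urn2": 0.5}      # P(urn) before observing anything
p_white = {"urn1": 0.8, "urn2": 0.4}    # P(white | urn)

# Bayes' Theorem: P(urn | white) = P(urn) * P(white | urn) / P(white)
evidence = sum(prior[u] * p_white[u] for u in prior)
posterior = {u: prior[u] * p_white[u] / evidence for u in prior}

print(round(posterior["urn1"], 4))  # 0.6667: urn1 is now twice as likely
```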
I'm not sure that the term mightn't have at some point encompassed putative or proposed non-Bayesian, priorless, methods of getting from $f(y|\theta)$ to $p(\theta|y)$ (in @Christopher Hanck's notation); but at any rate Fisher was clearly distinguishing between "inverse probability" & his methods—maximum likelihood, fiducial inference—by the 1930s. It also strikes me that several early-20th-century writers seem to view the use of what we now call uninformative/ignorance/reference priors as part & parcel of the "inverse probability" method†, or even of "Bayes' Theorem"‡.
† Fisher (1930), Math. Proc. Camb. Philos. Soc., 26, p 528, "Inverse probability", clearly distinguishes, perhaps for the first time, between Bayesian inference from flat "ignorance" priors ("the inverse argument proper"), the unexceptionable application of Bayes' Theorem when the prior describes aleatory probabilities ("not inverse probability strictly speaking"), & his fiducial argument.
‡ For example, Pearson (1907), Phil. Mag., p365, "On the influence of past experience on future expectation", conflates Bayes' Theorem with the "equal distribution of ignorance".