You don't say what the other statistics book is, but I'd guess that it is a
book (or section) about finite population sampling.
When you sample random variables, i.e. when you consider a set
$X_1,\dots,X_n$ of $n$ random variables, you know that if they are
independent, i.e. $f(x_1,\dots,x_n)=f(x_1)\cdots f(x_n)$, and identically distributed, so that in particular $E(X_i)=\mu$ and $\text{Var}(X_i)=\sigma^2$ for all $i$, then:
$$\overline{X}=\frac{\sum_i X_i}{n},\quad E(\overline{X})=\mu,\quad
\text{Var}(\overline{X})=\frac{\sigma^2}{n}$$
where $\sigma^2$ is the second central moment.
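As a sanity check, the two identities $E(\overline{X})=\mu$ and $\text{Var}(\overline{X})=\sigma^2/n$ can be verified numerically. This is a minimal simulation sketch; the normal distribution and the specific values of $\mu$, $\sigma$ and $n$ are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(0)

mu, sigma, n = 10.0, 2.0, 25
reps = 20_000

# Draw many i.i.d. samples of size n and record each sample mean.
means = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(xs) / n)

print(statistics.mean(means))      # close to mu = 10
print(statistics.variance(means))  # close to sigma^2 / n = 0.16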
Sampling a finite population is somewhat different. If the population is of
size $N$, in sampling without replacement there are $\binom{N}{n}$ possible
samples $s_i$ of size $n$ and they are equiprobable:
$$p(s_i)=\frac{1}{\binom{N}{n}}\quad\forall i=1,\dots,\binom{N}{n}$$
For example, if $N=5$ and $n=3$, the sample space is $\{s_1,\dots,s_{10}\}$
and the possible samples are:
$$\begin{gather}s_1=\{1,2,3\},s_2=\{1,2,4\},s_3=\{1,2,5\},s_4=\{1,3,4\},s_5=\{1,3,5\},\\
s_6=\{1,4,5\},s_7=\{2,3,4\},s_8=\{2,3,5\},s_9=\{2,4,5\},s_{10}=\{3,4,5\}\end{gather}$$
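The enumeration above is easy to reproduce programmatically; a small sketch using Python's `itertools`:

```python
from itertools import combinations

N, n = 5, 3

# All C(5, 3) = 10 equiprobable samples of size 3 from {1, ..., 5}.
samples = list(combinations(range(1, N + 1), n))
print(len(samples))  # 10

# Each individual appears in the same number of samples, C(4, 2) = 6,
# so each has the same chance 6/10 of being selected.
counts = {i: sum(i in s for s in samples) for i in range(1, N + 1)}
print(counts)  # {1: 6, 2: 6, 3: 6, 4: 6, 5: 6}
```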
If you count the number of occurrences of each individual, you can see that
each appears six times, i.e. each individual has an equal chance of being selected (6/10). So each $s_i$ is a random sample according to the second definition. Roughly speaking, it is not an i.i.d. random sample because individuals
are not random variables: you can consistently estimate $E[X]$ by a sample mean but will
never know its exact value, whereas you can know the exact population mean if $n=N$ (let me repeat: roughly).${}^1$
Let $\mu$ be some population mean (mean height, mean income, ...). When $n<N$
you can estimate $\mu$ as in random variable sampling:
$$\overline{y}_s=\frac{1}{n}\sum_{i=1}^n y_i,\quad E(\overline{y}_s)=\mu$$
but the sample mean variance is different:
$$\text{Var}(\overline{y}_s)=\frac{\tilde\sigma^2}{n}\left(1-\frac{n}{N}\right)$$
where $\tilde\sigma^2$ is the population quasi-variance:
$\frac{\sum_{i=1}^N(y_i-\overline{y})^2}{N-1}$.
The factor $(1-n/N)$ is usually called the "finite population correction factor".
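The corrected variance formula can be checked by brute force on a small, hypothetical population: enumerate all $\binom{N}{n}$ equiprobable samples, compute the exact variance of the sample mean over them, and compare with $\frac{\tilde\sigma^2}{n}\left(1-\frac{n}{N}\right)$. The population values below are made up purely for illustration:

```python
from itertools import combinations
from statistics import mean

# Hypothetical small population of N = 5 values.
y = [3, 7, 8, 12, 15]
N, n = len(y), 3

# Exact variance of the sample mean over all C(N, n) equiprobable samples.
sample_means = [mean(s) for s in combinations(y, n)]
grand_mean = mean(sample_means)
exact_var = mean((m - grand_mean) ** 2 for m in sample_means)

# Formula: quasi-variance / n, times the finite population correction.
ybar = mean(y)
quasi_var = sum((v - ybar) ** 2 for v in y) / (N - 1)
formula_var = quasi_var / n * (1 - n / N)

print(exact_var, formula_var)  # the two agree
```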
This is a quick example of how a (random variable) i.i.d. random sample and a
(finite population) random sample may differ. Statistical inference is mainly
about random variable sampling; sampling theory is about finite population
sampling.
${}^1$ Say you are manufacturing light bulbs and wish to know their average life
span. Your "population" is just a theoretical or virtual one, at least if you
keep manufacturing light bulbs. So you have to model a data generation
process and interpret a set of light bulbs as a (random variable) sample. Say
now that you find a box of 1000 light bulbs and wish to know their average
life span. You can select a small set of light bulbs (a finite population
sample), but you could select all of them. If you select a small sample, this
doesn't transform light bulbs into random variables: the random variable is
generated by you, as the choice between "all" and "a small set" is up to
you. However, when a finite population is very large (say your country's
population), so that choosing "all" is not viable, the second situation is
better handled as the first one.
The word "sample" causes at least two different instances of confusion.
A: (what the OP asks about)
The tag "Sample" here on CV starts with "A sample is a subset of a population": any possible subset of the population is a possible event, so the set of all possible events can be called the "Sample Space" (the "Population Subsets Space"), because it is from that Space that the elements of any population subset can come.
Where does that leave us regarding the relation with the concept "outcomes"?
The population and its subsets do not consist of the numerical values that the elements of these subsets may take: these numerical values are assigned by the random variable that we have defined according to our needs.
To consider the trivial example, a series of coin flips can be thought of as a population of heads and tails. We define a real-valued random variable by, say, linking "Heads" with the number $5$ and "Tails" with the number $17$. So the Sample Space will be $\{\text{Heads, Tails}\}$, which will be the domain of the random variable, while the "outcome space", its range, will be $\{5,17\}$.
In other words, it is not necessary that "the function maps values to values" as the OP states. It can map anything to values.
And strictly speaking, a "sample" of, say, size $3$ will be a set like $\{\text{Heads, Heads, Tails}\}$, and not the set $\{5,5,17\}$. The latter set is produced by a specific random variable. Obviously, we could use another random variable and obtain a different numerical representation for the same sample.
All in all, the Sample Space can be non-numerical, while the "set of outcomes" of a real-valued random variable must be real-valued. To each realized sample from a population we can map infinitely many numerical sets.
It is by no accident that the latter are properly called "a sample of realizations of a random variable", and not just "a sample from a population".
Assume now that we have a coin where on the one side it reads "$1$" while on the other it reads "$2$". So the Sample Space here has a numerical nature. Still we can define a random variable by mapping $1$ to $5$ and $2$ to $17$. Here too, the Sample Space $\{1,2\}$ will be different than the "Outcome Space" $\{5,17\}$.
Our sample of size $3$ (understood as a subset of the population) will here be the set $\{1,1,2\}$, while the "sample of realizations of the (specific) random variable" will be $\{5,5,17\}$.
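A tiny sketch of this distinction; the two mappings below are hypothetical, chosen only to mirror the numbers above and to show that the same sample admits different numerical representations:

```python
# The sample space {1, 2} and two hypothetical random variables on it.
X = {1: 5, 2: 17}    # the mapping used in the text
Y = {1: -1, 2: 1}    # a different random variable on the same space

sample = [1, 1, 2]   # a sample of size 3 (as a subset of the population)

print([X[s] for s in sample])  # realizations under X: [5, 5, 17]
print([Y[s] for s in sample])  # realizations under Y: [-1, -1, 1]
```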
B: Sample and Observation
In fields like medicine or biology, when we say "let's take a sample of blood", we mean "let's take blood once". Put in general statistical terminology, this would be one observation, because a "sample" is a set usually containing more than one observation (although it can contain only one).
So when somebody from these fields says "I have available $n$ samples", he might just mean, in general terminology, "I have available $n$ observations", or "I have available one sample of $n$ observations"; but someone else, used to the more standard terminology, will understand the expression "I have available $n$ samples" as "I have available $n$ sets, each containing $m$ observations", usually with $m>1$. One can find this sort of confused communication in various posts here on CV.
ADDENDUM
Responding to the OP's edit in the question:
"Why not sample real numbers right away?" Because the world is not made of numbers. Actual data collection that describes the world is in many cases qualitative in nature. So "separating samples and outcomes" follows the nature of things. Moreover, the act of mapping them to numerical values is a separate step, and as I have already mentioned, it is not a unique mapping. So it requires decisions to be made. And whenever decisions are involved, they had better be clear and transparent so that they can be judged, assessed, and criticized. These "decisions" are, to begin with, the choice of the random variable we will use.
"Heads and Tails" exist irrespective of whether we want to study them. The "random variable" is a mathematical concept/tool which we project onto the real-world data in order to analyze and study them. So, samples, they exist. Random variables, they transform samples into something that we can handle using quantitative methods.
As to whether "samples are deterministic", nobody has ever decisively settled whether there exists anything inherently stochastic in nature, or whether all our stochastic approaches are just a reflection of our ignorance and/or of the limits of our measuring devices.
Best Answer
A random variable, $X:\Omega \rightarrow \mathbb R$, is a function from the sample space to the real line. This is a deterministic formula that can be as simple as writing down the number a die lands on in the random experiment of tossing a die. The experiment is random, in the way that we don't control many of the physical factors determining its outcome; however, as soon as the die lands the random variable maps the outcome in the physical world to a number.
Other examples would include measuring the height of a sample of eighth graders, perhaps to infer the population parameters (including the mean and variance). Each boy or girl would be the outcome of a random experiment, pretty much like tossing a coin. Once a subject is selected, the actual mapping to a number in inches or centimeters is not subject to randomness, despite the name "random variable."
A group of such experiments would constitute a sample: "In statistics, a simple random sample is a subset of individuals (a sample) chosen from a larger set (a population)." This definition is intuitive, but leaves the term population implicit. An attempt at fixing this gap is made in this paper, pointing out that "the term 'population' as a noun should refer to the sample space, not the random variable as is the case in many textbooks."
A random sample is a collection of $n$ independent and identically distributed (i.i.d.) random variables $X_1, X_2, X_3, \dots, X_n$, in which $X_i$ is the function $X(\cdot)$ applied to the outcome of the $i$-th experiment: $x_i = X_i(\omega)$. Although sampling without replacement doesn't fulfill the independence requirement, this point is overlooked when sampling from a large population in favor of computational expediency.
The $n$-tuple $(x_1, x_2, x_3, \dots, x_n)$ is a particular realization of the random variables, which, in the case proposed in the question, would be drawn from identically distributed $X_i \sim N(\mu, \sigma^2)$ random variables. So in the OP the process of "drawing some samples" would result in individual realizations of this collection of random variables.
Random variables are the object of mathematical laws, such as the LLN or the CLT. The distribution of the random variable will dictate the feasibility of induction from random samples. For example, any given realization will always have a mean and a standard deviation as an $n$-tuple of real numbers, yet the generating random variables may not have finite moments (e.g. Pareto), compromising statistical inference about the population characteristics.
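To make the last point concrete, here is a small simulation sketch. A Pareto distribution with shape $\alpha = 1.5$ has a finite mean but infinite variance; the shape parameter and sample size below are arbitrary choices. Any realized $n$-tuple still has a perfectly well-defined sample mean and standard deviation, since it is just a list of real numbers:

```python
import random
import statistics

random.seed(42)

# Pareto with shape alpha = 1.5: finite mean, infinite variance.
alpha = 1.5
xs = [random.paretovariate(alpha) for _ in range(10_000)]

# The realization is just an n-tuple of real numbers, so its sample
# mean and standard deviation exist, even though Var(X) is infinite;
# the sample standard deviation will not stabilize as n grows.
print(statistics.mean(xs), statistics.stdev(xs))
```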