I am also extremely annoyed when people say things like this. Until you describe what a non-affine variety is, statements like so-and-so is not an affine variety are meaningless: a priori being an affine variety is extra structure on a set, not a property. If being affine is a property of some larger class of objects, you first need to describe what that larger class of objects is!
Let me restrict my attention to $\mathbb{A}^1 \setminus \{ 0 \}$. What sort of object is this? One answer is that it can be usefully thought of as a "Zariski sheaf," namely a functor
$$\mathbb{G}_m : \text{CRing} \ni R \mapsto R^{\times} \in \text{Set}$$
from commutative rings to sets satisfying a sheaf condition. (Note that this does something a bit more general than just "remove the origin" on commutative rings that aren't fields: it removes all non-invertible elements.) The sense in which $\mathbb{G}_m$ is an affine variety, even though it is not a closed subvariety of $\mathbb{A}^1$, is that this sheaf is representable by a commutative ring, namely $\mathbb{Z}[x, x^{-1}]$.
(A more naive guess is $R \mapsto R \setminus \{ 0 \}$, but this isn't even a functor.)
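To spell out the two claims above in symbols (a sketch; the names $\varphi$ and $u$ are my notation, not the original's): a ring homomorphism out of $\mathbb{Z}[x, x^{-1}]$ is determined by where it sends $x$, and that image must be a unit because $x$ is one. So
$$\operatorname{Hom}_{\text{CRing}}(\mathbb{Z}[x, x^{-1}], R) \xrightarrow{\ \sim\ } R^{\times}, \qquad \varphi \mapsto \varphi(x),$$
with inverse sending a unit $u$ to the homomorphism determined by $x \mapsto u$ (so $x^{-1} \mapsto u^{-1}$). By contrast, the naive assignment fails on the quotient map $\mathbb{Z} \to \mathbb{Z}/2\mathbb{Z}$: it sends $2 \in \mathbb{Z} \setminus \{0\}$ to $0$, which is not in $(\mathbb{Z}/2\mathbb{Z}) \setminus \{0\}$, so there is no induced map on "nonzero elements."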
Other answers are possible at various levels of sophistication; you can look up the terms "quasi-affine" and "quasi-projective" variety.
As detailed in "What is an algebraic variety?", there are several different ways to define an algebraic variety, each more general than the last. Sheaves are very useful later on in algebraic geometry for solving a wide variety of problems, so it makes sense to introduce them alongside our early examples of varieties, where it is easier to get a handle on the concept.
It might help you to compare introducing the structure sheaf for affine algebraic varieties with how we talk about integrals. In calculus classes, we introduce integrals of nice functions like polynomials via some basic rules, and then once we get to analysis classes we introduce Riemann sums and Lebesgue integration so that we have access to the powerful theory: usually you'll have a couple examples or homework exercises proving your new theories recover all the same results as your old ones. The same sort of thing is going on here.
For question 2, one undesirable feature of considering what the author calls affine algebraic sets is that there are varieties $X$ with embeddings $f_1,f_2$ into affine space so that $f_1(X)$ is an affine algebraic set and $f_2(X)$ isn't. The most basic example is $k\setminus \{0\}$: it embeds into $k$ in the obvious way, and the image isn't a variety since $k\setminus\{0\}\subset k$ isn't closed, but it also embeds into $k^2$ via $x\mapsto (x,x^{-1})$, and that image is a variety (it's the zero locus of $xy-1$). Moving to a more intrinsic definition of variety avoids such issues.
Added in response to the comment asking why we need the structure sheaf:
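One way to check the second embedding concretely (a sketch, with $\phi$ and $\psi$ as my own names for the maps): the curve $V(xy-1) \subset k^2$ and $k \setminus \{0\}$ are matched by mutually inverse maps given by regular formulas,
$$\phi : k \setminus \{0\} \to V(xy - 1),\ t \mapsto (t, t^{-1}), \qquad \psi : V(xy-1) \to k \setminus \{0\},\ (x, y) \mapsto x,$$
since $t \cdot t^{-1} - 1 = 0$, $\psi(\phi(t)) = t$, and $\phi(\psi(x,y)) = (x, x^{-1}) = (x, y)$ on the curve, where $xy = 1$. On coordinate rings this is the isomorphism $k[x,y]/(xy-1) \cong k[x, x^{-1}]$ sending $y \mapsto x^{-1}$.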
One motivational principle of algebraic geometry is that we can do a lot with a ring if we know it's the ring of functions on a manifold (or some other geometric object) - for instance, if the geometric object is disconnected, then we can decompose our ring as the direct product of the rings of functions on the connected components, or we can detect non-compactness of our manifold by finding a maximal ideal which doesn't come from evaluation at a point. It is natural to try to apply this insight to a general ring: if you give me a ring $R$, what's the geometric object that this is the ring of functions on? The answer is the scheme $\operatorname{Spec} R$ (the spectrum of $R$).
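The disconnectedness statement can be made concrete with idempotents (a standard sketch; $e$, $R_1$, $R_2$ are my notation, not the original's). If $e \in R$ satisfies $e^2 = e$, then
$$R \cong Re \times R(1-e), \qquad r \mapsto (re,\ r(1-e)),$$
and conversely, for a product of two rings,
$$\operatorname{Spec}(R_1 \times R_2) = \operatorname{Spec} R_1 \sqcup \operatorname{Spec} R_2.$$
Geometrically, $e$ plays the role of the function that is $1$ on one open-and-closed piece and $0$ on the other.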
When we study schemes, we have a lot of local data to carry around: we have regular functions on every open set, and we want an intrinsic way to package this all up. The structure sheaf gives us a way to do all of this efficiently - it keeps track of all the data for us. In some sense, most sheaves on varieties and schemes you want to discuss will be built (at least locally) out of copies of the structure sheaf - this is exactly the condition of being quasi-coherent. So studying sheaves means we need to know about the structure sheaf.
As for what problems you can solve via methods which become available after having access to sheaves, the big tool is sheaf cohomology. This does lots of wonderful things for us: most results involving cohomology from algebraic topology have versions in algebraic geometry where we use sheaf cohomology instead of singular/simplicial/cellular cohomology. For instance, as wacky as the Zariski topology might seem, we still get Serre duality (something that looks like Poincaré duality) for smooth varieties over arbitrary fields, we have Künneth formulas, we have characteristic classes, and we can count special curves or points (a very geometric thing to do!) via sheaf cohomology.
Best Answer
To the pair. If it referred to only $X$, it would be only a topological isomorphism, i.e., a homeomorphism. That's too weak. You want the composition of any regular function with the map to be a regular function.
$(X,\mathcal{O}_X)$ and $(Y,\mathcal{O}_Y)$ are isomorphic if there exists a homeomorphism $F: X \rightarrow Y$ such that $F^*: \mathcal{O}_Y \rightarrow F_*\mathcal{O}_X$ is an isomorphism of sheaves. This is a fancy way of saying that the composition of any locally regular function on $Y$ with $F$ is locally regular on $X$, and that every locally regular function on $X$ arises this way.
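Unwinding the definition (a sketch in my own notation, with $V \subseteq Y$ open and $g$ a regular function on $V$): the pullback is just precomposition,
$$F^*_V : \mathcal{O}_Y(V) \to \mathcal{O}_X(F^{-1}(V)), \qquad g \mapsto g \circ F,$$
and asking for an isomorphism of sheaves means each $F^*_V$ is a bijection: every regular function on $F^{-1}(V)$ is $g \circ F$ for exactly one regular $g$ on $V$.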
Yes.
Presumably, this is answered in the book. It goes something like this, though probably not exactly: The isomorphism defines a map from $X$ to $k^n$. Such a map consists simply of $n$ global regular functions, i.e., global sections of $\mathcal{O}_X$. The fact that the map is an isomorphism onto its image implies that these $n$ functions generate the sheaf $\mathcal{O}_X$. Conversely, any $n$ global sections of $\mathcal{O}_X$ define a map from $X$ to $k^n$. If these functions generate the sheaf $\mathcal{O}_X$, then the map is an embedding and the image is necessarily an affine subvariety of $k^n$.
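In symbols, one way to read the sketch above (my interpretation, not necessarily the book's: it takes "generate" to mean "generate as a $k$-algebra," and the names $f$, $\varphi$, $I$ are mine): the functions $f_1, \dots, f_n \in \mathcal{O}_X(X)$ define
$$f = (f_1, \dots, f_n) : X \to k^n, \qquad \varphi : k[y_1, \dots, y_n] \to \mathcal{O}_X(X),\ y_i \mapsto f_i.$$
If $\varphi$ is surjective with kernel $I$, then $\mathcal{O}_X(X) \cong k[y_1, \dots, y_n]/I$, and the image of $f$ lies in the closed set $V(I) \subseteq k^n$, because $p(f(x)) = \varphi(p)(x) = 0$ for every $p \in I$. Showing that $f$ is an isomorphism onto $V(I)$ is the part the book's argument has to supply.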