I am also unsure of my solution to this problem, so any corrections are welcome! I think you are on the right track: the isomorphism must be the identity map. The only problematic part is showing that the sheaves of regular functions are the same.
The question calls for us to view $U \cap Y$ in two ways. First, we view $U \cap Y$ as a closed subset of $U$. In this case, the sheaf $O_{U \cap Y}$ is by definition: for any $V$ open in $U \cap Y$, $O_{U \cap Y} (V) = \{ \varphi : V \to K :$ for all $a \in V$ there exist an open neighborhood $N$ of $a$ in $U$ and $\psi \in O_{U}(N)$ with $\varphi = \psi$ on $V \cap N \}$. Then we note that $O_U(N) = (O_X|_U)(N) = O_X(N)$, where the second equality holds because $N \subset U$ is open in $X$.
On the other hand, we can view $U \cap Y$ as an open subset of $Y$. Then for any open subset $V$ of $U \cap Y$, the sheaf is just the restriction: $O_{U \cap Y}(V) = O_{Y} (V)$. Let's apply the definition of $O_{Y} (V')$. By definition, for any $V'$ open in $Y$, $O_Y(V') = \{ \varphi : V' \to K : $ for all $a \in V'$ there exist an open neighborhood $N'$ of $a$ in $X$ and $\psi \in O_X(N')$ with $\varphi = \psi$ on $V' \cap N' \}$.
Then we just need to show that the restriction of the last set to $U \cap Y$ is equal to the first set. Since $V'$ is going to be an open subset of $U \cap Y$, which is itself open in $Y$, we can replace "$V'$ open in $Y$" with "$V'$ open in $U \cap Y$".
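$N'$ is an open subset of $X$, but it need not be contained in $U$. Since $V' \subset U$, though, we may shrink $N'$ to $N' \cap U$ and restrict $\psi$ to it: this set is open in $U$ and meets $V'$ in exactly the same points. Conversely, every open subset of $U$ is open in $X$, because $U$ is open in $X$. So we may require the neighborhood $N'$ to be open in $U$ rather than merely open in $X$ without changing the set of functions. Then both sets can be seen to be equal.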
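In symbols, the neighborhood swap can be sketched as follows. This is only my own bookkeeping: $V \subset U \cap Y$ is open, $a \in V$, $N'$ is any open neighborhood of $a$ in $X$, and $N$ is a name I am introducing for the shrunken neighborhood.

$$
\begin{aligned}
N &:= N' \cap U && \text{(open in } X \text{ and contained in } U\text{)},\\
V \cap N &= V \cap N' \cap U = V \cap N' && \text{(since } V \subset U\text{)},\\
\psi|_{N} &\in O_X(N) = O_U(N) && \text{(restriction of } \psi \in O_X(N')\text{)}.
\end{aligned}
$$

So a pair $(N', \psi)$ witnessing regularity at $a$ for $O_Y$ gives a pair $(N, \psi|_N)$ witnessing it for the closed subset $U \cap Y \subset U$, and a pair $(N, \psi)$ with $N$ open in $U$ already is a pair of the first kind.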
As detailed in "What is an algebraic variety?", there are several different ways to define an algebraic variety, each more general than the last. Sheaves are very useful later on in algebraic geometry for solving a wide variety of problems, so it makes sense for us to introduce the concept with our early examples of varieties in order for us to get a handle on the concept.
It might help you to compare introducing the structure sheaf for affine algebraic varieties with how we talk about integrals. In calculus classes, we introduce integrals of nice functions like polynomials via some basic rules, and then once we get to analysis classes we introduce Riemann sums and Lebesgue integration so that we have access to the powerful theory: usually you'll have a couple examples or homework exercises proving your new theories recover all the same results as your old ones. The same sort of thing is going on here.
For question 2, one undesirable feature of considering what the author calls affine algebraic sets is that there are varieties $X$ with embeddings $f_1,f_2$ into affine space so that $f_1(X)$ is an affine algebraic set and $f_2(X)$ isn't. The most basic example is $k\setminus \{0\}$: it embeds into $k$ in the obvious way, and the image isn't an affine algebraic set since $k\setminus\{0\}\subset k$ isn't closed, but it also embeds into $k^2$ via $x\mapsto (x,x^{-1})$, and that image is an affine algebraic set (it's the zero locus of $xy-1$). Moving to a more intrinsic definition of variety avoids such issues.
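To make the second embedding concrete, here is a short sketch. The name $Z$ for the zero locus of $xy-1$ in $k^2$ is my own and does not come from the text.

$$
\begin{aligned}
k\setminus\{0\} &\longrightarrow Z = \{(x,y)\in k^2 : xy-1=0\}, & x &\longmapsto (x,x^{-1}),\\
Z &\longrightarrow k\setminus\{0\}, & (x,y) &\longmapsto x.
\end{aligned}
$$

These maps are mutually inverse, since $(x,y)\in Z$ forces $x\neq 0$ and $y=x^{-1}$. On the level of functions this matches the isomorphism $k[x,y]/(xy-1)\cong k[x,x^{-1}]$ sending $y\mapsto x^{-1}$.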
Added in response to the comment asking why we need the structure sheaf:
One motivational principle of algebraic geometry is that we can do a lot with a ring if we know it's the ring of functions on a manifold (or some other geometric object). For instance, if the geometric object is disconnected, then we can decompose our ring as the direct product of the rings of functions on the connected components, and we can detect that our manifold fails to be compact by finding a maximal ideal which doesn't come from evaluation at a point. It would be a natural question to try and apply this insight to a general ring: if you give me a ring $R$, what's the geometric object that this is the ring of functions on? The answer is the scheme $\operatorname{Spec} R$ (the spectrum of $R$).
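As a small illustration of the disconnected case (a sketch with my own notation: $X = X_1 \sqcup X_2$ is a disjoint union of two open pieces, $R$ is the ring of functions on $X$, and $R_i$ the ring of functions on $X_i$), the decomposition is witnessed by an idempotent:

$$
\begin{aligned}
e &:= \begin{cases} 1 & \text{on } X_1,\\ 0 & \text{on } X_2, \end{cases}
\qquad e^2 = e, \qquad e(1-e) = 0,\\
R &\;\cong\; Re \times R(1-e) \;\cong\; R_1 \times R_2, \qquad f \longmapsto \bigl(f|_{X_1},\, f|_{X_2}\bigr).
\end{aligned}
$$

Going the other way, a decomposition $R \cong R_1 \times R_2$ of an arbitrary ring gives $\operatorname{Spec} R = \operatorname{Spec} R_1 \sqcup \operatorname{Spec} R_2$.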
When we study schemes, we have a lot of local data to carry around: we have regular functions on every open set, and we want an intrinsic way to package this all up. The structure sheaf gives us a way to do all of this efficiently - it keeps track of all the data for us. In some sense, most sheaves on varieties and schemes you want to discuss will be built (at least locally) out of copies of the structure sheaf - this is exactly the condition of being quasi-coherent. So studying sheaves means we need to know about the structure sheaf.
As for what problems you can solve via methods which become available after having access to sheaves, the big tool is sheaf cohomology. This does lots of wonderful things for us: most results involving cohomology from algebraic topology have versions in algebraic geometry where we use sheaf cohomology instead of singular/simplicial/cellular cohomology. For instance, as wacky as the Zariski topology might seem, we still get Serre duality (something that looks like Poincaré duality) for smooth varieties over arbitrary fields, we have Künneth formulas, we have characteristic classes, and we can count special curves or points (a very geometric thing to do!) via sheaf cohomology.
Best Answer
While the first option may appear as if it depends on the choices made, it actually does not. This is essentially the same process as upgrading from considering a manifold as some specified subset of $\Bbb R^n$ to considering a manifold as an abstract topological space.
Gathmann defines the structure sheaf on an affine variety $X\subset \Bbb A^n$ as follows: the sections over an open $U\subset X$ are functions $\varphi:U\to k$ such that for every $u\in U$ there are $f,g\in A(X)=k[t_1,\cdots,t_n]/I(X)$ with $g(u)\neq 0$ such that $\varphi=\frac{f}{g}$ in an open neighborhood of $u$. If $Y\subset X\subset \Bbb A^n$ is a closed subvariety, then we can use this definition to recover the second characterization. Let $V\subset Y$ be an open subset and $\psi:V\to k$ a function such that for every $v\in V$ there are $f,g\in A(Y)$ with $g(v)\neq 0$ and $\psi=\frac{f}{g}$ on an open neighborhood of $v$ in $V$. Replace $f$ and $g$ with lifts $\tilde{f},\tilde{g}\in A(X)$ under the surjective ring homomorphism $A(X)\to A(Y)$. Then $\tilde{g}(v)=g(v)\neq 0$, so near $v$ the function $\psi$ is the restriction of $\frac{\tilde{f}}{\tilde{g}}$, which is a perfectly good regular function on an open neighborhood of $v$ in $X$.
Conversely, suppose we take the second definition. Then a function $\psi:V\to k$ is regular at $v\in V$ iff there is an open neighborhood $U_v\subset X$ of $v$ and a $\varphi\in\mathcal{O}_X(U_v)$ such that $\psi$ is the restriction of $\varphi$ near this point. By the definition of the structure sheaf on $X$, we may write $\varphi=\frac{f}{g}$ with $f,g\in A(X)$ and $g(v)\neq 0$ on $U'\subset U_v$, a smaller open neighborhood of $v$ in $X$. Then, letting $\overline{f},\overline{g}\in A(Y)$ be the images of $f,g$ under the map $A(X)\to A(Y)$, we have $\overline{g}(v)=g(v)\neq 0$ and, after shrinking $U'$ if necessary, $\frac{\overline{f}}{\overline{g}}=\psi$ on $U'\cap V$, an open neighborhood of $v$ in $V$. So if we take the second definition of the structure sheaf on a subvariety, we recover the first definition.
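A small worked example may help to see that the choice of lift does not matter. The specific $X$, $Y$, $V$ and $\psi$ below are my own choices for illustration, not taken from Gathmann. Take $X=\Bbb A^2$ with coordinates $x,y$, the closed subvariety $Y=V(y)$, the open subset $V=Y\setminus\{(0,0)\}$, and $\psi(x,0)=\frac{1}{x}$.

$$
\begin{aligned}
A(X) &= k[x,y], \qquad A(Y)=k[x,y]/(y)\cong k[x], \qquad \psi=\frac{1}{x}\ \text{ with } 1,\,x\in A(Y),\\
\text{lift 1:}\quad & \frac{1}{x} \ \text{ on } \{x\neq 0\}\subset X, \qquad \left.\frac{1}{x}\right|_{y=0}=\frac{1}{x},\\
\text{lift 2:}\quad & \frac{1+y}{x+y^2} \ \text{ on } \{x+y^2\neq 0\}\subset X, \qquad \left.\frac{1+y}{x+y^2}\right|_{y=0}=\frac{1}{x}.
\end{aligned}
$$

The two lifts are different regular functions on different open subsets of $X$, but both restrict to $\psi$ on $V$, so either one witnesses that $\psi$ is regular in the sense of the second definition.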