I am going to replace some definitions by obviously equivalent ones. The benefit will be that the proofs are easier to understand. I am writing down very detailed proofs that you would not normally find in a college-level math textbook.
Definition: For $n \in \mathbb{N}$ define $[n] = \{k \in \mathbb{N} \mid k < n\}$.
Definition: The image of a map $f : X \to Y$ is the set $\mathrm{im}(f) = \{y \in Y \mid \exists x \in X \,.\, f(x) = y\}$.
Definition: A set $X$ is infinite when for every $n \in \mathbb{N}$ and every map $f : [n] \to X$ there is some $x \in X$ which is not in the image of $f$.
Lemma 1: An infinite set $X$ contains an element.
Proof. Because $X$ is infinite there is $x \in X$ which is not in the image of the unique map $[0] \to X$. QED.
Lemma 2: If $X$ is infinite then there exists a map $c$ such that $c(n,f) \in X \setminus \mathrm{im}(f)$ for every $n \in \mathbb{N}$ and every $f : [n] \to X$.
Proof. For every $n \in \mathbb{N}$ and $f : [n] \to X$ the set $X \setminus \mathrm{im}(f)$ contains an element because $X$ is infinite. By the axiom of choice there exists a choice map $c$ such that $c(n,f) \in X \setminus \mathrm{im}(f)$ for all $n$ and $f$. QED.
Here is an informal explanation of the map $c$. Given an infinite set $X$ and any elements $x_0, \ldots, x_{n-1} \in X$, we may view the tuple $(x_0, \ldots, x_n)$ as a map $f : [n] \to X$ where $f(k) = x_k$. By a slight abuse of notation we can then write $c(n,f)$ as $c(x_0, \ldots, x_{n-1})$. Now we see that $c$ accepts finitely many elements of $X$ and returns an element in $X$ which is different from all of them.
Proposition 1: If $X$ is infinite then there exists an injective map $i : \mathbb{N} \to X$.
Proof. By Lemma 1 there is $x \in X$ and let $c$ be the map from Lemma 2. Define $i$ by recursion:
\begin{align*}
i(0) &= x \\
i(n) &= c(i(0), \ldots, i(n-1)) \qquad\qquad \text{for $n \geq 1$}
\end{align*}
Clearly, $i$ is injective because $i(n)$ is chosen so that it is different from $i(0), i(1), \ldots, i(n-1)$. QED.
Proposition 2: If $X$ is infinite then it is in bijection with some proper subset $Y \subseteq X$.
Proof. Suppose $X$ is infinite. We seek a proper subset $Y \subseteq X$ and a bijection $f : X \to Y$. Let $i$ be the map from Proposition 1. Define
$$Y = X \setminus \{i(0)\}$$
and define $b : X \to Y$ by
$$b(x) = \begin{cases}
x & \text{if $x \not\in \mathrm{im}(i)$}\\
i(n+1) & \text{if $x = i(n)$ for a (unique) $n \in \mathbb{N}$}
\end{cases}
$$
in words, $b$ takes $i(0)$ to $i(1)$, $i(1)$ to $i(2)$, and so on, and does not move elements which are outside the image of $i$. Clearly, $b$ is injective because $i$ is. It is surjective because the only element not in the image of $b$ is $i(0)$. QED.
Proposition 3: If $X$ is a set and $b : X \to Y$ a bijection from $X$ to a proper subset $Y \subseteq X$, then there is an injective map $j : \mathbb{N} \to X$.
Proof.
Because $Y$ is a proper subset there exists $x \in X \setminus Y$. We define a map $j : \mathbb{N} \to X$ by recursion:
\begin{align*}
j(0) &= x \\
j(n) &= b(j(n-1)) \qquad\qquad\text{for $n \geq 1$}
\end{align*}
To prove that $j$ is injective, it suffices to verify that $j(m) \neq j(n)$ for all $m < n$. We do this by induction on $m$:
if $m = 0$ then $j(0) = x$ is different from $j(n) = b(j(n-1))$ because $b(j(n-1)) \in Y$ and $x \not\in Y$.
for the induction step, suppose we had $j(m) = j(n)$ for some $0 < m < n$. Then we would have $b(j(m-1)) = j(m) = j(n) = b(j(n-1))$ and because $b$ is a bijection it would follow that $j(m-1) = j(n-1)$, which is impossible by the induction hypothesis for $m-1$. Therefore it must be the case that $j(m) \neq j(n)$.
QED.
Theorem: A set is infinite if, and only if, it is equipotent with some proper subset.
Proof.
If $X$ is infinite then we apply Propostion 2 to get a proper subset $Y \subseteq X$ which is equipotent with $X$.
Conversely, suppose we have a bijection $b : X \to Y$ for a proper subset $Y \subseteq X$. By Proposition 3 there is an injective map $j : \mathbb{N} \to X$. To see that $X$ is infinite, consider any $n \in \mathbb{N}$ and $f : [n] \to X$. There are $n+1$ distinct elements $j(0), j(1), \ldots, j(n) \in X$, and so at least one of them is not in $\mathrm{im}(f)$ because $\mathrm{im}(f)$ has at most $n$ elements. QED.
Best Answer
Welcome to the wormhole.
A set which has a countably infinite subset is called Dedekind Infinite. It is not provable within Zermelo-Fraenkel set theory (ZF) that every infinite set is Dedekind infinite. This was demonstrated by Paul Cohen, who exhibited a model of ZF in which there is an infinite, Dedekind-finite set of real numbers. See Asaf's answer to my question for implications of this statement in analysis.
You do not need any form of the axiom of choice to prove that infinite equals Dedekind infinite, in the sense that one can construct models of ZF in which all infinite sets have countably infinite subsets but no form of the axiom of choice holds. However, if we do have some form of the axiom of choice available, even a weak form, then we can prove your statement.
Let me explain. The axiom of choice states that, given any collection $(A_i\;\colon\; i\in I)$ of nonempty sets, indexed by some set $I$, there exists a choice function picking out, for each $i\in I$, an element $a_i\in A_i$. In other words, we have a function $f\colon I\to\bigcup_iA_i$ such that $f(i)\in A_i$ for each $i$.
A well-known (but not easy to prove) consequence of the axiom of choice is the well-ordering theorem: given any set $A$, we can order the elements of $A$ in such a way that any nonempty subset of $A$ has a least element under the ordering. In fact, the well-ordering theorem is equivalent to the axiom of choice, since, given $(A_i\;\colon\;i\in I)$, we can define $f(i)$ to be the least element of $A_i$, under this ordering, giving us a choice function.
With this form of the axiom of choice, one can prove your statement as follows: given your set $S$, order it according to the well-ordering theorem. Then let $a_0$ be the least element of $S$, let $a_1$ be the least element of $S\setminus\{a_0\}$ and so on. This process will never terminate; otherwise, $S$ would be equal to $\{a_0,\dots,a_n\}$ for some $n$ and therefore would be finite.
That's all well and good, but the well-ordering theorem is a non-obvious theorem in set theory, and certainly isn't implicit in your argument. We shouldn't need to use it to prove this statement.
Thankfully, there is another way, and it actually involves a strictly weaker form of the axiom of choice. The axiom of countable choice is the same as the axiom of choice, except that we require the set $I$ to be countable. There are models of set theory in which the axiom of choice fails but still holds whenever the index set is countable, so countable choice is strictly weaker than full choice.
Here's how we prove your statement using only countable choice. Let $S$ be an infinite set. Then, for each $n=0,1,2,\dots$, let $S_n$ be the set of subsets of $S$ of size $2^n$.
We can show that each $S_n$ is non-empty using your argument (note that the axiom of choice is not required for a finite collection of sets, since we can proceed by induction): inductively pick elements $a_1,\dots,a_{2^n}$ from $S$; since $S$ is infinite, this process will not terminate and since we are only choosing finitely many elements, there are no dodgy set-theory issues.
Now, there are countably many $S_n$, so the axiom of countable choice tells us that there is a sequence $A_0,A_1,A_2,\dots$ of subsets of $S$ such that $A_n$ has size $2^n$ for each $n$.
Now, for each $n$, let $B_n$ be the set of elements of $A_n$ that are not contained in $A_0\cup\dots\cup A_{n-1}$. Since $A_n$ has size $2^n$, and $A_0,\dots,A_{n-1}$ have at most $1+2+4+\dots+2^{n-1}=2^n-1$ distinct elements between them, $B_n$ must be non-empty. By the axiom of countable choice again, we can choose a sequence $s_0,s_1,\dots$ with each $s_n\in B_n$. In particular, the $s_n$ are distinct. We have found a countably infinite subset of $S$.