You are correct. The "crude" criterion for recurrence of a 2d random walk is $\mu=0$ and $\sigma^2<\infty$ for the jump distribution. The jump sizes are otherwise unrestricted.
The "detailed" criterion involves the characteristic function $\phi$ of the jump distribution, i.e., its Fourier transform. It says that 2d random walk is transient or recurrent as the real part of $(1-\phi(\theta))^{-1}$ is Lebesgue integrable on a neighborhood of the origin.
These results are from Section 8 of Spitzer's Principles of Random Walk (2nd edition).
Spitzer gives a detailed example of symmetric one-dimensional random walks, and shows that their recurrence or transience depends on the size of the tail of the jump distribution. That is, he supposes that
$$0<\lim_{|x|\to\infty} |x|^{1+\alpha}P(0,x)=c<\infty,$$ and concludes that this walk is recurrent when $\alpha\geq 1$ and transient when $\alpha<1$.
So, somewhat unexpectedly, there exist symmetric transient random walks in one dimension. Their jump distribution has such large tails that the walk leaps back and forth with large jumps and satisfies $\liminf_n X_n=-\infty$ and $\limsup_n X_n=+\infty$ without a guarantee of returning to the origin.
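To get a feel for such walks, here is a small Monte Carlo sketch (my own illustration, not Spitzer's argument). The Pareto-tail jump law and all parameters are arbitrary choices giving symmetric integer jumps with $P(|{\rm step}| = k) \sim k^{-(1+\alpha)}$ for $\alpha = 1/2 < 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.5   # tail exponent < 1: the transient regime in Spitzer's example
n = 10_000

# Jump sizes with a Pareto tail of index alpha, attached to a random sign
# to make the walk symmetric.
sizes = 1 + np.floor(rng.pareto(alpha, size=n))
signs = rng.choice([-1, 1], size=n)
steps = signs * sizes
path = np.cumsum(steps)

# The walk swings between huge positive and negative values via occasional
# enormous leaps, yet (for alpha < 1) exact returns to 0 are not guaranteed.
print("largest jump:", np.abs(steps).max())
print("range visited:", path.min(), "to", path.max())
print("exact returns to 0:", int(np.sum(path == 0)))
```

The dominant feature is that a single jump can dwarf the entire previous history of the walk, which is exactly the mechanism behind $\liminf_n X_n = -\infty$ and $\limsup_n X_n = +\infty$ without recurrence.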
It should be possible to adapt his arguments to the two-dimensional case.
By "higher dimensonal class field theory", Tate means the class field theory of higher dimensional local fields (see also this brief discussion), developed in the work of various people, including Kato and Parshin.
As for your second question, about learning CFT from a modern perspective: I encourage my own students to learn from Cassels and Fröhlich (including the exercises), from Cox's book Primes of the form $x^2 + n y^2$, and from Washington's article on Galois cohomology in Cornell, Silverman, and Stevens.
The first reference (especially the main articles of Serre and Tate) gives a development of the main results of CFT which I think is hard to beat. Cox's book gives an important classical perspective. Washington's article gives insight into how class field theory can be reformulated as a collection of theorems (mainly due to Tate) on (local and global) Galois cohomology.
Tate's article Number theoretic background in the second volume of Corvallis is good when you have reached a certain level of sophistication, and are ready to move on from just focusing on algebraic number theory and CFT to a broader perspective. My experience is that it is a little austere for a beginner, though.
One thing that you will be missing if you follow the above references is an $L$-function-based perspective on class field theory. I gather that this is discussed in the new edition of Artin--Tate. If so, it is worth learning, since although it is the more old-fashioned point of view on CFT, non-abelian class field theory (i.e. the Langlands program) is founded on the notion of $L$-functions. (I believe that Lang's book also discusses the $L$-function approach to CFT, but I've never read it myself.)
In an arbitrary dimension $d$:
Let $\vec{R}$ be the end-to-end distance vector of a random walk of fixed step length $|\vec{r}_i| = l$, so that $\displaystyle \vec{R} = \sum_{i=1}^N \vec{r}_i$, where $\vec{r}_i$ is the vector of the $i$-th step. The root-mean-square end-to-end distance is $\textrm{RMS}=\sqrt { \langle R^2 \rangle }$.

Since the steps are mutually independent, the covariance of two steps $\vec{r}_i$ and $\vec{r}_j$ is zero if $i\neq j$, and $\textrm{Cov}(\vec{r}_i, \ \vec{r}_j)= \textrm{Var}(\vec{r}_i)$ if $i=j$. The variance of $\vec{r}_i$ is $\textrm{Var}(\vec{r}_i)= \langle \vec{r}_i \cdot \vec{r}_i \rangle - \langle \vec{r}_i \rangle \cdot \langle \vec{r}_i \rangle$. By symmetry $\langle \vec{r}_i \rangle=\vec{0}$, so the variance of $\vec{r}_i$ is simply $\textrm{Var}(\vec{r}_i)= \langle \vec{r}_i \cdot \vec{r}_i \rangle = |\vec{r}_i|^2 = l^2$. Altogether, $\textrm{Cov}(\vec{r}_i, \ \vec{r}_j)=\delta_{ij}l^2$. The covariance can also be written $\textrm{Cov}(\vec{r}_i, \ \vec{r}_j) = \langle \vec{r}_i \cdot \vec{r}_j \rangle - \langle \vec{r}_i \rangle \cdot \langle \vec{r}_j \rangle$; combining the two expressions and using $\langle \vec{r}_i \rangle=\vec{0}$ gives $\langle \vec{r}_i \cdot \vec{r}_j \rangle =\delta_{ij}l^2$. This result can be used to determine the RMS:
$$\textrm{RMS}=\sqrt { \langle R^2 \rangle } = \sqrt { \langle \vec{R} \cdot \vec{R} \rangle } =\sqrt { \Big\langle \sum_{i=1}^N \vec{r}_i \cdot \sum_{j=1}^N \vec{r}_j \Big\rangle } =\sqrt { \sum_{i=1}^N \sum_{j=1}^N \langle \vec{r}_i \cdot \vec{r}_j \rangle }= $$ $$=\sqrt { \sum_{i=1}^N \sum_{j=1}^N \delta_{ij} l^2 }=\sqrt { \sum_{i=1}^N l^2}=\sqrt { N l^2}=l\sqrt { N }$$
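This identity is easy to check by simulation. The sketch below uses a 2d square lattice with arbitrarily chosen parameters, though the result holds in any dimension:

```python
import numpy as np

rng = np.random.default_rng(42)
l, N, walks = 1.0, 100, 4000   # step length, walk length, sample size (arbitrary)

# Unit steps in one of the 4 lattice directions, chosen uniformly.
dirs = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
steps = dirs[rng.integers(0, 4, size=(walks, N))] * l
R = steps.sum(axis=1)                          # end-to-end vectors, shape (walks, 2)
rms = np.sqrt(np.mean(np.sum(R**2, axis=1)))   # sample estimate of sqrt(<R^2>)

print(f"simulated RMS = {rms:.2f}, predicted l*sqrt(N) = {l*np.sqrt(N):.2f}")
```

With $l = 1$ and $N = 100$ the prediction is $\textrm{RMS} = 10$, and the sample estimate lands close to it.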
Let $Z_i$ denote the $i$-th coordinate of the end-to-end distance vector $\vec{R}$ after $N$ steps, and let $X_i$ and $Y_i$ denote the number of steps taken in the $i$-th dimension in the positive and negative direction respectively. Then the set of random variables $\{X_i, Y_i\}_{i=1}^d$ follows a multinomial distribution with parameters $N$ and $\displaystyle p_i=\frac{1}{2d}$. For sufficiently large values of $N$, $\{X_i, Y_i\}_{i=1}^d$ are approximately iid (independent and identically distributed) Poisson random variables with parameter $\displaystyle \lambda = \frac{N}{2d}$. For $\lambda > 20$, i.e. $N>40d$, $\textrm{Po}(\lambda) \approx \textrm{N}(\lambda, \lambda)$. Since $Z_i = l(X_i - Y_i)$, it follows that $\displaystyle Z_i \sim \textrm{N}\big(l(\lambda - \lambda),\ l^2(\lambda+\lambda)\big)=\textrm{N}(0, 2l^2\lambda)=\textrm{N}\left(0, \frac{l^2N}{d}\right)$.
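A quick simulation supports this normal approximation for the coordinates; the dimension and parameters below are arbitrary choices satisfying $N > 40d$:

```python
import numpy as np

rng = np.random.default_rng(1)
l, N, d, walks = 1.0, 400, 2, 5000   # arbitrary parameters with N > 40d

# 2d lattice directions scaled by the step length.
dirs = l * np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
R = dirs[rng.integers(0, 2 * d, size=(walks, N))].sum(axis=1)
var_x = R[:, 0].var()   # sample variance of one coordinate Z_1

print(f"sample Var(Z_1) = {var_x:.1f}, predicted l^2 N / d = {l**2 * N / d:.1f}")
```

For these parameters the prediction is $\textrm{Var}(Z_1) = l^2 N / d = 200$, and the sample variance agrees closely.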
$\displaystyle \langle R \rangle = \langle \sqrt{R^2} \rangle = \left\langle \sqrt{ \sum_{i=1}^d Z_i^2} \right\rangle$. The square root of the sum of squares of $k$ independent $\textrm{N}(0, 1)$-distributed random variables follows the chi distribution, $\chi_k$. Therefore $\displaystyle \sqrt{ \sum_{i=1}^d \frac{dZ_i^2}{l^2N}}$ is approximately $\chi_d$-distributed for large values of $N$. The expected value of a $\chi_k$-distributed random variable is $\displaystyle \sqrt{2} \frac{ \Gamma \left(\frac{k+1}{2}\right) }{\Gamma \left( \frac{k}{2}\right)}$.
Hence $\displaystyle \langle R \rangle =\left\langle\sqrt{ \sum_{i=1}^d Z_i^2}\right\rangle =\left\langle l \sqrt{\frac{N}{d}} \sqrt{ \sum_{i=1}^d \frac{dZ_i^2}{l^2N} }\right\rangle = l \sqrt{\frac{2N}{d} }\frac{ \Gamma \left(\frac{d+1}{2}\right) }{\Gamma \left( \frac{d}{2}\right)}$.
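The final formula can likewise be checked by simulation; this sketch uses an arbitrarily chosen cubic-lattice walk with $d = 3$ and $N = 300$:

```python
import numpy as np
from math import gamma, sqrt

rng = np.random.default_rng(7)
l, N, d, walks = 1.0, 300, 3, 4000   # arbitrary parameters with N > 40d

# The 2d unit lattice directions: +/- each coordinate axis.
dirs = np.vstack([np.eye(d), -np.eye(d)]) * l
R = dirs[rng.integers(0, 2 * d, size=(walks, N))].sum(axis=1)
mean_R = np.mean(np.linalg.norm(R, axis=1))   # sample estimate of <R>

predicted = l * sqrt(2 * N / d) * gamma((d + 1) / 2) / gamma(d / 2)
print(f"simulated <R> = {mean_R:.2f}, predicted = {predicted:.2f}")
```

For $d = 3$ the Gamma factors reduce to $\Gamma(2)/\Gamma(3/2) = 2/\sqrt{\pi}$, so the prediction is $l\sqrt{2N/3}\cdot 2/\sqrt{\pi} \approx 15.96$ here, in good agreement with the sample mean.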