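For $1<p<\infty$, $W^{s,p}$ coincides (with equivalent norms) with the inhomogeneous Triebel-Lizorkin space $F_{p,q}^{s}(\mathbb{R}^{n})$ with $q=2$, whose norm is defined by
$$\|f\|_{F_{p,q}^{s}}=\left\|\left(\sum_{k}2^{kqs}|P_{k}f|^{q}\right)^{1/q}\right\|_{L^{p}}$$
where the $P_{k}$ are the Littlewood-Paley projections.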
As you point out, one obtains the Besov space $B_{p,q}^{s}$ simply by interchanging the order in which the norms are taken. I think you would agree that interchanging norms is, in general, a nontrivial operation. It follows from Minkowski's integral inequality that
$$\|f\|_{F_{p,q}^{s}}\leq\|f\|_{B_{p,q}^{s}} \enspace p\geq q, \quad\|f\|_{B_{p,q}^{s}}\leq\|f\|_{F_{p,q}^{s}} \enspace q\geq p$$
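To spell out the first inequality, here is a short sketch. The abbreviation $g_{k}:=2^{ks}|P_{k}f|$ is introduced only for this computation, and $p\geq q$ is assumed so that $L^{p/q}$ satisfies the triangle inequality:
$$\|f\|_{F_{p,q}^{s}}^{q}=\left\|\sum_{k}g_{k}^{q}\right\|_{L^{p/q}}\leq\sum_{k}\left\|g_{k}^{q}\right\|_{L^{p/q}}=\sum_{k}\|g_{k}\|_{L^{p}}^{q}=\|f\|_{B_{p,q}^{s}}^{q}$$
For $q\geq p$, the same computation with the roles of the sum and the integral exchanged (the triangle inequality in $\ell^{q/p}$, applied to the sequence-valued function $x\mapsto (g_{k}(x)^{p})_{k}$) gives the second inequality.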
Additionally, combining this with the nesting property of sequence spaces ($\ell^{r}\subset\ell^{q}$ for $r\leq q$),
$$\|f\|_{F_{p,q}^{s}}\leq\|f\|_{B_{p,r}^{s}} \enspace r\leq\min(p,q), \quad \|f\|_{B_{p,r}^{s}}\leq\|f\|_{F_{p,q}^{s}} \enspace r\geq\max(p,q)$$
I believe that an equivalent characterization of $B_{p,q}^{s}$, for $0<s<1$, is in terms of the norm
$$\|f\|_{L^{p}}+\left(\int_{\mathbb{R}^{n}}\dfrac{\|f(\cdot+t)-f\|_{L^{p}}^{q}}{|t|^{n+sq}}\,dt\right)^{1/q}$$
If this is correct, then Besov spaces correspond to the generalized Lipschitz spaces $\Lambda_{\alpha}^{p,q}$ in E.M. Stein, Singular Integrals and Differentiability Properties of Functions, Chapter 5, where $s=\alpha$ in our notation. Furthermore, one can show using this characterization that
$$W^{s,p}(\mathbb{R})\not\subset B_{p,q}^{s}(\mathbb{R}), \quad q<2 \tag{1}$$
and
$$B_{p,q}^{s}(\mathbb{R})\not\subset W^{s,p}(\mathbb{R}), \quad q>2 \tag{2}$$
According to section 6.8 of the aforementioned reference, the function
$$f_{s,\sigma}(x):=e^{-\pi x^{2}}\sum_{k=1}^{\infty}a^{-ks}k^{-\sigma}e^{2\pi i a^{k}x}, \quad x\in\mathbb{R}$$
where $a>1$ is an integer, satisfies
$$f_{s,\sigma}\in W^{s,p}(\mathbb{R})\Leftrightarrow \sigma>\dfrac{1}{2},\quad \forall 1<p<\infty$$
and
$$f_{s,\sigma}\in B_{p,q}^{s}(\mathbb{R})\Leftrightarrow \sigma>\dfrac{1}{q},\quad\forall 1<p<\infty$$
From this result, which I imagine depends on results for lacunary Fourier series, it is easy to deduce (1) and (2).
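To spell the deduction out, taking the two equivalences above at face value: membership is governed by the thresholds $1/2$ and $1/q$, so it suffices to choose $\sigma$ between them.
$$q<2:\quad \tfrac{1}{2}<\sigma\leq\tfrac{1}{q}\ \Longrightarrow\ f_{s,\sigma}\in W^{s,p}(\mathbb{R})\setminus B_{p,q}^{s}(\mathbb{R})$$
$$q>2:\quad \tfrac{1}{q}<\sigma\leq\tfrac{1}{2}\ \Longrightarrow\ f_{s,\sigma}\in B_{p,q}^{s}(\mathbb{R})\setminus W^{s,p}(\mathbb{R})$$
For instance, $q=1$ and $\sigma=1$ gives a function in $W^{s,p}$ but not in $B_{p,1}^{s}$, while $q=4$ and $\sigma=1/2$ gives a function in $B_{p,4}^{s}$ but not in $W^{s,p}$.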
Best Answer
Generally, a distribution $u \in \mathcal{D}'(\mathbb{R}^n)$ is a continuous linear functional on $C_c^\infty(\mathbb{R}^n)$, the space of compactly supported $C^\infty$ functions.
A function $f \in L^1_{\text{loc}}(\mathbb{R}^n)$ defines a distribution by $\varphi \mapsto \int f(x) \, \varphi(x) \, dx.$ We can therefore identify $L^1_{\text{loc}}(\mathbb{R}^n)$ as a subspace of $\mathcal{D}'(\mathbb{R}^n)$. If $u \in \mathcal{D}'(\mathbb{R}^n)$ and there exists $f \in L^1_{\text{loc}}(\mathbb{R}^n)$ such that $\langle u, \varphi \rangle = \int f(x) \, \varphi(x) \, dx$ then we abuse notation and write $u \in L^1_{\text{loc}}(\mathbb{R}^n)$.
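As an illustration (an example added here, not taken from the question), the identification is a proper inclusion: the Dirac delta, defined by
$$\langle \delta, \varphi \rangle := \varphi(0),$$
is a distribution that is not given by any $f \in L^1_{\text{loc}}(\mathbb{R}^n)$. A sketch of why: if $\int f(x) \, \varphi(x) \, dx = \varphi(0)$ held for all test functions, then testing against $\varphi$ supported away from the origin would force $f = 0$ almost everywhere on $\mathbb{R}^n \setminus \{0\}$, hence almost everywhere on $\mathbb{R}^n$. The integral would then vanish for every $\varphi$, contradicting any choice with $\varphi(0) = 1$.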
That's what happens here. There exists $f \in L^2_{\text{loc}}(\mathbb{R}^n)$ such that $\langle \mathcal{F}u, \varphi \rangle = \int f(x) \, \varphi(x) \, dx$ for all test functions $\varphi$. This also answers your second question: $\mathcal{F}u(\xi) := f(\xi).$ Strictly speaking, even this pointwise value is not well-defined, since the elements of $L^2_{\text{loc}}(\mathbb{R}^n)$ are equivalence classes of functions that agree outside a null set. But we can just take any representative; they all give the same value when integrated against $\varphi$.