By $H^{(m)}$ the authors denote the usual conserved charge, i.e. the $m$th logarithmic derivative of the transfer matrix at a suitable value of the spectral parameter. For example, for $m=1$ this gives the XXZ spin chain Hamiltonian: a sum of terms $h^{(1)}_j$ that each act on site $j$ and at most one adjacent site, i.e. on nearest neighbours $j,j+1$. This is true for any system size $N$.
If you similarly compute the next ultralocal charge $H^{(2)}$ you will get a sum of terms that act at sites $j,j+1,j+2$. (You might want to do this explicitly as an exercise. The result can be found e.g. in my lecture notes on the arXiv.) Again this form is the same for any $N$.
In general, for fixed $m$ the summands act nontrivially on at most $m+1$ adjacent sites, independently of $N$. There's nothing extensive about these summands when you fix $m$.
The only thing that changes with $N$ is how many summands there are (the sum runs up to $N$), and how many such charges you have. Indeed, up to a possible common factor (depending on the normalisation of the R-matrix) the entries of the transfer matrix are polynomials in the spectral parameter of degree at most $N$, so you can in principle compute nontrivial charges for $0\leq m\leq N$. (Here $m=0$ corresponds to the momentum operator, i.e. the log of the translation = cyclic shift operator.)
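To make the $m=1$ statement concrete, here is a small numerical sketch. It is my own illustration under simplifying assumptions: I use the rational (XXX) $R$-matrix $R(u) = u + P$ instead of the XXZ one, a chain of $N=4$ spin-1/2 sites, and a finite-difference derivative. The helper names `perm` and `transfer` are made up for this example. It checks that $t(0)^{-1}\,t'(0)$ is a sum of nearest-neighbour terms $P_{j,j+1}$.

```python
import numpy as np

def perm(n, i, j):
    """Swap tensor factors i and j of (C^2)^{(x) n}; factor k is bit k of the basis index."""
    dim = 2 ** n
    P = np.zeros((dim, dim))
    for s in range(dim):
        t = s
        if ((s >> i) & 1) != ((s >> j) & 1):
            t = s ^ ((1 << i) | (1 << j))  # flipping both (unequal) bits swaps them
        P[t, s] = 1.0
    return P

def transfer(u, N):
    """t(u) = tr_a R_{a1}(u) ... R_{aN}(u) with R(u) = u + P (assumed normalisation).
    Factor 0 is the auxiliary space, factors 1..N are the physical sites."""
    dim = 2 ** (N + 1)
    T = np.eye(dim)
    for j in range(1, N + 1):
        T = T @ (u * np.eye(dim) + perm(N + 1, 0, j))
    # basis index = 2 * (physical index) + (auxiliary bit): trace out the auxiliary bit
    return np.einsum('iaja->ij', T.reshape(2 ** N, 2, 2 ** N, 2))

N = 4
h = 1e-4
t0 = transfer(0.0, N)                               # cyclic shift operator
dt0 = (transfer(h, N) - transfer(-h, N)) / (2 * h)  # central difference for t'(0)
H1 = t0.T @ dt0                                     # t(0) is a permutation matrix, so inverse = transpose

# Sum of nearest-neighbour permutations with periodic boundary conditions
H_nn = sum(perm(N, j, (j + 1) % N) for j in range(N))

print(np.allclose(H1, H_nn, atol=1e-6))
```

Since $P_{j,j+1} = (1 + \vec\sigma_j \cdot \vec\sigma_{j+1})/2$, this is the Heisenberg XXX Hamiltonian up to a constant shift and rescaling. Changing `N` changes the number of summands but not their range.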
This is a hard question, and is closely related to what you precisely mean by (quantum) integrability. The definition that you use is sometimes called Yang-Baxter integrability: those models for which there is an underlying $R$-matrix obeying the Yang-Baxter equations - and a few other conditions, like a unitarity (aka inversion) relation $$R(x)\,R(-x) \propto 1 \, ,$$ and some sort of initial (aka regularity) condition that depends on normalisation $$ \text{e.g.} \quad R(0) = 1 \quad \text{or} \quad \mathrm{Res}_{x=0} \, R(x) =1 \, .$$ (I assume we are working with some additive parametrisation of the spectral parameter that I denote by $x$.)
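As a sanity check of these conditions, here is a minimal sketch for the simplest example, the rational $R$-matrix $R(x) = x + P$ on $\mathbb{C}^2\otimes\mathbb{C}^2$ with $P$ the permutation. The normalisation and the test values of $x,y$ are my choice. In this normalisation regularity reads $R(0) = P$, which is the same as $\check{R}(0) = 1$ for $\check{R} = P R$.

```python
import numpy as np

I2 = np.eye(2)
# Permutation (swap) operator on C^2 (x) C^2
P = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)

def R(x):
    """Rational R-matrix in the (assumed) normalisation R(x) = x + P."""
    return x * np.eye(4) + P

# Embeddings into three copies of C^2
def R12(x):
    return np.kron(R(x), I2)

def R23(x):
    return np.kron(I2, R(x))

S23 = np.kron(I2, P)  # swap of factors 2 and 3

def R13(x):
    return S23 @ R12(x) @ S23  # conjugation moves the second leg from factor 2 to factor 3

x, y = 0.37, -1.25  # arbitrary test values

# Yang-Baxter equation in additive form
ybe_ok = np.allclose(R12(x - y) @ R13(x) @ R23(y),
                     R23(y) @ R13(x) @ R12(x - y))
# Unitarity: R(x) R(-x) is proportional to the identity
unitarity_ok = np.allclose(R(x) @ R(-x), (1 - x ** 2) * np.eye(4))
# Regularity in this normalisation
regular_ok = np.allclose(R(0.0), P)

print(ybe_ok, unitarity_ok, regular_ok)
```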
Yang-Baxter integrability guarantees that the model has 'a lot of conserved quantities' and is 'exactly solvable'. (And then we have to make precise what we mean by those terms! But let me not go there now.)
So, "do all 1+1d integrable spin-chain models have an R-matrix satisfying YBE?" Well, yes if you mean 'Yang-Baxter integrable' (by definition), but not necessarily if you mean 'having lots of conserved charges': it's a sufficient condition, but I do not think it's necessary, although I do not have a counterexample.
Before I can answer the second question I'll need you to clarify what you mean by "having a discrete on-site symmetry". Do you mean that for each site we do not have a vector space (like $\mathbb{C}^2$ for spin-1/2) but just some discrete set, that is a representation of some discrete group? That would not really be a quantum spin chain. There does exist a notion of an $R$-matrix in the discrete setting, but I'm not an expert. The key word is "Yang-Baxter map".
Personally, by '(quantum) integrable' I always mean 'Yang-Baxter integrable'. But even then the details differ from case to case! Since I think this is less well known, and since I personally like this a lot (it's my research) let me illustrate this.
If you have such an $R$-matrix, then you can get a Hamiltonian of an integrable model from the logarithmic derivative of the transfer matrix. The 'initial' = 'regularity' condition guarantees that there is a good value of the spectral parameter at which we can evaluate the logarithmic derivative, and (in the standard setting) that the result is local in the appropriate sense. Here the Yang-Baxter equation and unitarity guarantee that all other coefficients of $x$ in the transfer matrix (or its logarithm, or $\dots$) commute with the Hamiltonian. So we do indeed get a large family of commuting operators. This class of integrable spin chains contains the Heisenberg spin chain. For the Heisenberg XXX and XXZ chains (rational and trigonometric, resp.), with six-vertex $R$-matrix, the eigenvectors can be constructed by algebraic Bethe ansatz. But for the Heisenberg XYZ chain (elliptic), with eight-vertex $R$-matrix, constructing eigenvectors is much more subtle since the magnon number is not conserved. There are two common ways around this:
If the length is even then you can use Baxter's face/vertex transformation to construct the ground state. This leads to dynamical R-matrices (of 'face = IRF = SOS type') and Felder's elliptic quantum group, where the Yang-Baxter equation is modified a little. I would still call this case 'Yang-Baxter integrable'.
You can also use Baxter's $Q$-operator to get the Bethe-ansatz equations without Bethe ansatz.
But it's instructive to consider another integrable spin chain: the Haldane-Shastry spin chain. It has long-range pair interactions mediated by an inverse-square potential:
$$ H^\text{hs} = \sum_{i<j}^L \frac{1}{\sin^2[\pi(i-j)/L]} P_{ij} \, . $$
It is a truly remarkable spin chain, with high degeneracies and extremely simple energies (they're all in $\mathbb{Z}$ up to an overall normalisation constant) and (some) explicit eigenvectors. The high degeneracies are because the $SU(2)$ spin symmetry is enlarged to a Yangian symmetry, so there's certainly a rational $R$-matrix underlying this spin chain. But the Hamiltonian is not the logarithmic derivative of the transfer matrix! (For the interested: it's instead related to the so-called quantum determinant, which is why the model has Yangian symmetry. I can provide some references later in case someone is interested.) Moreover, the monodromy matrix is different from that of Heisenberg XXX, even if it uses the same $R$-matrix. (The inhomogeneities are very different: not just parameters, but themselves nontrivial operators.) The point is: yes, the Haldane-Shastry spin chain 'has' an $R$-matrix, BUT it works in a different way than for Heisenberg.
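For small $L$ the claims about the spectrum can be checked by brute force. The sketch below is my own illustration: the helper names `perm` and `haldane_shastry` are hypothetical, and I take spin 1/2 with $L=6$. It builds $H^\text{hs}$ exactly as written above, with $P_{ij}$ the spin permutation, and checks that all eigenvalues are integers and highly degenerate.

```python
import numpy as np

def perm(n, i, j):
    """Swap tensor factors i and j of (C^2)^{(x) n}; factor k is bit k of the basis index."""
    dim = 2 ** n
    P = np.zeros((dim, dim))
    for s in range(dim):
        t = s
        if ((s >> i) & 1) != ((s >> j) & 1):
            t = s ^ ((1 << i) | (1 << j))  # flipping both (unequal) bits swaps them
        P[t, s] = 1.0
    return P

def haldane_shastry(L):
    """H = sum_{i<j} P_ij / sin^2(pi (i-j)/L), in the normalisation of the formula above."""
    H = np.zeros((2 ** L, 2 ** L))
    for i in range(L):
        for j in range(i + 1, L):
            H += perm(L, i, j) / np.sin(np.pi * (i - j) / L) ** 2
    return H

L = 6
H = haldane_shastry(L)
evals = np.linalg.eigvalsh(H)  # H is real symmetric

# Distinct energy levels and their degeneracies
levels, counts = np.unique(np.round(evals, 8), return_counts=True)
for E, d in zip(levels, counts):
    print(f"E = {E:6.1f}   degeneracy {d}")

print(np.allclose(evals, np.round(evals), atol=1e-8))  # all energies are integers
```

The largest eigenvalue belongs to the ferromagnetic multiplet, where every $P_{ij}$ acts as $1$, and equals $\sum_{i<j} 1/\sin^2[\pi(i-j)/L] = L(L^2-1)/6$.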
Yet another interesting case is the Inozemtsev spin chain, which interpolates between Heisenberg XXX and Haldane-Shastry. It is exactly solvable (though much more complicated). The consensus is that it should be (quantum) integrable, but the underlying algebraic structure is not yet understood. (My collaborators and I are working on this question.)
Best Answer
This is a big topic and the most suitable answer depends on your background. Since you're a mathematician I assume that you are familiar with representation theory. I'll try to indicate the key rep-th terms corresponding to the terms in the OP, and hope that helps.
For the basic example one starts from the Yangian of $\mathfrak{gl}_2$. It has a presentation (called Drinfeld 3rd realisation = Faddeev--Reshetikhin--Takhtadzhyan presentation = '$RLL$ presentation') by generators and relations. The generators are often combined into an operator $L(u)$ on an ('auxiliary') vector space (which is 2d for the case of $\mathfrak{gl}_2$) with entries that are formal power series in (the inverse of) the 'spectral parameter' $u$ whose coefficients lie in the Yangian. This is the $L$-operator, sometimes called monodromy matrix -- though that name often also/instead denotes its image in a representation, see below -- and also often denoted by $T$ instead of $L$. The generators are subject to quadratic relations called the '$RLL$ relations' because of their symbolic form (and called the 'fundamental commutation relations' by Faddeev), where the (rational) $R$-matrix contains the 'structure constants' of the Yangian. For this to define an associative algebra, the $R$-matrix needs to obey the Yang--Baxter equation.
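For concreteness, in one common convention (my choice here; normalisations and signs vary between references) the rational $R$-matrix is $R(u) = u + P$, with $P$ the permutation of the two auxiliary spaces. Writing $L_1(u) = L(u)\otimes 1$ and $L_2(v) = 1\otimes L(v)$, the $RLL$ relations read
$$ R_{12}(u-v)\,L_1(u)\,L_2(v) = L_2(v)\,L_1(u)\,R_{12}(u-v) \, , $$
and the Yang--Baxter equation, on three copies of the auxiliary space, is
$$ R_{12}(u-v)\,R_{13}(u)\,R_{23}(v) = R_{23}(v)\,R_{13}(u)\,R_{12}(u-v) \, . $$
The first equation explains the name: symbolically it has the form '$RLL = LLR$'.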
Any finite dimensional representation of $\mathfrak{sl}_2$ gives rise to a representation of this Yangian called an evaluation representation, with an 'inhomogeneity' parameter that can be viewed either as a complex parameter or as an indeterminate. In the basic case we do this for the 2d (spin 1/2) irrep and take a trivial value of the inhomogeneity parameter. Physically, this is a single site. The image of the $L$-operator here is sometimes called the (local) Lax operator. Bigger representations can be constructed by taking tensor products to obtain the Hilbert space of the spin chain (multiple spins). The image of the $L$-operator in the resulting space is the (global Lax operator or) monodromy matrix. It still acts on the auxiliary space as well. Taking the trace over this auxiliary space gives the transfer matrix, which is the image of an abelian subalgebra of the Yangian (called the Bethe subalgebra) which, when expanded in the spectral parameter, provides a family of commuting operators (conserved charges) on the spin-chain Hilbert space including the translation operator and the Heisenberg XXX Hamiltonian. Finally, in this language, the algebraic Bethe ansatz is a sort of highest-weight construction of the eigenvectors of the transfer matrix (and thus the spin chain) that produces actual eigenvectors provided the spectral parameters involved solve the Bethe-ansatz equations.
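If you prefer to see this in matrices, here is a minimal numerical sketch of the construction. It is my own illustration, with assumed conventions: local Lax operator $R_{aj}(u-\theta_j) = (u - \theta_j) + P_{aj}$ on spin-1/2 sites, hypothetical helper names `perm` and `transfer`, and arbitrary test values. It checks that transfer matrices commute for any inhomogeneities $\theta_j$, and that for $\theta_j = 0$ they commute with the translation operator and the XXX Hamiltonian.

```python
import numpy as np

def perm(n, i, j):
    """Swap tensor factors i and j of (C^2)^{(x) n}; factor k is bit k of the basis index."""
    dim = 2 ** n
    P = np.zeros((dim, dim))
    for s in range(dim):
        t = s
        if ((s >> i) & 1) != ((s >> j) & 1):
            t = s ^ ((1 << i) | (1 << j))  # flipping both (unequal) bits swaps them
        P[t, s] = 1.0
    return P

def transfer(u, thetas):
    """Trace over the auxiliary space (factor 0) of the monodromy matrix
    T_a(u) = prod_j [(u - theta_j) + P_{aj}], with sites j = 1..N."""
    N = len(thetas)
    dim = 2 ** (N + 1)
    T = np.eye(dim)
    for j, theta in enumerate(thetas, start=1):
        T = T @ ((u - theta) * np.eye(dim) + perm(N + 1, 0, j))
    # basis index = 2 * (physical index) + (auxiliary bit): trace out the auxiliary bit
    return np.einsum('iaja->ij', T.reshape(2 ** N, 2, 2 ** N, 2))

N = 4
u, v = 0.3, -1.7  # arbitrary spectral parameters

# Generic inhomogeneities: the transfer matrices still commute
thetas = [0.2, -0.5, 1.1, 0.0]
tu, tv = transfer(u, thetas), transfer(v, thetas)
inhom_commute = np.allclose(tu @ tv, tv @ tu)

# Homogeneous case: translation operator and XXX Hamiltonian are among the charges
zeros = [0.0] * N
shift = transfer(0.0, zeros)
H_xxx = sum(perm(N, j, (j + 1) % N) for j in range(N))  # sum of P_{j,j+1}, periodic
t_hom = transfer(u, zeros)
H_conserved = np.allclose(t_hom @ H_xxx, H_xxx @ t_hom)
shift_conserved = np.allclose(t_hom @ shift, shift @ t_hom)

print(inhom_commute, H_conserved, shift_conserved)
```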
NB. For the XXZ chain, start from a quantum affine algebra instead, with trigonometric $R$-matrix, and everything carries over (in the generic case). For the XYZ chain, start from an elliptic quantum group, with elliptic $R$-matrix; then you can still construct a transfer matrix and obtain commuting Hamiltonians, but the construction of eigenvectors is much harder since (depending on the type of elliptic quantum group) there are no highest-weight representations (spin-$z$ is not conserved).