First, general relativity is not a gauge theory in the narrow sense (of having a gauge field) if you consider the second-order formalism in which only the metric is dynamical. The Einstein-Hilbert action conceived of as an action where the only dynamical field is $g$ still has spacetime dependent symmetries ($\mathrm{GL}(n)$-valued transformations acting like the Jacobians of diffeomorphisms on all fields), so it has gauge symmetries and consequently gauge freedom (e.g. the one used below in the spin connection formalism to "diagonalize the metric" at every point), but it does not have a dynamical gauge field. However, there are (at least) two ways to formulate the theory of the Einstein-Hilbert action in terms of a gauge field:
General relativity is a gauge theory with either the general linear group $\mathrm{GL}(n)$ or the Lorentz group $\mathrm{SO}(n-1,1)$ playing the role of the gauge group, depending on your formulation, if you're willing to relax the usually strict requirement that only gauge-invariant quantities are physically meaningful - while Lorentz invariant quantities are more useful in generic computations than, say, vectors, no one claims you can't measure a vector in a given frame. Additionally, GR is not a "free" gauge theory (in the sense of Yang-Mills or Chern-Simons) coupled to something: the gauge field is never the sole dynamical variable, but is always coupled to either the metric or the vielbein, so there's another sense in which it doesn't conform to our usual notion of gauge theory.
The two formulations are as follows:
Classical (Palatini) formalism: In the first-order formulation (Palatini formalism, see also this question) of GR, the dynamical fields are the metric and the Christoffel symbols. Examining the transformation behaviour of the Christoffels (as I do in this answer), it is straightforward to see that they transform precisely like a $\mathrm{GL}(n)$-gauge field. It is rather crucial to note that diffeomorphism invariance is not the same as gauged $\mathrm{GL}(n)$-invariance - the former is a basic aspect of all "coordinate-invariant physics", while the latter essentially arises because the Ricci scalar in the Einstein-Hilbert action is analogous to the gauge-invariant $\mathrm{Tr}(F)$ terms in ordinary gauge theories. Yes, the opposite is often claimed, and yes, I am sure that diffeomorphisms are not gauged versions of anything. However, diffeomorphisms induce $\mathrm{GL}(n)$ gauge transformations through their Jacobians; see again the answer about the transformation behaviour of the Christoffels I linked above.
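To make the comparison concrete, here is a sketch of the standard transformation law, written with the assumed conventions $\nabla_\mu V^\lambda=\partial_\mu V^\lambda+\Gamma^\lambda_{\ \mu\nu}V^\nu$ and $J^\lambda_{\ \rho}=\partial x'^\lambda/\partial x^\rho$ (neither notation is fixed by the text above). Under a change of coordinates $x\mapsto x'(x)$,
$$ \Gamma'^\lambda_{\ \mu\nu} = J^\lambda_{\ \rho}\,(J^{-1})^\sigma_{\ \mu}\,(J^{-1})^\tau_{\ \nu}\,\Gamma^\rho_{\ \sigma\tau} + J^\lambda_{\ \rho}\,\frac{\partial^2 x^\rho}{\partial x'^\mu\,\partial x'^\nu}. $$
Reading $\Gamma=\Gamma_\mu\,\mathrm{d}x^\mu$ as a matrix-valued 1-form with matrix entries $(\Gamma_\mu)^\lambda_{\ \nu}=\Gamma^\lambda_{\ \mu\nu}$, the factor $(J^{-1})^\sigma_{\ \mu}$ is just the usual transformation of the 1-form index, and what remains is
$$ \Gamma' = J\,\Gamma\,J^{-1} + J\,\mathrm{d}(J^{-1}), $$
which has the same shape as the transformation $A\mapsto gAg^{-1}+g\,\mathrm{d}(g^{-1})$ of a gauge field, with $g=J$ taking values in $\mathrm{GL}(n)$.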
Spin connection formalism: Instead of conceiving of the tangent bundle as associated to a $\mathrm{GL}(n)$-frame bundle, a manifold with a metric of signature $(p,q)$ naturally has a reduction of the frame bundle to an $\mathrm{SO}(p,q)$ frame bundle, which you may think of as just the bundle of all orthonormal bases relative to the given metric of signature $(p,q)$, whereas the $\mathrm{GL}(n)$ bundle is the bundle of all bases. The physicist knows this reduction as the tetrad or vielbein formalism, and it allows us to reduce the $\mathfrak{gl}(n)$-valued gauge field $\Gamma$ that is the Christoffels to an $\mathfrak{so}(p,q)$-valued gauge field that is the spin connection $\omega$, essentially by a smooth choice of orthonormal (non-coordinate) basis all over spacetime, which I explain in a bit more detail in this answer. The dynamical fields in the spin connection formalism are the spin connection and the vielbein.
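As a sketch of how this reduction looks in components (the index placement and the symbol $\eta_{ab}$ for the flat metric of signature $(p,q)$ are my assumed conventions): the vielbein $e^a_{\ \mu}$ with inverse $e^\mu_{\ a}$ satisfies
$$ g_{\mu\nu} = \eta_{ab}\,e^a_{\ \mu}\,e^b_{\ \nu}, $$
and demanding that the vielbein be covariantly constant, $\partial_\mu e^\nu_{\ b}+\Gamma^\nu_{\ \mu\lambda}\,e^\lambda_{\ b}-\omega_\mu{}^c{}_b\,e^\nu_{\ c}=0$, gives
$$ \omega_\mu{}^a{}_b = e^a_{\ \nu}\,\Gamma^\nu_{\ \mu\lambda}\,e^\lambda_{\ b} + e^a_{\ \nu}\,\partial_\mu e^\nu_{\ b}. $$
In matrix notation this reads $\omega = e\,\Gamma\,e^{-1}+e\,\mathrm{d}(e^{-1})$, i.e. it has the form of a $\mathrm{GL}(n)$ gauge transformation of $\Gamma$ by the matrix $e^a_{\ \nu}$. When $\Gamma$ is metric-compatible, $\omega_{ab}=-\omega_{ba}$, so $\omega$ takes values in $\mathfrak{so}(p,q)$. The residual freedom $e^a_{\ \mu}\mapsto\Lambda^a_{\ b}(x)\,e^b_{\ \mu}$ with $\Lambda(x)\in\mathrm{SO}(p,q)$ leaves $g_{\mu\nu}$ unchanged.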
As supplementary evidence that the slogan that "diffeomorphism invariance is a gauge invariance" is false, I urge you to consider that ordinary Yang-Mills theory is also perfectly "diffeomorphism invariant": The Yang-Mills action
$$ \int_M \mathrm{tr}(F\wedge{\star}F)$$
has no dependence on coordinates whatsoever either; it is no more or less "diffeomorphism invariant" than the Einstein-Hilbert action is. The significance of "diffeomorphism invariance" in GR is really much more that, as I said above, the Jacobians of diffeomorphisms are the natural source for the gauge transformations of the Christoffels, and that the theory would also be separately invariant just under the $\mathrm{GL}(n)$ transformations without considering an underlying diffeomorphism.
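To see explicitly where the metric enters, one can write the same action in components. Up to convention-dependent signs and normalization (my assumption here is $F=\tfrac12 F_{\mu\nu}\,\mathrm{d}x^\mu\wedge\mathrm{d}x^\nu$),
$$ \int_M \mathrm{tr}(F\wedge{\star}F) = \frac12\int_M \mathrm{d}^nx\,\sqrt{|\det g|}\;g^{\mu\rho}g^{\nu\sigma}\,\mathrm{tr}(F_{\mu\nu}F_{\rho\sigma}). $$
The right-hand side is a scalar density integrated over $M$, so it takes the same value in every coordinate system, even though the metric $g$ hidden inside the Hodge star is a fixed input here rather than a field that is varied.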
I cannot give you a full answer (that honor has to pass on to someone else), but nonetheless I can give you some info you might value.
It is also known (but there's lot of confusion out there about this) that GR is also invariant under active coordinates transformations (also known as diffeomorphisms), which could be interpreted as a kind of gauge transformations, and not just as simple changes of local coordinates.
I honestly think this is wrong. The point is, if you choose the local coordinate formalism, active diffeos cannot be told apart from passive diffeos. These are all maps of the form $$ y^\mu=\Phi^\mu(x^1,...,x^n). $$ If your theory is invariant under one, it is invariant under both.
Background independence is a statement that is independent of diffeomorphism invariance. Or at least it is independent to the degree that diffeomorphism invariance is necessary for background independence, but diffeomorphism invariance, active or passive, does not imply background independence.
Background independence, at least as long as we consider only local behaviour and not global topological aspects, is a consequence of the fact that the metric is dynamical and subject to the EFE $G_{\mu\nu}=8\pi G T_{\mu\nu}$.
Consider an arbitrary spacetime $(M,g)$ where $g$ is a flat metric in the sense that $\text{Riem}[g]=0$. Any theory you build on this spacetime will not be background-independent, because the flatness condition at least locally restricts spacetime geometry to be Minkowskian. However, if $S=0$ is a tensor equation, and $\phi:M\rightarrow M$ is a diffeomorphism, then $$ \phi^\ast S=0 $$ is also satisfied, hence the equation is diffeomorphism invariant.
You can read a very good discussion of the relationship between general covariance, general invariance (this terminology is nonstandard I think) and background independence in Straumann's General Relativity.
because full Local Poincaré invariance is supposed to bring torsion into GR (I never saw any convincing proof of this)
A convincing proof, which I will only sketch/explain here rather than carry out, can be found in Kobayashi & Nomizu (Foundations of Differential Geometry, vol. 1).
The point is that local transformations act on the individual tangent spaces $T_pM$ for all $p\in M$. In a tangent space, you can easily interpret a homogeneous/linear transformation, but what do shifts of the form $v^\mu +a^\mu$ mean? After all, $T_pM$ is usually not interpreted as a space of points.
However, one can construct a fiber bundle called an affine bundle, which is essentially a fiber bundle whose local trivializations are of the form $U\times A$ where $U\in \tau_M$ is an open subset of $M$ and $A$ is an affine space. A connection on an affine bundle is called an affine connection. There is a subtle relationship between affine connections and linear connections (connections on vector bundles), and in the vast majority of cases an affine connection induces a linear connection in an associated vector bundle (which can basically be obtained from an affine bundle by choosing a zero section). So affine connections are somewhat more general, but for pretty much all relevant cases they are essentially the same (which is why less precise sources usually use the two terms synonymously).
One may also construct a bundle of affine frames as a principal bundle associated to an affine bundle. If the most general vector bundle of rank $k$ admits an associated principal bundle whose structure group is $\text{GL}(k,\mathbb R)$, then the most general associated principal bundle to an affine bundle has $\text{GL}(k,\mathbb R) \rtimes \mathbb R^k$ as its structure group. The Lie algebra of this group is then isomorphic (as a vector space) to $\mathfrak{gl}(k,\mathbb R)\oplus \mathbb R^k$, so any connection on this principal bundle can be written as a pair $$ (\theta^a,\omega^a_{\ b}), $$ where the first is an $\mathbb R^k$-valued 1-form and the second is a $\mathfrak{gl}(k,\mathbb R)$-valued 1-form. The latter can be identified as the usual connection form of a $\text{GL}(k,\mathbb R)$-connection, while the former can be identified with the tautological/soldering 1-form on the usual frame bundle. It is known that the torsion of a usual linear connection is given by $$ T=d_\omega \theta, $$ and the curvature is given by $$ \Omega=d_\omega\omega. $$ However, from the point of view of the affine bundle, both $\theta$ and $\omega$ are connection forms, so one can essentially see that the torsion is the "translational" part of the curvature and $\Omega$ is the "linear" part of the curvature (rotational in case of metric compatibility).
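One way to make this last statement explicit is the following sketch, which uses the standard embedding of the affine group into $\text{GL}(k+1,\mathbb R)$ (the block-matrix notation $\tilde\omega$, $\tilde\Omega$ is mine, and wedge products of matrix-valued forms are taken with matrix multiplication). An affine map $v\mapsto Av+a$ is represented by the block matrix $\begin{pmatrix}A&a\\0&1\end{pmatrix}$, so the pair $(\theta^a,\omega^a_{\ b})$ assembles into
$$ \tilde\omega=\begin{pmatrix}\omega^a_{\ b}&\theta^a\\0&0\end{pmatrix}, $$
and its curvature is
$$ \tilde\Omega = d\tilde\omega+\tilde\omega\wedge\tilde\omega = \begin{pmatrix} d\omega^a_{\ b}+\omega^a_{\ c}\wedge\omega^c_{\ b} & d\theta^a+\omega^a_{\ b}\wedge\theta^b\\ 0&0\end{pmatrix} = \begin{pmatrix}\Omega^a_{\ b}&T^a\\0&0\end{pmatrix}. $$
The upper-left block is the curvature $\Omega=d_\omega\omega$ and the upper-right block is the torsion $T=d_\omega\theta$.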
However, note once again that local translation invariance (represented by the affine bundle) is not sufficient for nonvanishing torsion; it just allows it. And since, as I have said, the majority of the time there are strong isomorphisms between linear and affine connections, this whole formalism is not necessary; it just gives an interpretation of torsion in terms of the translational part of a connection. You can, however, simply incorporate torsion into your theory via more pedestrian means just fine.
Best Answer
There is a risk of confusion here because in some sense Minkowski space is too nice. What I mean by this is that in the setting of Minkowski space, because of its simple structure, there are identifications and globalizations possible that do not make sense in a general spacetime.
In a general spacetime, at every point you can define the tangent vector space. It roughly has one direction for every direction you can move in. This does not mean that the spacetime itself is an affine space (an affine space is like a vector space without a preferred origin); it could be a sphere, for example. But Minkowski spacetime is indeed an affine space, and this tempts you to confuse the whole space with the tangent space.
Let us talk about relativity. An observer in spacetime can find three spacelike curves and one timelike curve through his or her time and position. The principle of Lorentz invariance is that any choice is fine! For the space part this just means that you can rotate your laboratory and get the same results. That you are also allowed to mix time and space follows from the requirement that the speed of light be the same for observers in relative motion.
So really, what local Lorentz invariance means is that you can rotate your laboratory without changing the results, and that observers moving relative to it see the same physics. This is an expression of symmetry in the tangent space.
Now in Minkowski spacetime pick an arbitrary origin. Then Minkowski spacetime has the same structure as the (1+3) tangent spaces of general relativity, so the local Lorentz invariance can be made global. Since the origin was arbitrary, you also have four translation symmetries. This is the Poincaré symmetry.
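In formulas, and assuming inertial coordinates $x^\mu$ with $\eta=\mathrm{diag}(-1,1,1,1)$ (a sign convention I am choosing for illustration), these symmetries are the maps
$$ x^\mu\mapsto\Lambda^\mu_{\ \nu}\,x^\nu+a^\mu,\qquad \eta_{\mu\nu}\,\Lambda^\mu_{\ \rho}\,\Lambda^\nu_{\ \sigma}=\eta_{\rho\sigma}, $$
with constant $\Lambda$ and $a$. The Lorentz part contributes $6$ parameters (three rotations and three boosts) and the translations contribute $4$, giving the $10$-parameter Poincaré group.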
Local Lorentz invariance is a statement about how your local choice of time and space axes is unimportant. Global Lorentz and Poincaré invariance is a much stronger statement about the symmetries of spacetime itself. In particular, a spacetime need not have any symmetries at all (and there are many known examples of solutions to Einstein's equations that don't).