There's no escaping Lie theory if you want to understand what is going on mathematically. I'll try to provide some intuitive pictures for what is going on in the footnotes, though I'm not sure if it will be what you are looking for.
On an $N$-dimensional complex inner product space (finite-dimensional, for simplicity), the group of unitary operators is the Lie group $\mathrm{U}(N)$, which is connected. Lie groups are manifolds, i.e. things that locally look like $\mathbb{R}^n$ (here $n = N^2$), and as such possess tangent spaces at every point, spanned by the derivatives of their coordinates or, equivalently, by all possible directions of paths at that point. These directions form, at $g \in \mathrm{U}(N)$, the $N^2$-dimensional real vector space $T_g \mathrm{U}(N)$.$^1$
Canonically, we take the tangent space at the identity $\mathbf{1} \in \mathrm{U}(N)$ and call it the Lie algebra $\mathfrak{u}(N) \cong T_\mathbf{1}\mathrm{U}(N)$. Now, from the tangent space at the identity, there is something called the exponential map to the group itself. It is a fact that, for compact groups, such as the unitary group, said map is surjective onto the part containing the identity.$^2$ It is a further fact that the unitary group is connected, meaning that it has no parts not connected to the identity, so the exponential map $\mathfrak{u}(N) \to \mathrm{U}(N)$ is surjective, and hence every unitary operator is the exponential of some Lie algebra element.$^3$ (The exponential map is always surjective locally, i.e. onto a neighborhood of the identity, so we are in principle able to find exponential forms for other operators, too.)
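As a concrete check of the surjectivity claim in finite dimensions, here is a sketch that uses only the spectral theorem for unitary matrices. The symbols $V$ and $\theta_k$ are notation introduced for this illustration. Any $U \in \mathrm{U}(N)$ can be diagonalized by a unitary $V$, with eigenvalues of modulus one:

$$U = V \,\mathrm{diag}\!\left(e^{i\theta_1},\dots,e^{i\theta_N}\right) V^\dagger, \qquad \theta_k \in \mathbb{R},$$

so that

$$T := V \,\mathrm{diag}\!\left(i\theta_1,\dots,i\theta_N\right) V^\dagger \quad\text{satisfies}\quad T^\dagger = -T, \qquad e^{T} = U .$$

Shifting any $\theta_k$ by $2\pi$ gives a different $T$ with the same exponential, which is the failure of injectivity pictured in the second footnote.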
So, the above (and the footnotes) answers your first three questions: We can always represent a unitary operator like that since $\mathrm{U}(N)$ is compact and connected, the exponential of an operator means "walking in the direction specified by that operator", and while $\mathcal{U}$ lies in the Lie group, $\mathcal{T}$ lies, as its generator, in the Lie algebra. One also says that $\mathcal{T}$ is the infinitesimal generator of $\mathcal{U}$, since, in $\mathrm{e}^{\alpha \mathcal{T}}$, we can see it as giving only the direction of the operation, while $\alpha$ tells us how far from the identity the generated exponential will lie.
The physical meaning is difficult to state in general. Often, $\mathcal{T}$ is the generator of a symmetry, and the unitary operator $\mathcal{U}$ is the finite version of that symmetry: for example, the Hamiltonian $H$ generates the time translation $U$, the angular momenta $L_i$ generate the rotations in $\mathrm{SO}(3)$, and so on (in the physicist's convention the generators are Hermitian, and the anti-Hermitian Lie algebra element is $i$ times the generator, up to sign and units). The generator is always the infinitesimal version of the exponentiated operator, in the sense that
$$ \mathrm{e}^{\epsilon T} = 1 + \epsilon T + \mathcal{O}(\epsilon^2)$$
so the generated operator will, for small $\epsilon$, be displaced from the identity by almost exactly $\epsilon T$.
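A minimal numerical sketch of the last two points, assuming the simplest non-trivial example: the real antisymmetric (hence anti-Hermitian) $2\times 2$ matrix `T` below, which generates rotations of the plane. The helper names `mat_mul`, `mat_exp`, `rotation_from_generator` and `first_order_error` are illustrative choices, and the exponential is approximated by a truncated power series, which is fine for a bounded operator like this one.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(t, terms=40):
    """Approximate exp(t) by the truncated power series sum_k t^k / k!."""
    n = len(t)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term t^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, t)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Assumed example generator: rotations of the plane (T^dagger = -T).
T = [[0.0, -1.0],
     [1.0, 0.0]]

def rotation_from_generator(alpha):
    """exp(alpha * T): walk a 'distance' alpha in the direction T."""
    return mat_exp([[alpha * x for x in row] for row in T])

def first_order_error(eps):
    """Largest entry of exp(eps*T) - (1 + eps*T)."""
    u = rotation_from_generator(eps)
    approx = [[1.0, -eps], [eps, 1.0]]
    return max(abs(u[i][j] - approx[i][j]) for i in range(2) for j in range(2))
```

Here `rotation_from_generator(alpha)` should agree with the rotation matrix by the angle `alpha`, and `first_order_error(eps)` should shrink roughly a hundredfold when `eps` shrinks tenfold, consistent with the $\mathcal{O}(\epsilon^2)$ remainder.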
$^1$ Think of the circle (which is $\mathrm{U}(1)$): At every point on the circle, you can draw the tangent to it, which is $\mathbb{R}$, a 1D vector space. The length of the tangent vector specifies "how fast" the path in that direction will be traversed.
$^2$ Think of the two-dimensional sphere (which is, sadly, not a Lie group, but illustrative for the exponential map). Take the tangent space at one point and imagine you are actually holding a sheet of paper next to a sphere. Now "crumple" the paper around the sphere. You will end up covering the whole sphere, and if the paper is large enough (it would have to be infinite to represent the tangent space), you can even wind it around the sphere multiple times, thus showing that the exponential map cannot be injective, but is easily seen to be surjective. A more precise notion of this crumpling would be to fix some measure of length on the sphere and map every vector in the tangent space to a point on the sphere by walking in the direction indicated by the vector exactly as far as its length tells you.
$^3$ This is quite easy to understand: if there were some part of the group wholly disconnected from the part containing the identity, or if our group had infinite volume (if it was non-compact), we could not hope to cover it wholly with only one sheet of paper, no matter how large.
Best Answer
Even if there is already a good accepted answer I would like to say something further to completely fix some details.
No, it does not work, essentially because the wrong notion of convergence is being used.
However, it is possible to prove that, if $A$ — with dense domain $D(A)$ — is closed and normal (*) — which includes the selfadjoint and the unitary case — then there is a dense subspace $D_A\subset D(A)$ of vectors, called analytic vectors where the formula is still valid with the crucial changes that
(a) the operators have to be applied to these vectors, and
(b) the topology of the Hilbert space has to be used (the series is now of vectors rather than operators), $$e^{tA}\psi = \sum_{n=0}^{+\infty} \frac{t^n}{n!}A^n \psi\:, \quad \forall \psi \in D_A .$$
(The parameter $t\in \mathbb{C}$ can be taken in a sufficiently small neighborhood of $0$, independent of $\psi\in D_A$.)
I stress that the series is not the definition of the exponential: the identity above is an identity between two independently defined mathematical objects.
However that series can be used to equivalently define the exponential on the said domain and this definition coincides with the definition for unbounded operators below.
If $A: D(A) \to H$, densely defined, is closed and normal, then it admits a spectral measure $P: B(\mathbb{C}) \to B(H)$, where $B(\mathbb{C})\ni E$ is the Borel $\sigma$-algebra on $\mathbb{C}$ and each $P(E)$ is an orthogonal projector in $H$.
We can eventually define (on a suitably dense domain defined below) $$f(A) := \int_{\mathbb{C}} f(\lambda) dP(\lambda)\tag{1}$$ for every Borel-measurable function $f: \mathbb{C}\to \mathbb{C}$.
The exponential of the said $A$ is defined in this way, simply taking $f$ to be the exponential function.
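In finite dimensions the integral in (1) reduces to a finite sum $f(A) = \sum_\lambda f(\lambda) P_\lambda$ over the eigenvalues. The following sketch illustrates this with an assumed toy operator, the Pauli matrix $\sigma_x$, whose eigenvalues $\pm 1$ and projectors $(1 \pm \sigma_x)/2$ are written out by hand. The names `SPECTRAL_DATA`, `apply_function`, `exp_i_t_sigma_x` and `closed_form` are hypothetical helpers for this illustration only.

```python
import cmath

# Assumed toy operator: the Pauli matrix sigma_x, self-adjoint with
# eigenvalues +1 and -1 and spectral projectors P_(+/-) = (1 +/- sigma_x) / 2.
SIGMA_X = [[0, 1], [1, 0]]
IDENTITY = [[1, 0], [0, 1]]
SPECTRAL_DATA = [
    (+1.0, [[0.5, 0.5], [0.5, 0.5]]),    # eigenvalue, projector P_+
    (-1.0, [[0.5, -0.5], [-0.5, 0.5]]),  # eigenvalue, projector P_-
]

def apply_function(f):
    """Finite-dimensional version of (1): f(A) = sum of f(lambda) * P_lambda."""
    out = [[0j, 0j], [0j, 0j]]
    for eigenvalue, projector in SPECTRAL_DATA:
        for i in range(2):
            for j in range(2):
                out[i][j] += f(eigenvalue) * projector[i][j]
    return out

def exp_i_t_sigma_x(t):
    """exp(i t sigma_x) obtained from the spectral decomposition."""
    return apply_function(lambda lam: cmath.exp(1j * t * lam))

def closed_form(t):
    """Closed form cos(t) * 1 + i sin(t) * sigma_x, used as a cross-check."""
    return [[cmath.cos(t) * IDENTITY[i][j] + 1j * cmath.sin(t) * SIGMA_X[i][j]
             for j in range(2)] for i in range(2)]
```

Choosing `f` as the identity function should return $\sigma_x$ itself, and `exp_i_t_sigma_x(t)` should match `closed_form(t)` up to rounding.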
If $A$ is selfadjoint, $B(\mathbb{C})$ can be replaced by $B(\mathbb{R})$ since outside $\mathbb{R}$ the spectral measure vanishes.
Actually the support of the spectral measure of $A$ (densely-defined, closed, and normal) always coincides with the spectrum $\sigma(A)$ of $A$.
We have two cases, and the two definitions coincide wherever both are applicable.
(a) If $A$ is everywhere defined and bounded, the exponential is automatically well-defined by its series expansion — with respect to the operator norm — and this expansion can be used as the very definition.
(b) If $A$ is not everywhere defined / bounded, the previous definition (1) based on the Borel functional calculus applies when $A$ is densely defined, normal, and closed, in particular selfadjoint.
This latter definition, (b), coincides with the former, (a), when $A$ is everywhere defined, bounded and normal, for instance if $A$ is unitary.
As declared at the beginning, the series expansion is however valid for densely-defined, closed, normal operators working on analytic vectors and using the norm of $H$ (technically the strong operator topology).
As far as I know these (densely-defined, closed, normal) are the minimal requirements producing a consistent theory for unbounded operators.
The domain of $f(A)$ as in (1) is
$$D(f(A)) = \left\{\psi \in H \:\left|\: \int_{\mathbb{C}} |f(x)|^2 d \mu^A_\psi(x)< +\infty \right.\right\}\tag{2}$$ where $$\mu^A_\psi(E) := \langle \psi |P(E) \psi\rangle$$ is a standard positive finite Borel measure.
If $A$ is selfadjoint, $\mu^A_\psi$ is supported in $\mathbb{R}$, actually on $\sigma(A)$. There, $f(x) = \exp x$ is not bounded (unless $\sigma(A)$ is bounded, which means that $A$ is bounded), so that, in general, $D(f(A)) \subsetneq H$.
However, if you instead consider $f(x)= \exp ix$ and $A$ is selfadjoint, then $|f|$ is bounded by $1$ on $\mathbb{R}$. Since $\mu^A_\psi(\mathbb{R}) = ||\psi||^2 < +\infty$, it turns out from (2) that $$D(f(A)) = H\:.$$
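To see condition (2) at work, here is a small sketch with a hypothetical unbounded selfadjoint operator, $A e_n = n e_n$ on an orthonormal basis $e_1, e_2, \dots$, and the vector $\psi = \sum_n e_n / n$. Both the operator and the vector are assumptions made for illustration. For this $\psi$ the measure $\mu^A_\psi$ puts weight $1/n^2$ at the point $n$, so the integral in (2) becomes a series whose partial sums can be inspected directly.

```python
import math

# Hypothetical operator: A e_n = n e_n on an orthonormal basis e_1, e_2, ...
# For psi = sum_n c_n e_n, the measure mu_psi puts weight |c_n|^2 at the point n.
def coefficient(n):
    """Coefficients c_n = 1/n, so psi has finite norm (sum of 1/n^2 converges)."""
    return 1.0 / n

def domain_integral(f, cutoff):
    """Partial sum of (2): sum over n <= cutoff of |f(n)|^2 |c_n|^2."""
    return sum(abs(f(n)) ** 2 * coefficient(n) ** 2
               for n in range(1, cutoff + 1))

def real_exponential(x):
    """f(x) = exp(x), unbounded on the spectrum {1, 2, 3, ...}."""
    return math.exp(x)

def imaginary_exponential(x):
    """f(x) = exp(i x), of modulus 1 everywhere on the real line."""
    return complex(math.cos(x), math.sin(x))

# The first column of sums blows up with the cutoff; the second stays bounded.
for cutoff in (10, 20, 40):
    print(cutoff,
          domain_integral(real_exponential, cutoff),
          domain_integral(imaginary_exponential, cutoff))
```

The partial sums for $\exp x$ grow without bound, so this $\psi$ is not in $D(e^{A})$, whereas those for $\exp ix$ never exceed $\|\psi\|^2 = \pi^2/6$, in line with $D(e^{iA}) = H$.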
If $E \subset \mathbb C$ is a Borel set and $\chi_E(x)=1$ for $x\in E$ and $\chi_E(x)=0$ otherwise, then $$P_E := \int_{\mathbb C} \chi_E(x) dP^{(A)}(x)$$ is an orthogonal projector onto a closed subspace $H_E$.
A family of analytic vectors $\psi$ thus satisfying $$e^{tA}\psi = \sum_{n=0}^{+\infty} \frac{t^n}{n!}A^n \psi$$ whose (finite) span is dense is obtained as follows. Take a class of Borel sets $E_N\subset \mathbb C$, where $N\in \mathbb N$, requiring that every $E_N$ is bounded and $\cup_N E_N = \mathbb C$. The said family of analytic vectors consists of all vectors $\psi \in H_{E_N}$ for every $N \in \mathbb N$.
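The construction can be tried out on a hypothetical diagonal operator $A e_n = n e_n$ (an assumption for illustration, as are the function names below). A vector with finitely many non-zero components lies in some $H_{E_N}$ with $E_N$ bounded, and on such a vector the power series can be summed component by component and compared with the spectral definition, which multiplies the $n$-th component by $e^{tn}$.

```python
import math

def series_partial_sum(t, eigenvalue, terms):
    """sum over k < terms of (t * eigenvalue)^k / k!, the series on one eigenvector."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= t * eigenvalue / (k + 1)
    return total

def exp_series_on_vector(t, coeffs, terms=200):
    """Truncated series applied to psi = sum_n coeffs[n-1] e_n, with A e_n = n e_n."""
    return [c * series_partial_sum(t, n, terms)
            for n, c in enumerate(coeffs, start=1)]

def exp_spectral_on_vector(t, coeffs):
    """Spectral definition of exp(tA) psi: scale the n-th component by exp(t n)."""
    return [c * math.exp(t * n) for n, c in enumerate(coeffs, start=1)]

# psi has spectral support in the bounded set {1, ..., 5}.
psi = [1.0, -0.5, 0.25, 2.0, 0.1]
print(exp_series_on_vector(0.8, psi))
print(exp_spectral_on_vector(0.8, psi))
```

The number of terms needed for a given accuracy grows with the size of the spectral support, which is why the series makes sense vector by vector but not in operator norm when $A$ is unbounded.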
As a final remark, I stress that almost all operators with some relevance in QM are both densely defined and closed.
(*) $A: D(A) \to H$ is closed if the set of pairs $(\psi, A\psi)$ with $\psi \in D(A)$ is a closed set in $H \times H$.
$A: D(A) \to H$ densely defined and closed is normal if $A^\dagger A= A A^\dagger$ on the natural domains of both sides which are required to coincide.