Let $\mathfrak{g}$ be a Lie algebra, and let $G$ be the (unique up to isomorphism) simply-connected group with Lie algebra $\mathfrak{g}$. Here, $\mathfrak{g}$ and $G$ should be thought of as "abstract", with the exponential map $\exp : \mathfrak{g} \rightarrow G$ defined in terms of the one-parameter subgroups of $G$. The BCH formula holds for this "abstract" exponential map, although the completely general proof is a bit lengthy: it involves deriving a formula for the derivative of the exponential map, then proving $\exp \big( [X,\cdot] \big) Y = \exp (X) \, Y \exp (-X)$, and finally deducing the BCH formula. A concise presentation can be found, e.g., in chapter 3 of Brian Hall's "Lie Groups, Lie Algebras, and Representations". This reference aims at being fairly accessible, at the price of restricting itself to those Lie groups $G$ that can be constructed as matrix groups: this covers most groups used in practice (in particular, the Heisenberg group generated by the position/momentum operators of quantum mechanics, together with the identity operator, can be realized as a group of $3 \times 3$ real matrices).
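For the Heisenberg example, the BCH series actually terminates (all brackets of depth $\geq 2$ vanish), so one can check the formula concretely with strictly upper-triangular $3 \times 3$ matrices. Here is a minimal numerical sketch; the particular generators chosen are just one convenient illustration, not canonical:

```python
import numpy as np

# Two generators of the Heisenberg Lie algebra as strictly
# upper-triangular 3x3 matrices (illustrative choice).
A = np.array([[0., 1., 0.],
              [0., 0., 0.],
              [0., 0., 0.]])
B = np.array([[0., 0., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])

def expm_nilpotent(N):
    # exp(N) for N with N^3 = 0: the exponential series terminates.
    return np.eye(3) + N + N @ N / 2.0

def logm_unipotent(U):
    # log(I + M) for M with M^3 = 0: the log series terminates.
    M = U - np.eye(3)
    return M - M @ M / 2.0

comm = A @ B - B @ A                  # [A, B] (central, so BCH truncates)
lhs = logm_unipotent(expm_nilpotent(A) @ expm_nilpotent(B))
rhs = A + B + comm / 2.0              # BCH: log(e^A e^B) = A + B + [A,B]/2
print(np.allclose(lhs, rhs))          # True
```

Since $[A,[A,B]] = [B,[A,B]] = 0$ here, the degree-$2$ truncation of BCH is exact.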
Next, let $X \mapsto i\hat{X}$ be a representation of this Lie algebra by unbounded symmetric operators on a complex Hilbert space $\mathcal{H}$ (note that we do not even need to assume that $\hat{X}$ be essentially self-adjoint, just symmetric).
Since we are dealing with unbounded operators, we have to be a bit careful about what we mean by a representation, because a priori $\hat{X}\hat{Y}$, $\hat{Y}\hat{X}$, and therefore $[\hat{X}, \hat{Y}]$, may fail to be well-defined. Specifically, we demand that there exist a common invariant dense domain $\mathcal{D} \subseteq \mathcal{H}$ such that, for all $X \in \mathfrak{g}$, $\hat{X}$ is defined on $\mathcal{D}$ and stabilizes $\mathcal{D}$ (i.e. $\hat{X} (\mathcal{D}) \subseteq \mathcal{D}$). Then, we can define arbitrary products of the $\hat{X}$'s on $\mathcal{D}$, and in particular commutators.
The question is then whether this representation can be exponentiated, i.e. whether:
for all $X \in \mathfrak{g}$, $\hat{X}$ is actually essentially self-adjoint;
and there exists a unitary representation $U \mapsto \hat{U}$ of the group $G$ on $\mathcal{H}$ such that, for any $X \in \mathfrak{g}$, $\widehat{\exp X} = \widehat{\exp} (i\hat{X})$. Here, I denote by $\widehat{\exp}$ the "operator" exponential map, defined via the spectral resolution of the (closure of the) essentially self-adjoint operator $\hat{X}$: as you observed, in the case of unbounded operators, the exponential cannot be defined in terms of the exponential series (the notation $\widehat{\exp}$ is non-standard; I use it here only to prevent confusion between the various notions of exponential maps).
If this holds, the BCH formula satisfied by the "abstract" exponential map $\exp$ will be inherited by the "operator" exponential map $\widehat{\exp}$.
So, with all these preliminaries in place: when can a Lie algebra representation by symmetric unbounded operators be exponentiated? A sufficient condition (useful in practice, albeit not a necessary condition) is the Nelson criterion (lemma 9.1 of Edward Nelson, "Analytic Vectors", Annals of Mathematics, Second Series, Vol. 70, No. 3 (Nov., 1959), pp. 572-615): it demands that there exist a basis $X_1,\dots,X_n$ of $\mathfrak{g}$, a dense subdomain $\mathcal{D}_o \subseteq \mathcal{D}$, and a real $s > 0$, such that:
$$
\forall \psi \in \mathcal{D}_o,\; \sum_{m=0}^{\infty} \frac{s^m}{m!} \sum_{k_1,\dots,k_m=1}^{n} \left\| \hat{X}_{k_1} \cdots \hat{X}_{k_m} \psi \right\| < \infty.
$$
This is a fairly technical result, but the intuition behind it is that this condition is precisely what you need to define $\widehat{\exp} (i\hat{X})$ over $\mathcal{D}_o$ via the exponential series; from there, one deduces that $\hat{X}$ is indeed essentially self-adjoint (using Stone's theorem, discussed below), that this definition of $\widehat{\exp} (i\hat{X})$ coincides over $\mathcal{D}_o$ with the spectral one, and finally, that this indeed gives you a unitary representation of $G$ (matching the "abstract" BCH formula with the one that can be proven directly using the exponential series).
In the case of the position/momentum operators of quantum mechanics, one can for example take $X_1 = \text{id}$, $X_2 = q$, $X_3 = p$ and take $\mathcal{D} = \mathcal{D}_o$ to be the space of finite linear combinations of the harmonic oscillator energy eigenstates. Then, the condition can be proven using the expression of the position/momentum operators in terms of ladder operators.
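As a sanity check on the Nelson condition in this example, one can evaluate the partial sums of the series numerically in a truncated Fock basis. The truncation is exact here, because $m$ applications of $\mathrm{id}, q, p$ to the ground state never leave the first $m+1$ levels; the value $s = 0.1$ and the cutoff are arbitrary choices for this sketch:

```python
import math
import numpy as np

mmax = 8                                   # number of series terms checked
n = mmax + 2                               # truncation: exact for <= mmax ops
a = np.diag(np.sqrt(np.arange(1, n)), 1)   # lowering operator in Fock basis
q = (a + a.T) / np.sqrt(2)                 # position
p = (a - a.T) / (1j * np.sqrt(2))          # momentum
ops = [np.eye(n, dtype=complex), q.astype(complex), p]  # X_1, X_2, X_3

psi0 = np.zeros(n, dtype=complex)
psi0[0] = 1.0                              # oscillator ground state
s = 0.1                                    # arbitrary s > 0

terms, vecs = [], [psi0]                   # vecs: all X_{k_1}...X_{k_m} psi0
for m in range(mmax + 1):
    terms.append(s**m / math.factorial(m)
                 * sum(np.linalg.norm(v) for v in vecs))
    vecs = [op @ v for v in vecs for op in ops]

# the terms of the Nelson series decrease, consistent with convergence
print(all(t2 < t1 for t1, t2 in zip(terms, terms[1:])))  # True
```

The rapid decay reflects the estimate $\|X_{k_1} \cdots X_{k_m} \psi_0\| \lesssim c^m \sqrt{m!}$ obtained from the ladder-operator expressions, which makes the series converge for every $s$.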
However, coming back to your specific motivation, I honestly do not think that you need all this machinery. Instead, you can use the explicit expression of the Weyl operators to prove that $t \mapsto W(tz)$ is a strongly continuous one-parameter unitary group: i.e. $W(sz) W(tz) = W\big((s+t)z\big)$ and, for any $\psi \in \mathcal{H}$, $t \mapsto W(tz) \psi$ is continuous (with respect to the norm of $\mathcal{H}$, in this case the $L_2$-norm). Then, Stone's theorem (theorem VIII.8 of Reed and Simon, "Methods of Modern Mathematical Physics", volume 1) tells you that $W(tz) = \widehat{\exp} (it\hat{X})$ with $\hat{X}$ the self-adjoint operator defined by:
$$i \hat{X} \psi = \left. \frac{d}{dt} W(tz) \psi \right|_{t=0}$$
for any $\psi$ in the dense domain $\mathcal{D}$ where this derivative exists. Using again the explicit expression of $W(tz)$ you can then check that $\hat{X} = \sqrt{2} (y\hat{q} - x\hat{p})$.
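This can be illustrated numerically in a truncated Fock basis, where everything is bounded, so the sketch is only a caricature of the unbounded situation. It assumes the convention $W(z) = e^{z a^\dagger - \bar z a}$ (which gives exactly the generator $\hat{X} = \sqrt{2}(y\hat{q} - x\hat{p})$ for $z = x + iy$); the values of $x, y, s, t$ below are arbitrary:

```python
import numpy as np
from scipy.linalg import expm

n = 12                                    # truncated Fock-space dimension
a = np.diag(np.sqrt(np.arange(1, n)), 1)  # lowering operator
q = (a + a.T) / np.sqrt(2)                # position
p = (a - a.T) / (1j * np.sqrt(2))         # momentum
x, y = 0.3, -0.7                          # z = x + iy, arbitrary
X = np.sqrt(2) * (y * q - x * p)          # candidate generator

def W(t):                                 # one-parameter unitary group
    return expm(1j * t * X)

# group law: W(s) W(t) = W(s + t)
s, t = 0.4, 1.1
assert np.allclose(W(s) @ W(t), W(s + t))

# Stone's theorem: d/dt W(t) psi |_{t=0} = i X psi (central difference)
psi = np.zeros(n, dtype=complex)
psi[0] = 1.0                              # ground state
h = 1e-5
deriv = (W(h) @ psi - W(-h) @ psi) / (2 * h)
print(np.allclose(deriv, 1j * X @ psi, atol=1e-8))  # True
```

The finite-difference derivative of $t \mapsto W(t)\psi$ at $t = 0$ recovers $i\hat{X}\psi$, exactly as Stone's theorem prescribes.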
This answer only tries to give a pictorial view of the situation, in the hope that things already become clear from the picture. The reference cited at the end, Free Lie Algebras, is the better answer: it is a structural answer, and the structure is beautiful, so just switch to the reference and enjoy!
First, in this pictorial view, a Lie polynomial in the alphabet with letters $A,B,C,D,\dots$ is a homogeneous polynomial in the non-commutative algebra (over $\Bbb Q$) generated by the monoid generated by these letters, one that can be obtained in the following way.
First fix some letters from the alphabet (with possible repetitions) and some order, and put them in a row. For instance:
A B A C A D B A
Now decide to build a "special" tree with these nodes as leaves, going "down": decide which two neighboring letters should be Lie-condensed first, then use the result as a "new letter", and go on recursively. One picture may be:
```
A   B   A   C   A   D   B   A
 \    \ /      \    \ /   /   /
  \    *        \    *   /   /
   \  /          \  /   /   /
    *             *    /   /
     \             \  /   /
      \             *    /
       \             \  /
        \             *
         \           /
          \         /
           \       /
            \     /
             \   /
              \ /
               *
          FINAL RESULT
```

That is, this tree computes $\big[\,[A,[B,A]]\,,\,[[[C,[A,D]],B],A]\,\big]$.
Each `*` means: take the two joined nodes and apply $[\,\cdot\,,\,\cdot\,]$ to them. I hope it is clear.
Now observe that
$$
\begin{aligned}
Z
&= F(A,B)
\\
&=\log(e^Ae^B)
\\
&=\log\left(\
\left(1+\frac 1{1!}A+\frac 1{2!}A^2+\dots\right)
\left(1+\frac 1{1!}B+\frac 1{2!}B^2+\dots\right)
\ \right)
\\
&=\log\left(\
1+\sum_{(j,k)\ne (0,0)}
\frac 1{j!k!}A^jB^k
\ \right)
\\
&=
0+\underbrace{(A+B)}_{F_1(A,B)}+\dots
\end{aligned}
$$
has its $F_1$-part equal to $A+B$, a Lie polynomial; the higher homogeneous pieces are what remains to be attacked.
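The next homogeneous piece can be extracted mechanically by truncating the series. Here is a small sympy sketch (the truncation helpers are ad hoc, written just for this illustration) that recovers $F_2(A,B) = \tfrac12 [A,B]$:

```python
import sympy as sp

A, B = sp.symbols('A B', commutative=False)
N = 2                                   # truncation degree

def deg(term):
    # total degree of a noncommutative monomial in A, B
    d = 0
    for f in sp.Mul.make_args(term):
        if f in (A, B):
            d += 1
        elif f.is_Pow and f.base in (A, B):
            d += int(f.exp)
    return d

def truncate(expr):
    # drop all monomials of degree > N
    expr = sp.expand(expr)
    return sp.Add(*[t for t in sp.Add.make_args(expr) if deg(t) <= N])

expA = sum(A**j / sp.factorial(j) for j in range(N + 1))
expB = sum(B**j / sp.factorial(j) for j in range(N + 1))
M = truncate(sp.expand(expA * expB)) - 1
# log(1 + M) = M - M^2/2 + ..., truncated at degree N
logS = sum(sp.Rational((-1)**(k + 1), k) * truncate(M**k)
           for k in range(1, N + 1))
F2 = sp.expand(truncate(logS) - (A + B))      # strip the F_1-part
print(sp.expand(F2 - (A*B - B*A) / 2) == 0)   # True: F_2 = [A,B]/2
```

So the degree-2 piece is indeed the Lie polynomial $\tfrac12 (AB - BA) = \tfrac12 [A,B]$.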
Back to the question. Why is $F_i\left(A,\sum_j F_j(B,C)\right)$ inductively a Lie polynomial (for $i,j>1$)? Use new letters $D_j$ instead of $F_j(B,C)$ if this makes things simpler, and let us draw the picture of $F_i(A, \sum_j D_j)$. It is a linear combination of many terms built by tree-collapsing rules as above. Now push each sum $\sum_j D_j$ down its branch of the tree from $F_i$, until it hits a `*`, i.e. until it is involved in building a Lie bracket. The bracket is linear, so we split the sum $\sum_j D_j$ into pieces and work with an individual $D_j$.
If this $D_j$ is itself (inductively) given by such Lie-bracket tree-collapsing rules, then we are fine: formally, we "move the rule to the top".
We only have problems with $F_1$, which is not really in the range of the Lie bracket tree collapsing rules. I cannot say more.
(I could not figure out which / where the problem with the "exceptional polynomials" is, since, working only with the homogeneous part of degree $(n+1)$, for instance for $i=1$, $j=n$, and conversely, we have
$$
\begin{aligned}
F_1(F_n(A,B),C)
&=F_n(A,B)+C\ ,
\\
F_n(F_1(A,B),C)
&=F_n(A+B,C)\ ,
\end{aligned}
$$
and of course, now we have to start the proof.)
Let me now say a few words about the hidden structure; it is a wonderful structure, enjoy it!
In the book Free Lie Algebras by Christophe Reutenauer (referenced fully on the Free Lie Algebras wiki page), the author quickly introduces a structure of a Hopf algebra on $\Bbb Q\langle\langle A, B,\dots\rangle\rangle$, the free algebra on the monoid generated by the letters $A,B,\dots$. One multiplication is the usual one; another one is given by the shuffle product, for instance $A \sqcup\!\!\!\sqcup B = AB + BA$. There are two corresponding comultiplications, and using these constructions one can state structural properties; in particular, the Lie polynomials turn out to be exactly the primitive elements. Among these properties, the ones relevant for the present question are the following.
Theorem 1.4 in the book (not in the linked pdf) characterizes Lie polynomials in the following way; for a polynomial $P$, the following are equivalent:
- $P$ is a Lie polynomial,
- Define $ad(P)$ by $ad(P)(Q)=[P,Q]=PQ-QP$. Now take this definition only on the generators $A,B,\dots$ of the alphabet, so $Ad(A):=ad(A)=[A,-]$, and extend it to an algebra morphism, so for instance $Ad(AB)=Ad(A)\,Ad(B)=ad(A)\,ad(B)$. The equivalent condition on $P$ is then $ad(P)=Ad(P)$.
- $P$ is primitive, a structural property in a Hopf algebra.
- $P$ has no constant term, and the derivative of $P$ coincides with the "right bracketing" of $P$.
Theorem 3.1 in the book is a version of the above for Lie series.
Lemma 1.7 in the book: let $\alpha$ be the antipode, mapping a word $w$ to $\pm$ the reversed word, the sign being given by the parity of the length. Then for a Lie polynomial $P$ we have $\alpha(P)=-P$.
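Lemma 1.7 is easy to test by hand on small examples. The following sympy sketch implements the antipode on words (the helper is ad hoc, for illustration only) and checks it on the Lie polynomial $[A,[A,B]]$:

```python
import sympy as sp

A, B = sp.symbols('A B', commutative=False)

def antipode(expr):
    # alpha(w) = (-1)^{|w|} * (reversed word), extended linearly
    out = 0
    for term in sp.Add.make_args(sp.expand(expr)):
        coeff, word, n = sp.S.One, [], 0
        for f in sp.Mul.make_args(term):
            if f in (A, B):
                word.append(f); n += 1
            elif f.is_Pow and f.base in (A, B):
                word += [f.base] * int(f.exp); n += int(f.exp)
            else:
                coeff *= f                 # numeric coefficient
        out += coeff * (-1)**n * sp.Mul(*reversed(word))
    return sp.expand(out)

P = sp.expand(A*(A*B - B*A) - (A*B - B*A)*A)   # [A,[A,B]]
print(sp.expand(antipode(P) + P) == 0)          # True: alpha(P) = -P
```

Here $P = A^2 B - 2ABA + BA^2$; reversing each word and flipping the sign (words of odd length $3$) gives $-P$, as the lemma predicts.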
Theorem 3.2 in the book: let $S=1+\dots$ be a series (higher terms omitted); then the following are equivalent:
- $\log(S)$ is a Lie series,
- $S$ is group-like, i.e. $\delta(S)=S\otimes S$,
- the map $w\to (S,w)$ is a homomorphism from the shuffle algebra to $\Bbb Q$,
- $Ad(S)(T)=STS^{-1}$.
Corollary 3.3: the series $S=1+\dots$ such that $\log S$ is a Lie series form a group under multiplication; this is because of the group-like property.
Corollary 3.4: $\log(e^Ae^B)$ is a Lie series. This follows from the stability under multiplication above, noting that $\log e^A=A$ and $\log e^B=B$ are Lie series.
Best Answer
Start with $f(\lambda):=e^{\lambda A}Be^{-\lambda A}$ but take derivatives at $\lambda=0$: $$\begin{align} f(0) &= B \\ f'(0) &= \left( e^{\lambda A}ABe^{-\lambda A} + e^{\lambda A}B(-A)e^{-\lambda A} \right)_{\lambda=0} = \left. e^{\lambda A}[A,B]e^{-\lambda A}\right|_{\lambda=0} = [A,B] \\ f''(0) &= \left( e^{\lambda A}A[A,B]e^{-\lambda A} + e^{\lambda A}[A,B](-A)e^{-\lambda A} \right)_{\lambda=0} = \left. e^{\lambda A}[A,[A,B]]e^{-\lambda A}\right|_{\lambda=0} = [A,[A,B]] \\ \vdots\\ f^{(k)}(0) &= \underbrace{[A,[A,\cdots,[A,B]\cdots]]}_{[A,\cdot]\text{ applied $k$ times}} = [A,\cdot]^k B \end{align}$$ The last identity can be proved by induction.
This gives $$ f(\lambda) = \sum_{k=0}^{\infty} \frac{1}{k!}\lambda^k f^{(k)}(0) = \sum_{k=0}^{\infty} \frac{1}{k!}\lambda^k [A,\cdot]^k B = e^{\lambda[A,\cdot]} B . $$
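The identity $e^{\lambda A} B e^{-\lambda A} = e^{\lambda [A,\cdot]} B$ is easy to check numerically for matrices; here is a short sketch with random $4 \times 4$ matrices, summing the ad-exponential until the terms become negligible:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
lam = 0.7

ad = lambda X, Y: X @ Y - Y @ X              # [X, Y]
lhs = expm(lam * A) @ B @ expm(-lam * A)     # conjugation side

# e^{lam [A,.]} B: sum lam^k [A,.]^k B / k! term by term
rhs, term = np.zeros_like(B), B.copy()
for k in range(1, 60):
    rhs += term
    term = lam * ad(A, term) / k             # next series term
print(np.allclose(lhs, rhs))                 # True
```

The series converges quickly here because the terms decay like $(2\lambda\|A\|)^k/k!$.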