Firstly, does the notation ${d_k}^{(n)}$ refer to the $n^{th}$ derivative of $d_k$, or the $n^{th}$ power of $d_k$? (I think it refers to the latter; however, the use of notation seems ambiguous to me.)
It refers to neither. The author just wants to put an $n$ in somewhere to indicate that for each $n$ one has a different series with coefficients $d_k$; since the subscript position was already taken, and an unadorned superscript would have meant exponentiation, he went for a parenthesised superscript, forgetting that that may sometimes mean a repeated derivative, so it was not really available either. In any case it would be clearer to write $d^{(n)}{}_k$ to indicate that it is coefficient $k$ of $d^{(n)}$. Personally I might have written $(d_n)_k$ to indicate that this is coefficient $k$ of the series $d_n$ that serves as an auxiliary to define (the coefficient) $c_n$.
Secondly, how do we know that the sequence $({d_k}^{(n)})_{k\geq0}$ exists?
Because the right hand side is a finite sum of power series in $X$, and we are just taking the sequence of its coefficients.
Finally, why must we have $a_0=0$?
Because otherwise this whole set-up is pointless (though strictly speaking well defined). Note that the right hand side $\sum_{k=0}^n b_kF^k$ (writing $F(X)$ for the series $F$ is only confusing) is just the "infinite sum" of series $\sum_{k\geq0} b_k F^k$ truncated to its initial $n+1$ terms. In algebra infinite sums are suspect (not defined in general), but taking just $n+1$ terms (each of which is a power series, and we know how to add two of those) is certainly a valid operation. But you will notice that only the coefficient of $X^n$ in the resulting series is being used (to define $c_n$); the rest of this series is just thrown out of the window, which is kind of a weird thing to do.

But look what happens when $a_0=0$ ($a_0$ is the constant term of $F$). Then each power $F^k$ is divisible by $X^k$, so it is a power series that starts with $k$ zero terms, the first (possibly) nonzero coefficient being that of $X^k$. This means that the change from the series $d_n$ to $d_{n+1}$ will not change anything up to and including the coefficient of $X^n$ that we used to define $c_n$; that coefficient will also be the coefficient of $X^n$ in the series $d_{n+1}$, and in all series formed after that. What is going on is really developing the infinite sum $\sum_{k\geq0} b_k F^k$, while "harvesting" its coefficients at the point where they become ripe, that is, when they are out of reach of any of the not-yet-contributed terms.
If one were to do this with $a_0\neq0$, then there would be no point where the coefficients stabilise to a final value, and therefore no reasonable meaning for the infinite sum $\sum_{k\geq0} b_k F^k$. One may of course still pick off the coefficient of $X^n$ after $n+1$ terms have been contributed, but there would be no reason that this is meaningful, and it would certainly not give a candidate for the value of $\sum_{k\geq0} b_k F^k$ (such a value is in fact not defined in this case).
The whole story could be told more easily if one considers infinite sums of power series in a more general setting. No meaning can be given to infinite sums in general, but as long as for each $k\geq0$ the coefficient of $X^k$ in the terms of the sum eventually becomes $0$, one can define the coefficient of $X^k$ in the sum to be that coefficient once all terms that could contribute to it have been included, and that gives a good definition of the infinite sum for this case. Such sums may be called convergent, and (unlike in analysis) there is a very simple condition for convergence of infinite sums: they converge if and only if their terms tend to $0$. Which means exactly that each coefficient eventually becomes $0$, as indicated above.
Once this is established, one can simply define for any series $G$, and for any $F$ without constant term:
$$
G\circ F = \sum_{k\geq0}b_kF^k
\qquad\text{where $G=\sum_ib_iX^i$.}
$$
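This definition is concrete enough to compute with. Here is a small Python sketch (entirely my own illustration; the choices $G=\sum_kX^k$, i.e. $b_k=1$, and $F=X+X^2$ are arbitrary) that sums the series term by term and checks that the coefficient of $X^n$ never changes after the term $b_nF^n$ has been added:

```python
# Truncated formal power series as coefficient lists of length N+1.
N = 8  # truncation degree (enough to illustrate)

def mul(p, q):
    """Product of two truncated series."""
    r = [0] * (N + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= N:
                r[i + j] += pi * qj
    return r

F = [0, 1, 1] + [0] * (N - 2)  # F = X + X^2, constant term 0
b = [1] * (N + 1)              # G = 1 + X + X^2 + ...

partial = [0] * (N + 1)  # running sum of the series b_k F^k
Fk = [1] + [0] * N       # F^0
history = []
for k in range(N + 1):
    partial = [c + b[k] * t for c, t in zip(partial, Fk)]
    history.append(list(partial))
    Fk = mul(Fk, F)

# F^k is divisible by X^k, so the coefficient of X^n is "ripe"
# after step n: later terms never touch it.
for n in range(N + 1):
    assert all(history[k][n] == history[n][n] for k in range(n, N + 1))
print(history[N])  # → [1, 1, 2, 3, 5, 8, 13, 21, 34]
```

(The result here is the Fibonacci generating function, since $\sum_{k\geq0}(X+X^2)^k = 1/(1-X-X^2)$.)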
To clarify our ideas, let’s see what the Laurent series that are convergent on $\{z:|z|=1\}$ look like — the skin, so to speak, of the closed unit disk. These are the series $\sum_{-\infty<n<\infty}c_nx^n$ for which $\lim_{|n|\to\infty}|c_n|=0$. That is, we need $|c_n|\to0$ for both positive and negative $n$.
A worthwhile example is $\sum_{n\ge0}p^n(x^{-n^2}+x^{n^2})$. Draw the Newton picture and you see what’s going on: the points you draw are all $(\pm n^2,n)$. This is a series convergent only on the skin.
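In the $p$-adic setting this is quickest to see with valuations. A tiny Python sketch (my own; I write $v=v_p(x)$, so $|x|=1$ means $v=0$, and a crude "eventually large" test stands in for the limit):

```python
# The term p^n * x^{±n^2} has p-adic valuation n ± n^2*v, where v = v_p(x).
# Convergence of the series means both valuation sequences tend to +infinity.

def converges_on(v, nmax=50, bound=25):
    inner = [n - n * n * v for n in range(nmax + 1)]  # from x^{-n^2}
    outer = [n + n * n * v for n in range(nmax + 1)]  # from x^{+n^2}
    return all(t > bound for t in inner[-5:]) and \
           all(t > bound for t in outer[-5:])

assert converges_on(0)        # v = 0, i.e. |x| = 1: the skin; converges
assert not converges_on(1)    # v > 0, |x| < 1: the x^{-n^2} terms blow up
assert not converges_on(-1)   # v < 0, |x| > 1: the x^{+n^2} terms blow up
```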
It’s the same thing for series about which you know nothing more than that they are convergent on the skin of $D'$, namely the series $\sum_{-\infty<n<\infty}\frac{c_n}{a^n}x^n$ for which $\lim_{|n|\to\infty}|c_n|=0$; or if you like, the series $\sum_{-\infty<n<\infty}\gamma_nx^n$ for which $\lim_{|n|\to\infty}|a^n\gamma_n|=0$.
But the series we’re concerned with are power series, that is, of the form $\sum_0^\infty\gamma_nx^n$; and since the series is convergent on the outer skin of $C$, even at $z=1$, the coefficients $\gamma_n$ must have $|\gamma_n|\to0$. This is just the condition that our series converges on $D$.
EDIT:
Let’s see whether I can give a satisfactory answer to your very valid objection. It depends on making a careful distinction between a series $G(x)$ and the $p$-adic function $g$ defined by $G$.
While we’re at it, I want to change the coordinatization, sending a point $z$ to $z/a$, so that now our old set $D'$ is the closed unit disk $D$, and the original $D$ becomes what I’ll call $D^+$, namely $\{z:|z|\le|1/a|\}$. Then the original $C$ becomes $\{z:1\le|z|\le|1/a|\}$. Our intersection-set is just the units $U$ of $K$, considered as an analytic space. All this just to make the typing easier for me.
Our four rings of power series now are $S^{[0,1]}=\{\sum_0^\infty c_nx^n: c_n\to0\}$ for $D$, $S^{[0,1/|a|]}=\{\sum_0^\infty c_nx^n:c_n/a^n\to0\}$ for $D^+$, $S^{[1,1/|a|]}=\{\sum_{-\infty<n<\infty}c_nx^n:\lim_{n\to-\infty}c_n=0\text{ and }\lim_{n\to\infty}c_n/a^n=0\}$ for our new annulus $C$, and $S^{\{1\}}=\{\sum_{-\infty<n<\infty}c_nx^n:\lim_{|n|\to\infty}c_n=0\}$ for $U$.
My first task is to show that a nonzero series $G(x)\in S^{\{1\}}$, which you recall may be evaluated at any $z\in U$, to give a numerical value, must define a function which is not identically zero on $U$.
Well, without loss of generality, we may assume that all coefficients of $G$ are in $R$, and indeed some of them are in $R^\times$, the unit group of $R$. But only finitely many of them! Now, by multiplying by a monomial, we may assume that $G(x)$ reduces to the nonzero $\Gamma(x)\in(R/\mathfrak m)[x]$, for $\mathfrak m$ the maximal ideal of $R$, and even, if you like, that $\Gamma$ is monic. But even over an algebraically closed field containing $R/\mathfrak m$, $\Gamma$ has at most finitely many roots. Thus, we may find $\xi$ in either $R/\mathfrak m$ or an algebraic closure for which $\Gamma(\xi)\ne0$, and when we lift $\xi$ to $z_0$ in $R$ or a finite unramified extension, if necessary, we find that $G(z_0)\ne0$, so that the function $g$ defined on $U$ by the series $G$ is not identically zero.
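A toy instance of this reduction argument, in Python (everything here is my own choice of data: $p=5$, and a short Laurent polynomial $G=x^{-1}+2+3x$ standing in for a general $G$ prepared as above):

```python
# Toy instance of the reduction step (p = 5 and the coefficients are my
# arbitrary choices).  G = x^{-1} + 2 + 3x; multiplying by the monomial x
# gives a polynomial whose reduction mod 5 is Gamma = 1 + 2x + 3x^2.
p = 5
Gamma = [1, 2, 3]  # coefficients of 1 + 2x + 3x^2 over Z/5

def eval_mod(poly, xi, p):
    """Evaluate a polynomial (list of coefficients) at xi, mod p."""
    return sum(c * pow(xi, k, p) for k, c in enumerate(poly)) % p

# Gamma has degree 2, hence at most 2 roots, but Z/5 has 4 units:
# some unit xi is not a root, and any lift z0 of xi has G(z0) a unit.
nonroots = [xi for xi in range(1, p) if eval_mod(Gamma, xi, p) != 0]
assert len(nonroots) >= (p - 1) - 2  # at least 4 - 2 = 2 nonroots survive
```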
That was the hard part, if any such there was. (I’m sure you can see the rest of the argument.) We now consider an analytic function $f$ on $D$, given by $F(x)\in S^{[0,1]}$, and an analytic function $h$ on our annulus $C$, given by $H(x)\in S^{[1,1/|a|]}$, such that $f$ and $h$ agree on $U$. But we have set-theoretic inclusions of $S^{[0,1]}$ and $S^{[1,1/|a|]}$ into $S^{\{1\}}$, and since $f$ and $h$ agree on the set $U$, their difference (an element of $S^{\{1\}}$) is identically zero on $U$, so that $F$ and $H$ are equal, coefficient by coefficient.
I believe that the earlier argument I tried to give now applies, to yield our result.
Best Answer
This is not true in full generality. For a silly example, suppose $|a_i|$ grows extremely fast, and take $Q(X)=X^2-X$. Then $Q(1)=0$, so $P(Q(1))$ converges. However, it is easy to see that if you choose $|a_i|$ to grow fast enough then the norms of the coefficients of $P\circ Q$ will still grow so fast that $(P\circ Q)(x)$ cannot converge for any $x\neq 0$ (you just need to choose $|a_k|$ to be large enough that $\sum_{i\ge0}a_i\sum_{\substack{j_1+\dotsc+j_i=k\\j_1,\dotsc,j_i\ge1}}b_{j_1}\cdot\dotsc\cdot b_{j_i}$ is dominated by the $i=k$ term).
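The domination is easy to watch with exact integer arithmetic. A sketch (my choices: ordinary integers with $a_i=10^{i^2}$, so the usual absolute value stands in for the norm):

```python
# Exact-arithmetic illustration of the counterexample (M = 10 and
# a_i = 10^(i^2) are my choices; any sufficiently fast growth works).
# Q = X^2 - X, so Q^i has lowest-degree term (-1)^i X^i, and the
# coefficient of X^k in P∘Q is dominated by the i = k contribution a_k.
N = 8
M = 10

def mul(p, q):
    """Product of truncated series (lists of length N+1)."""
    r = [0] * (N + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= N:
                r[i + j] += pi * qj
    return r

Q = [0, -1, 1] + [0] * (N - 2)            # Q = -X + X^2
a = [M ** (i * i) for i in range(N + 1)]  # a_i = 10^(i^2)

comp = [0] * (N + 1)  # coefficients of P∘Q up to degree N
Qi = [1] + [0] * N    # Q^0
for i in range(N + 1):
    comp = [c + a[i] * t for c, t in zip(comp, Qi)]
    Qi = mul(Qi, Q)

# |[X^k] P∘Q| >= 10^(k^2)/2 for every k: the radius of convergence is 0.
for k in range(N + 1):
    assert 2 * abs(comp[k]) >= M ** (k * k)
```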
Here, then, is a correct statement. Suppose $r$ is such that the sequence $|b_j|r^j$ is bounded (in particular, this implies $Q(x)$ converges for all $|x|<r$). Let $R=\sup_j(|b_j|r^j)$ (this is what you would "formally" expect $|Q(x)|$ to be for $|x|=r$ if there were no fortuitous cancellation that made it smaller). Suppose that the sequence $|a_i|R^i$ is also bounded (in particular, this implies $P(Q(x))$ converges for all $|x|<r$). Then for any $x$ such that $|x|<r$, $(P\circ Q)(x)$ converges to $P(Q(x))$.
(I suspect this can be improved to include the case $|x|=r$ if $|b_j|r^j\to 0$ and $|a_i|R^i\to 0$, so that $Q(x)$ and $P(Q(x))$ converge for $|x|=r$, but I do not quite see how to make the argument work in that case.)
To prove this, as you have observed, it suffices to show that $$c_{ik}=a_i\sum_{\substack{j_1+\dotsc+j_i=k\\j_1,\dotsc,j_i\ge1}}b_{j_1}\cdot\dotsc\cdot b_{j_i}x^k$$ goes to $0$ as $\max(i,k)\to\infty$. Since $|b_j|\leq R/r^j$ for all $j$, we have $|c_{ik}|\leq |a_i|R^i|x|^k/r^k$. Since $|x|<r$, $|x|^k/r^k\to 0$ as $k\to\infty$. Since $|a_i|R^i$ is bounded, this means $c_{ik}\to 0$ as $k\to \infty$ uniformly in $i$. Since $c_{ik}=0$ for $i>k$, this implies $c_{ik}\to 0$ as $\max(i,k)\to\infty$, as desired.
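As a numerical sanity check of the statement (with the usual archimedean absolute value standing in for the non-archimedean one, and all data my own choices: $P=\sum_i2^{-i}X^i$, $Q=X+X^2$, $r=0.4$, hence $R=\sup_j|b_j|r^j=0.4$ and $|a_i|R^i=0.2^i$ is bounded):

```python
# Float sanity check: with P = sum_i 2^{-i} X^i and Q = X + X^2, the
# hypotheses hold for r = 0.4 (R = 0.4, |a_i| R^i = 0.2^i bounded), so
# the rearranged series (P∘Q)(x) should agree with P(Q(x)) for |x| < r.
N = 40

def mul(p, q):
    """Product of truncated series (lists of length N+1)."""
    r = [0.0] * (N + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= N:
                r[i + j] += pi * qj
    return r

Q = [0.0, 1.0, 1.0] + [0.0] * (N - 2)

comp = [0.0] * (N + 1)  # coefficients of P∘Q up to degree N
Qi = [1.0] + [0.0] * N
for i in range(N + 1):
    comp = [c + t / 2 ** i for c, t in zip(comp, Qi)]
    Qi = mul(Qi, Q)

x = 0.2
lhs = sum(c * x ** n for n, c in enumerate(comp))  # (P∘Q)(x), truncated
rhs = 1 / (1 - (x + x * x) / 2)                    # P(Q(x)) in closed form
assert abs(lhs - rhs) < 1e-9
```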