I'll show you how the book derives that solution.
First, we start with the definition for energy fluctuation:
$$
\langle(\Delta E)^2\rangle =\frac{\partial^2}{\partial \beta^2}\ln(Z)
$$
where $Z$ is the Canonical Partition Function. The heat capacity, $C$, can be defined as:
$$
C=\frac{1}{k_BT^2}\langle (\Delta E)^2\rangle
$$
Next, let's calculate $Z$ for your question:
\begin{align}
Z&=\sum_i g_i e^{-\beta \epsilon_i}\\
&=1\cdot e^{-\beta \epsilon}+2\cdot e^{-2\beta \epsilon}+1\cdot e^{-3\beta \epsilon}
\end{align}
Where the degeneracies are the factors in front of each term. The little "trick" now is to factorise this, because later we plan on taking the logarithm and so we want to write this expression as a product of factors.
\begin{align}
Z&=e^{-\beta \epsilon}(1+2\cdot e^{-\beta \epsilon}+e^{-2\beta \epsilon})\\
&=e^{-\beta \epsilon}(e^{-\beta \epsilon}+1)(e^{-\beta \epsilon}+1)\\
&=e^{-\beta \epsilon}(e^{-\beta \epsilon}+1)^2
\end{align}
Okay, now that we have the canonical partition function written as a product (rather than sum) we can take the natural logarithm.
\begin{align}
\ln(Z)&=\ln\left(e^{-\beta \epsilon}(e^{-\beta \epsilon}+1)^2\right)\\
&=\ln\left(e^{-\beta \epsilon}\right)+\ln\left((e^{-\beta \epsilon}+1)^2\right)\\
&=-\beta\epsilon+2\ln\left(e^{-\beta \epsilon}+1\right)\\
\end{align}
Next step is to calculate the energy fluctuation by taking the second derivative of this function:
\begin{align}
\langle(\Delta E)^2\rangle&=\frac{\partial^2}{\partial \beta^2}\ln(Z)\\
&=\frac{\partial^2}{\partial \beta^2}\left(-\beta\epsilon+2\ln\left(e^{-\beta \epsilon}+1\right)\right)\\
&=\frac{\partial}{\partial \beta}\left(-\epsilon +\frac{2(-\epsilon e^{-\beta\epsilon})}{e^{-\beta\epsilon}+1}\right)\\
&=\frac{\partial}{\partial \beta}\left(-\epsilon +\frac{-2\epsilon}{1+e^{\beta\epsilon}}\right)\\
&=-(-1)2\epsilon(\epsilon e^{\beta\epsilon})\left(1+e^{\beta\epsilon}\right)^{-2}\\
&=\frac{2\epsilon^2e^{\beta\epsilon}}{\left(1+e^{\beta\epsilon}\right)^2}
\end{align}
The final step is to calculate the heat capacity using the definition above:
\begin{align}
C&=\frac{1}{k_BT^2} \langle(\Delta E)^2\rangle\\
&=\frac{1}{k_BT^2}\frac{2\epsilon^2e^{\beta\epsilon}}{\left(1+e^{\beta\epsilon}\right)^2}\\
&=\beta^2k_B\frac{2\epsilon^2e^{\beta\epsilon}}{\left(1+e^{\beta\epsilon}\right)^2}\\
&=k_B\frac{2(\beta\epsilon)^2e^{\beta\epsilon}}{\left(1+e^{\beta\epsilon}\right)^2}
\end{align}
Hence, with $x=\beta\epsilon$ this can be written as:
\begin{align}
C=k_B\frac{2x^2 e^{x}}{\left(1+e^{x}\right)^2}
\end{align}
If any steps don't make sense let me know and I'll explain them!
As already stated in the comments your equations (2) and (4) hold. But the conclusion (5) is wrong, because it assumes that $\partial_E$ is interchangeable with $\partial_{E_1}$ and $\partial_{E_2}$
The entropy of the total system, is of course the sum of entropy of the individual systems. However, by specifying the energy of the total system, $E_1$ and $E_2$ are not given a priori (a priori all energies $E_1$ and $E_2$ such that $E_1 + E_2 = E$ are permissible).
Now for the fluctuation-dissipation theorem to hold for the combined system, the resulting combined system must be in equilibrium. This is only the case if the temperature of the two systems was the same before bringing them into weak energetic contact (also note, that otherwise it does not make sense to talk of the heat capacity of the resulting system, as it would not have a defined temperature).
This constraint means, that the following relation must hold: $\frac{\partial S_1}{\partial E_1} = \frac{\partial S_2}{\partial E_2}$ (which is just equating the inverse temperatures of the individual systems).
Now the total entropy can be written as
$$ S(E) = S_1\big(E_1(E)\big) + S_2\big(E_2(E)\big) = S_1\big(E_1(E)\big) + S_2\big(E-E_1(E)\big) $$
Where $E_1$ and $E_2$ are functions of $E$ (that can be determined from the equilibrium condition and the constraint $E = E_1 + E_2$).
That means that
$$ \partial_E S(E) = (\partial_E E_1) \partial_{E_1} S_1 + (\partial_E E_2) \partial_{E_2} S_2. $$
This first derivative has a lucid structure, due to the constraint $E = E_1 + E_2$ we know that
$$ \partial_E E_1 + \partial_E E_2 = 1 $$
So we get that for the two systems in equilibrium at temperature $T$ the temperature of the total system is also $T$.
The second derivative then gives
$$ - \frac{1}{C_V T^2} = \partial_E^2 S(E) = (\partial_E^2 E_1) \partial_{E_1} S_1 + (\partial_E E_1)(\partial_E E_1) \partial_{E_1} S_1 + (\partial_E^2 E_2) \partial_{E_2} S_2 + (\partial_E E_2)(\partial_E E_2) \partial_{E_2} S_2$$
Which is clearly not the same as $\partial_{E_1}^2 S_1 + \partial_{E_2}^2 S_2 $. I am not sure whether you can somehow get the additivity result from this (you probably can - but I don't see a clear way ahead).
Another view that you can take is that your systems are prepared in a microcanonical ensemble. Then combining them means, that the energy $E$ of the combined ensemble can be distributed in an arbitrary fashion between the parts, such that $E_1 + E_2 = E$.
The entropy of the single systems is the logarithm of the degeneracy $\nu(E_i)$ of the energy eigenstate at $E_i$. Here you see even more clearly, why $\partial_{E} S(E_1)$ is not easily expressed.
Best Answer
Hint: Analyze each subsystem separately. Occupation probabilities will follow Boltzman distribution: $P(1) = exp(-E_1/kT)/(exp(-E_1/kt)+exp(-E_2/kt))$ etc.
$<E(T)> = P_1 * E_1 + P_2*E_2$
Expected energy for whole system is just $N$ times that per subsystem. Differentiate to get heat capacity.