The thin-lens formula $$ \frac{1}{f}=\frac{1}{d_o}+\frac{1}{d_i}$$ is an approximation which assumes that the distance from the front interface of the lens (the transition from index of refraction outside the lens to inside) to the back interface (transition from inside to outside) is small enough to be ignored compared to the radii of curvature of the interfaces. It also assumes that the important light paths are close to the axis of the lens (paraxial rays).
When two thin lenses are separated by a distance $d$ that is not small enough to ignore compared to the radii of curvature, one has essentially introduced the effect of a thick lens, that is, one in which the front interface is separated from the back by a non-negligible distance, and the thin-lens approximation can't account for the focusing ability of the combination. A thick lens is described as having two planes of focusing power (the principal planes), separated by a finite, fixed distance. For purposes of finding or using lens power values, the distance to the object is measured from one plane, and the distance to the image is measured from the other plane.
You can't use a single plane to measure the distance to both image and object. Those planes will not necessarily be equal distances from the physical middle of the lens. It depends on the specific radius of curvature of each surface.
Reducing a system of more than two lenses to a single set of parameters that resemble one thick lens is extremely complicated. It's better to treat each lens separately and step through them one at a time.
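Stepping through the lenses one at a time can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names `image_distance` and `trace_lenses` are made up for this example, distances use the same sign convention as the thin-lens formula above (a negative object distance means a virtual object), and the case of an intermediate image landing exactly on a lens is not handled.

```python
def image_distance(f, d_o):
    """Solve the thin-lens formula 1/f = 1/d_o + 1/d_i for d_i."""
    if d_o == f:
        return float("inf")  # object at the focal point: image at infinity
    return 1.0 / (1.0 / f - 1.0 / d_o)


def trace_lenses(focal_lengths, separations, d_o):
    """Image a point through a row of thin lenses, one lens at a time.

    focal_lengths: [f1, f2, ...] in order along the axis
    separations:   [d12, d23, ...] gaps between consecutive lenses
    d_o:           object distance in front of the first lens
    Returns the final image distance, measured from the last lens.
    """
    d_i = image_distance(focal_lengths[0], d_o)
    for f, gap in zip(focal_lengths[1:], separations):
        # The image formed by the previous lens is the object for this one.
        # If it falls behind this lens, the object distance is negative.
        d_o_next = gap - d_i
        d_i = image_distance(f, d_o_next)
    return d_i


# Two f = 100 lenses, 50 apart, object 200 in front of the first lens.
print(trace_lenses([100.0, 100.0], [50.0], 200.0))
```

The loop never needs an equivalent single-lens description; each pass only uses the thin-lens formula and the gap to the next lens.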
I haven't actually done the derivation but the approach you would take would be to write a ray tracing matrix for the whole system, including the object distance $s_1$ and the image distance $s_2$:
$$
\begin{bmatrix}x_f \\ \theta_f\end{bmatrix} =
\begin{bmatrix}1 & s_2 \\ 0 & 1\end{bmatrix}
\begin{bmatrix}1 & 0 \\ -1/f_2 & 1\end{bmatrix}
\begin{bmatrix}1 & d \\ 0 & 1\end{bmatrix}
\begin{bmatrix}1 & 0 \\ -1/f_1 & 1\end{bmatrix}
\begin{bmatrix}1 & s_1 \\ 0 & 1\end{bmatrix}
\begin{bmatrix}x_i \\ \theta_i\end{bmatrix}
$$
Then you do the tedious matrix multiplication so that you get coefficients $A,B,C,D$ for your equation system in terms of $d,f_1,f_2,s_1,s_2$:
$$
\begin{bmatrix}x_f \\ \theta_f\end{bmatrix} =
\begin{bmatrix}A & B \\ C & D\end{bmatrix}
\begin{bmatrix}x_i \\ \theta_i\end{bmatrix}
$$
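If you'd rather let a computer do the tedious multiplication, here is a minimal sketch in plain Python. The helper names (`matmul`, `translation`, `thin_lens`, `system_matrix`) are my own, not from any library, and the matrices are the same paraxial ones written out above.

```python
def matmul(m, n):
    """Product m @ n of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def translation(s):
    """Free propagation over a distance s."""
    return ((1.0, s), (0.0, 1.0))


def thin_lens(f):
    """Thin lens of focal length f."""
    return ((1.0, 0.0), (-1.0 / f, 1.0))


def system_matrix(s1, f1, d, f2, s2):
    """ABCD matrix from object plane to image plane for two thin lenses.

    The factors are listed left to right exactly as in the matrix
    equation, so the rightmost one acts on the ray first.
    """
    elements = [translation(s2), thin_lens(f2), translation(d),
                thin_lens(f1), translation(s1)]
    result = elements[0]
    for element in elements[1:]:
        result = matmul(result, element)
    return result


(A, B), (C, D) = system_matrix(s1=200.0, f1=100.0, d=50.0, f2=100.0, s2=60.0)
print(A, B, C, D)
```

For these example numbers $B$ comes out as zero up to rounding error, so $s_2 = 60$ is an image plane for an object at $s_1 = 200$, and $A$ is then the transverse magnification.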
When an image is formed, all the rays starting from position $x_i$ end up at $x_f$ regardless of their initial angle $\theta_i$. So in the equation $x_f = Ax_i + B\theta_i$, you can set $B=0$ and from there derive the imaging condition that relates $s_1$ and $s_2$. In the limit $d \to 0$ it takes the thin-lens form, with $\frac{1}{s_1} + \frac{1}{s_2}$ equal to the reciprocal of the focal length of the whole system.
This expression, which is hopefully the same as what your book says, will certainly depend on $f_1$, $f_2$, and $d$. If you plot the focal length against each one while keeping the other two constant, you can see how it behaves when, e.g., one lens is negative and the other positive, or the distance is greater or smaller than the focal length.
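For reference, here is what I get when I carry the multiplication through (my own working, so do check it against your book). The upper-right coefficient is

$$
B = s_1 + d - \frac{d\,s_1}{f_1} + s_2\left[1 - \frac{s_1}{f_1} - \frac{1}{f_2}\left(s_1 + d - \frac{d\,s_1}{f_1}\right)\right]
$$

Setting $B=0$ with $d=0$ and dividing by $s_1 s_2$ gives

$$
\frac{1}{s_1} + \frac{1}{s_2} = \frac{1}{f_1} + \frac{1}{f_2}
$$

For $d \neq 0$, with $s_1$ and $s_2$ measured from the lenses themselves, the sum $\frac{1}{s_1} + \frac{1}{s_2}$ is no longer a constant. The power of the combination can then be read off from the lower-left coefficient, which does not depend on $s_1$ or $s_2$:

$$
C = -\left(\frac{1}{f_1} + \frac{1}{f_2} - \frac{d}{f_1 f_2}\right) = -\frac{1}{f}
$$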
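To get a feel for the dependence without plotting software, you can tabulate the focal length over a range of separations. The sketch below assumes the standard two-thin-lens result $\frac{1}{f} = \frac{1}{f_1} + \frac{1}{f_2} - \frac{d}{f_1 f_2}$ (it is not derived here), and the function names are just my own choices for illustration.

```python
def effective_focal_length(f1, f2, d):
    """Focal length of two thin lenses separated by d (assumed formula:
    1/f = 1/f1 + 1/f2 - d/(f1*f2))."""
    power = 1.0 / f1 + 1.0 / f2 - d / (f1 * f2)
    if power == 0.0:
        return float("inf")  # afocal combination, e.g. d = f1 + f2
    return 1.0 / power


def sweep_separation(f1, f2, separations):
    """Return (d, f) pairs with f1 and f2 held constant."""
    return [(d, effective_focal_length(f1, f2, d)) for d in separations]


# Positive lens followed by a negative lens, separation varied.
for d, f in sweep_separation(100.0, -50.0, [0.0, 25.0, 40.0, 75.0]):
    print(f"d = {d:6.1f}   f = {f:10.3f}")
```

Feeding the resulting pairs to whatever plotting tool you like gives the curves described above; sweeping $f_1$ or $f_2$ instead works the same way.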
Best Answer
It is the distance from the image plane to the rear principal plane. You can find the location of this plane by projecting the image ray backwards through the system to where it crosses the projection of the object ray. This distance is sometimes also referred to as the effective focal length ($v'$) of the system, and the construction holds for simple and complicated systems alike. The distance from the rear lens to the image plane is simply the back focal distance ($v''$). The difference between $v'$ and $v''$ can be found by the formula:
$\delta = -\frac{d}{n}\,\frac{f}{f_1} = v'' - v'$, where $n = 1$ in air.
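A quick numerical check of this relation for two thin lenses in air, as a sketch under my own assumptions: the effective focal length is taken from the standard combination formula $\frac{1}{f} = \frac{1}{f_1} + \frac{1}{f_2} - \frac{d}{f_1 f_2}$, the back focal distance is found independently by imaging collimated light through the two lenses in turn, and the function names are invented for this example. The case $d = f_1$ (focus exactly on the second lens) is not handled.

```python
def effective_focal_length(f1, f2, d):
    """v': assumed two-thin-lens formula 1/f = 1/f1 + 1/f2 - d/(f1*f2)."""
    return 1.0 / (1.0 / f1 + 1.0 / f2 - d / (f1 * f2))


def back_focal_distance(f1, f2, d):
    """v'': distance from the rear lens to the focus of collimated light.

    Lens 1 focuses parallel rays a distance f1 behind itself, so the
    object distance seen by lens 2 is d - f1 (negative if virtual).
    """
    d_o2 = d - f1
    return 1.0 / (1.0 / f2 - 1.0 / d_o2)


f1, f2, d, n = 100.0, 100.0, 50.0, 1.0
v_eff = effective_focal_length(f1, f2, d)
v_back = back_focal_distance(f1, f2, d)
delta = -(d / n) * v_eff / f1

# The two numbers printed here should agree up to rounding error.
print(delta, v_back - v_eff)
```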