It's like leverage. The longer the distance from the objective lens to the real image it forms, the larger that image.
Imagine there's a piece of frosted glass at the image plane. It will show the image.
Now the eyepiece looks at that image like a magnifying glass.
That also makes it look bigger.
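A minimal sketch of the leverage idea for a single ideal thin lens (a simplification; a real objective is a compound lens, as the next answer explains). As the object moves toward the focal point, the image distance s_i grows, and so does the magnification m = -s_i/s_o. The 10 mm focal length and the object distances are made-up illustration values.

```python
def thin_lens_image(f_mm: float, s_obj_mm: float) -> tuple[float, float]:
    """Return (image distance, lateral magnification) for an ideal thin lens.

    Uses the Gaussian thin-lens equation 1/s_o + 1/s_i = 1/f
    (real distances positive) and m = -s_i / s_o.
    """
    s_img = 1.0 / (1.0 / f_mm - 1.0 / s_obj_mm)
    return s_img, -s_img / s_obj_mm


# Hypothetical 10 mm lens: move the object closer and closer to the focal point.
for s_o in (20.0, 12.0, 10.5):
    s_i, m = thin_lens_image(10.0, s_o)
    print(f"s_o = {s_o:5.1f} mm -> s_i = {s_i:6.1f} mm, m = {m:6.1f}")
```

The longer the "lever arm" s_i relative to s_o, the larger the (inverted) image.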
You are correct that for a single lens the working distance would be the focal length. For compound lenses, like microscope objectives, you have to look at the entire optical system to figure out the working distance. The short answer to your question is that the focal length and working distance are not what you expect because the objective is a compound lens. The magnification of a microscope objective is confusing because calculating it requires knowing the focal length of the tube lens used with the objective.
I'm going to work through an example; for quantities I don't define explicitly here, please see Figures 9 and 12 in reference 1.
Suppose you have two focusing lenses (f1 and f2) separated by a distance d, where the first lens is a distance W from the object you are trying to image. Microscope objectives have to fit in a specific space, so they have a required parfocal distance PD = d + W (I am assuming thin lenses).
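The ray transfer matrix for this lens system (read right to left: free propagation over W, thin lens f1, free propagation over d = PD - W, thin lens f2) is: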
\begin{equation}
M=
\left(
\begin{array}{cc}
M_{11} & M_{12} \\
M_{21} & M_{22} \\
\end{array}
\right)
=
\left(
\begin{array}{cc}
1 & 0 \\
-\frac{1}{\text{f2}} & 1 \\
\end{array}
\right)\left(
\begin{array}{cc}
1 & PD-W \\
0 & 1 \\
\end{array}
\right)\left(
\begin{array}{cc}
1 & 0 \\
-\frac{1}{\text{f1}} & 1 \\
\end{array}
\right)\left(
\begin{array}{cc}
1 & W \\
0 & 1 \\
\end{array}
\right)
\end{equation}
Actually performing the math, we get:
\begin{equation}
M=
\left(
\begin{array}{cc}
1-\frac{\text{PD}-W}{\text{f1}} & \text{PD}+\left(1-\frac{\text{PD}-W}{\text{f1}}\right) W-W \\
-\frac{1-\frac{\text{PD}-W}{\text{f2}}}{\text{f1}}-\frac{1}{\text{f2}} & -\frac{\text{PD}-W}{\text{f2}}+\left(-\frac{1-\frac{\text{PD}-W}{\text{f2}}}{\text{f1}}-\frac{1}{\text{f2}}\right) W+1 \\
\end{array}
\right)
\end{equation}
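To double-check the closed-form entries above, here is a minimal plain-Python sketch that multiplies the same four matrices numerically (paraxial rays, ray vector = (height, angle), thin lenses in air). The helper names `matmul`, `free_space`, `thin_lens`, and `objective_matrix` are my own, and W = 20 mm is an arbitrary test value, not a solution of the imaging condition.

```python
Matrix = tuple[tuple[float, float], tuple[float, float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """2x2 matrix product a.b."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def free_space(d: float) -> Matrix:
    """Propagation over a distance d in a uniform medium."""
    return ((1.0, d), (0.0, 1.0))


def thin_lens(f: float) -> Matrix:
    """Ideal thin lens of focal length f."""
    return ((1.0, 0.0), (-1.0 / f, 1.0))


def objective_matrix(W: float, PD: float, f1: float, f2: float) -> Matrix:
    """Object plane -> just after lens f2; the ray meets the factors in this order."""
    m = free_space(W)
    m = matmul(thin_lens(f1), m)
    m = matmul(free_space(PD - W), m)
    m = matmul(thin_lens(f2), m)
    return m


PD, f1, f2, W = 45.0, 11.0, 16.5, 20.0  # mm; W is an arbitrary test value
M = objective_matrix(W, PD, f1, f2)
closed_M12 = PD + (1 - (PD - W) / f1) * W - W  # M12 from the closed form above
print(M[0][1], closed_M12)  # the two numbers should agree
```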
For this objective to image properly, we require M12 = 0: the output ray height then does not depend on the input ray angle, so all rays leaving one object point arrive at the same output point. Once the objective images correctly, we can find its effective focal length from the M21 element.
Let's work an example. Suppose that PD = 45 mm, f1 = 11 mm, and f2 = 16.5 mm. I mostly made these numbers up, but the parfocal distance of microscope objectives is typically 45 to 60 mm. It is standardized on purpose to make objectives interchangeable.
From the requirement that M12 = 0 we can calculate W:
\begin{equation}
\begin{aligned}
M_{12} &= W \left(1-\frac{\text{PD}-W}{\text{f1}}\right)+\text{PD}-W = 0 \\
&\implies W = 19.15\ \text{mm},
\end{aligned}
\end{equation}
so the working distance is 19.15 mm.
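Expanding $M_{12}$ shows where this number comes from: the imaging condition is a quadratic in $W$, so it has two roots. The text uses the smaller one; because the roots sum to PD, the other root is just $d = \text{PD} - W \approx 25.85$ mm, i.e. the same two distances with their roles swapped. This is a supplementary derivation using only the symbols defined above.

\begin{equation}
\begin{aligned}
M_{12} &= \text{PD} - \frac{W\,(\text{PD}-W)}{\text{f1}} = 0 \\
&\implies W^2 - \text{PD}\,W + \text{PD}\,\text{f1} = 0 \\
&\implies W = \frac{\text{PD} \pm \sqrt{\text{PD}^2 - 4\,\text{PD}\,\text{f1}}}{2} = \frac{45 \pm \sqrt{45}}{2}\ \text{mm} \approx 19.15\ \text{mm or } 25.85\ \text{mm}.
\end{aligned}
\end{equation}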
Next we want to calculate the focal length of this lens system, which we can get from the M21 element:
\begin{equation}
M_{21} = -\frac{1}{\text{f2}} - \frac{1}{\text{f1}}\left(1 - \frac{\text{PD}-W}{\text{f2}}\right) = -0.0091\ \text{mm}^{-1},
\end{equation}
but the M21 element is usually written as $M_{21} = -1/f_{o}$, so the effective focal length of our compound lens system is $f_{o} \approx 110$ mm, much longer than either f1 or f2. So, very clearly, compound lenses do not behave like single lenses.
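A quick numerical check of the working distance and effective focal length, assuming the example values from the text. Solving M12 = 0 with the quadratic formula and keeping the smaller root is my own choice, made to match the 19.15 mm above; M21 uses the closed form from the symbolic matrix.

```python
import math

PD, f1, f2 = 45.0, 11.0, 16.5  # example values from the text, in mm

# M12 = PD - W*(PD - W)/f1 = 0 rearranges to W^2 - PD*W + PD*f1 = 0.
# Keep the smaller root, as in the text.
W = (PD - math.sqrt(PD**2 - 4 * PD * f1)) / 2

# M21 of the two-lens system (closed form from the matrix product).
M21 = -1 / f2 - (1 - (PD - W) / f2) / f1
f_o = -1 / M21  # effective focal length, from M21 = -1/f_o

print(f"W = {W:.2f} mm, M21 = {M21:.5f} 1/mm, f_o = {f_o:.1f} mm")
```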
What about the magnification? To answer that question we have to look at the M matrix with all the numbers plugged in:
\begin{equation}
M=
\left(
\begin{array}{cc}
-1.35037 & 0. \\
-0.00906831 & -0.740536 \\
\end{array}
\right).
\end{equation}
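A sketch that plugs the example numbers into the four closed-form entries of the symbolic matrix, reproducing the numerical matrix above. It also checks a standard property of ray transfer matrices between two planes in the same medium: the determinant is 1, so with M12 = 0 the diagonal entries are reciprocals (1.35037 x 0.740536 is about 1). The variable names are illustrative.

```python
import math

PD, f1, f2 = 45.0, 11.0, 16.5  # example values from the text, in mm
W = (PD - math.sqrt(PD**2 - 4 * PD * f1)) / 2  # smaller root of M12 = 0

# Closed-form entries of the symbolic matrix M.
M11 = 1 - (PD - W) / f1
M12 = PD + (1 - (PD - W) / f1) * W - W
M21 = -(1 - (PD - W) / f2) / f1 - 1 / f2
M22 = -(PD - W) / f2 + M21 * W + 1

print([[round(M11, 5), round(M12, 5)], [round(M21, 8), round(M22, 6)]])
print("det =", M11 * M22 - M12 * M21)  # should be 1 up to rounding
```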
From inspection you'd expect the magnification of this lens to be 1.35 (the minus sign just means the image is upside down). This is where your confusion comes in: the magnification of a microscope objective requires knowledge of the tube lens. This example of a vendor selling a microscope objective (reference 2) clearly states:
All stated magnifications are based on a tube lens focal length of 200mm.
So, for our example above, using a 200 mm tube lens the magnification would be $M=f_{tube}/f_{objective} = 200 / 110 = 1.82$. Scrolling down the page of reference 2, you can see that the FL = 100 mm lens has a magnification of 2.
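The tube-lens relation is simple enough to sketch directly. This assumes an infinity-corrected objective and the 200 mm tube-lens convention quoted above; the function name is my own.

```python
def objective_magnification(f_tube_mm: float, f_objective_mm: float) -> float:
    """Lateral magnification of an infinity-corrected objective + tube lens pair."""
    return f_tube_mm / f_objective_mm


F_TUBE = 200.0  # tube-lens focal length assumed by the vendor quote
for f_obj in (110.0, 100.0):  # our example objective, and the FL = 100 mm one
    print(f"f_objective = {f_obj:.0f} mm -> M = {objective_magnification(F_TUBE, f_obj):.2f}")
```

The same objective used with a tube lens of a different focal length would have a different magnification, which is why vendors state the tube-lens focal length their numbers assume.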
The magnification depends on the focal lengths of the lenses in the picture you have shown above. The objective and tube lens together create a real image of whatever is at the object plane, magnified by:
$M = \frac{f_{\text{tube lens}}}{f_{\text{objective}}}$
So, if you were to measure the size of the image, it would be M times larger than the object that is placed to the left of the objective lens.
One common source of confusion is that many microscope objectives actually create an image by themselves, without a tube lens. There are several standards, including objectives that create images 160 mm and 170 mm away from the objective. Your diagram implies an infinity-corrected objective lens, which means that the image created by the objective alone is infinitely far away. This might lead you to believe that, since the light leaving the objective is collimated, you can place the tube lens anywhere you want. That is not quite correct, because of two factors: vignetting and the optical design of the tube lens.
Vignetting means that some of the light falls outside the aperture of the lens. In your diagram, this would happen if the tube lens were too small. Many infinity-corrected objectives are designed for tube lenses placed about 180 mm from the objective lens. If the tube lens is not placed close to that distance, you can get vignetting, or optical aberrations may degrade the image.
Now, take the final step to the eye of the observer: the eyepiece. Your eye is most relaxed when focused at infinity. Therefore, the eyepiece is typically designed to project the image created by the objective-tube lens pair to infinity. Your diagram actually shows the final image at 25 cm instead of at infinity. In that case, the eyepiece is placed just inside one focal length from the real image (image plane 3 in your diagram).
The final magnification is $M_{\text{total}} = M \times \frac{25\ \text{cm}}{f_{\text{eyepiece}}}$
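A minimal sketch of the total-magnification formula, taking the 25 cm (250 mm) viewing-distance convention from the answer. The 10 mm objective and 25 mm eyepiece focal lengths in the usage lines are hypothetical illustration values (a 25 mm eyepiece gives the familiar 10x factor).

```python
def total_magnification(
    f_tube_mm: float,
    f_objective_mm: float,
    f_eyepiece_mm: float,
    near_point_mm: float = 250.0,
) -> float:
    """M_total = (f_tube / f_objective) * (near point / f_eyepiece)."""
    return (f_tube_mm / f_objective_mm) * (near_point_mm / f_eyepiece_mm)


# Hypothetical 10 mm objective, 200 mm tube lens, 25 mm eyepiece: 20x * 10x.
print(total_magnification(200.0, 10.0, 25.0))
# The 110 mm example objective from the other answer with the same eyepiece.
print(total_magnification(200.0, 110.0, 25.0))
```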
There are additional considerations including:
One last consideration: if you just want to put the image onto a camera, you don't need the eyepiece!