It's a mixture of three distributions, and can be found fairly easily by brute force, if one allows oneself to handwave over some important details (e.g., that $\lambda < \mu$, which is needed for a steady state to exist).
Let $n$ be the number of customers in the system; $n=0$ means no-one is being served or waiting, $n=1$ means one customer is being served but no-one is waiting, etc. Let $p_n$ be the steady-state (assumed from here on) probability that $n$ customers are in the system.
Clearly, if $n=0$, the time $t$ to the next service start is distributed $f(t|n=0) = \text{Exp}\{\lambda\}$, and if $n \geq 2$, the time to the next service start is distributed $f(t|n \geq 2) = \text{Exp}\{\mu\}$, since the next service will start immediately upon completion of the current one. If $n=1$, the time to the next service start is the maximum of the time to the next arrival and the time to the next service completion. This latter distribution has the following easily-derived form:
$f(t|n=1) = \lambda e^{-\lambda t} + \mu e^{-\mu t} - (\lambda + \mu)e^{-(\lambda+\mu)t}$
The mixture weights for these three cases are $p_0$, $1-p_0-p_1$, and $p_1$ respectively. For the M/M/1 queue, these probabilities can be written as:
$$p_0 = \left[\sum_{k=0}^{\infty}\left({\lambda \over{\mu}}\right)^k\right]^{-1} = 1 - {\lambda\over{\mu}}$$
$$p_1 = \left({\lambda \over{\mu}}\right)p_0 = {\lambda\over{\mu}} - \left({\lambda\over{\mu}}\right)^2$$
$$1-p_0-p_1 = \left({\lambda\over{\mu}}\right)^2$$
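As a quick sanity check on these formulae, here is a short Python snippet (with illustrative rates $\lambda=1$, $\mu=2$, chosen arbitrarily) confirming that the truncated geometric series matches the closed forms:

```python
# Illustrative values (assumed for this check): arrival rate lam < service rate mu
lam, mu = 1.0, 2.0
rho = lam / mu

# p_0 from truncating the geometric series sum, vs. the closed form 1 - rho
p0_series = 1.0 / sum(rho**k for k in range(200))
p0 = 1 - rho
p1 = rho * p0
p_rest = rho**2  # P(n >= 2) = 1 - p0 - p1

assert abs(p0_series - p0) < 1e-12
assert abs((1 - p0 - p1) - p_rest) < 1e-12
print(p0, p1, p_rest)  # 0.5 0.25 0.25
```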
Writing the whole thing out, with some rearranging of terms, gives:
$$f(t) = {\lambda(\mu-\lambda)\over{\mu}}e^{-\lambda t} + {\lambda^2\over{\mu}}e^{-\mu t} + {\lambda(\mu-\lambda)\over{\mu^2}}\left(\lambda e^{-\lambda t} + \mu e^{-\mu t} - (\lambda+\mu)e^{-(\lambda+\mu)t}\right)$$
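If you want to check a result like this numerically, one option is to sample directly from the three-part mixture and compare the empirical CDF against the analytic one (obtained by integrating each mixture component term by term). A minimal sketch, with arbitrary illustrative rates:

```python
import math
import random

random.seed(0)

lam, mu = 1.0, 2.0                   # illustrative rates with lam < mu (assumed)
rho = lam / mu
p0, p1 = 1 - rho, rho * (1 - rho)    # P(n=0), P(n=1); P(n>=2) = rho**2

def sample_time_to_next_start():
    """Draw the time to the next service start from the three-part mixture."""
    u = random.random()
    if u < p0:                       # empty system: wait for an arrival
        return random.expovariate(lam)
    elif u < p0 + p1:                # one in service: max(arrival, completion)
        return max(random.expovariate(lam), random.expovariate(mu))
    else:                            # queue non-empty: wait for completion
        return random.expovariate(mu)

def F(t):
    """Analytic CDF: each mixture component integrated in closed form."""
    return (p0 * (1 - math.exp(-lam * t))
            + p1 * (1 - math.exp(-lam * t)) * (1 - math.exp(-mu * t))
            + rho**2 * (1 - math.exp(-mu * t)))

n = 200_000
draws = [sample_time_to_next_start() for _ in range(n)]
for t in (0.5, 1.0, 2.0):
    empirical = sum(d <= t for d in draws) / n
    assert abs(empirical - F(t)) < 0.01
```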
My source for probability formulae was Kleinrock, Queueing Systems.
Edit: The derivation of $f(t|n=1)$ is below, written as the derivation of the density of the maximum of two independent exponential variates $x \sim \text{Exp}\{\lambda\}$ and $y \sim \text{Exp}\{\mu\}$. The corresponding CDFs are $F_X(x) = 1-\exp{\{-\lambda x\}}$ and $F_Y(y) = 1-\exp{\{-\mu y\}}$.
We'll approach this using the "cumulative distribution function technique". Note first that the statement "$\max(x,y) \leq t$" is equivalent to "$x \leq t$ and $y \leq t$". Since $x$ and $y$ are independent, the probability that $\max(x,y) \leq t$ is just the product of the probabilities that $x \leq t$ and $y \leq t$. Writing this out gives:
$$F_{\max(x,y)}(t) = \left(1-e^{-\lambda t}\right)\left(1-e^{-\mu t}\right) = 1 - e^{-\lambda t} - e^{-\mu t} + e^{-(\lambda+\mu) t}$$
and taking the derivative with respect to $t$ gets you to the density function.
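This is easy to verify by simulation; a short sketch with arbitrary illustrative rates, comparing the empirical CDF of the maximum against the product-of-CDFs formula:

```python
import math
import random

random.seed(1)

lam, mu = 1.0, 2.0   # illustrative rates (assumed)

# Empirical CDF of max(X, Y) vs. the product of the two exponential CDFs
n = 100_000
draws = [max(random.expovariate(lam), random.expovariate(mu)) for _ in range(n)]
for t in (0.5, 1.0, 2.0):
    empirical = sum(d <= t for d in draws) / n
    analytic = (1 - math.exp(-lam * t)) * (1 - math.exp(-mu * t))
    assert abs(empirical - analytic) < 0.01
```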
The Poisson distribution is for event counts in a given interval of time (in a Poisson process). It is not useful for modelling inter-event times.
If the events follow a Poisson process, the inter-event times will be distributed as exponential.
The usual (ML, MoM) estimator of the exponential mean is the sample mean; the ML estimator of the rate (the reciprocal of the population mean) is the reciprocal of the sample mean.
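A minimal illustration of both points, with an arbitrary true rate: simulate exponential inter-event times, recover the rate from the reciprocal of the sample mean, and confirm that the event count per unit time matches the same rate (the Poisson-count side of the picture):

```python
import random

random.seed(2)

rate = 1.5                      # illustrative true rate (assumed)
n = 100_000
gaps = [random.expovariate(rate) for _ in range(n)]

# Sample mean estimates the exponential mean 1/rate;
# its reciprocal is the ML estimate of the rate
mean_hat = sum(gaps) / n
rate_hat = 1.0 / mean_hat
assert abs(mean_hat - 1 / rate) < 0.01
assert abs(rate_hat - rate) < 0.03

# Cumulative sums of the gaps give the arrival times of a Poisson process;
# the mean number of events per unit time should also be close to the rate
arrival_times, t = [], 0.0
for g in gaps:
    t += g
    arrival_times.append(t)
counts_mean = len([a for a in arrival_times if a <= 10_000]) / 10_000
assert abs(counts_mean - rate) < 0.05
```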
Best Answer
I'd suggest starting with a quick read of the chapter of Law and Kelton's "Simulation Modeling and Analysis" textbook that discusses methods for selecting distributions to use in Monte Carlo simulations. This chapter discusses methods for selecting candidate distributions, fitting the distributions to your data, and then testing the goodness of fit.
It's quite common to find that many different distributions adequately fit your data. Depending on what you're doing with your model, the choice that you make can have a big effect on the results. In that case, it's appropriate to run your simulation with the different distributions to see how sensitive your results are to the assumed distribution.
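As an illustration of that kind of sensitivity run, here is a sketch of a single-server queue simulated via the Lindley recursion, comparing exponential against deterministic service times with the same mean; all parameter values are illustrative assumptions:

```python
import random

random.seed(3)

# Same arrival process and same mean service time, two service distributions.
lam = 1.0            # arrival rate (assumed)
mean_service = 0.5   # mean service time, giving utilisation 0.5 (assumed)

def mean_wait(service_sampler, n=200_000):
    """Average waiting time via the Lindley recursion W' = max(0, W + S - A)."""
    w, total = 0.0, 0.0
    for _ in range(n):
        s = service_sampler()
        a = random.expovariate(lam)      # next interarrival gap
        w = max(0.0, w + s - a)
        total += w
    return total / n

exp_service = lambda: random.expovariate(1 / mean_service)
det_service = lambda: mean_service       # deterministic, same mean

print(mean_wait(exp_service))  # M/M/1-style: roughly 0.5
print(mean_wait(det_service))  # M/D/1-style: roughly half of that
```

Even with the mean held fixed, the deterministic-service queue waits about half as long as the exponential-service one, which is exactly the kind of difference a sensitivity run is meant to surface.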
For interarrival times, it is nearly always the case in practice that the Poisson process (that is, exponential interarrival times, but a Poisson distribution for the number of arrivals in a time period) is the way to go. However, the arrival rate may vary (e.g., by day of the week, time of day, and so on), in which case a nonhomogeneous Poisson process may be more appropriate.