1) I believe it is possible to obtain $\mathbf P(M_n<a\mid \tau>n)$, where $\tau:=\inf\{n\ge 1: S_n\le 0\}$, in a more or less explicit way only in the case of the simple random walk. These computations can be found in Billingsley, Convergence of Probability Measures, Chapter 2.11. For large $n$ the computations become universal, as you can apply the FCLT; see below.
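For the simple $\pm 1$ walk the conditional probability can also be computed exactly by dynamic programming over the walk's position (a sketch of my own, not Billingsley's computation; the function names are mine):

```python
def srw_stay_in(n, a):
    """P(simple +/-1 walk from 0 satisfies 0 < S_k < a for all 1 <= k <= n)."""
    f = {0: 1.0}  # probability mass of surviving paths, by current position
    for _ in range(n):
        g = {}
        for x, mass in f.items():
            for y in (x - 1, x + 1):
                if 0 < y < a:  # paths touching 0 or reaching a are killed
                    g[y] = g.get(y, 0.0) + 0.5 * mass
        f = g
    return sum(f.values())

def cond_max_below(n, a):
    """P(M_n < a | tau > n): constrained over unconstrained survival probability."""
    return srw_stay_in(n, a) / srw_stay_in(n, n + 2)  # a > n+1 makes the ceiling vacuous
```

Since the walk is integer-valued, $\{M_n < a\}$ is exactly the event that the path stays strictly below $a$.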
2) Generally, one can use Brownian motion approximations, as was suggested by fedja. Mathematical justification is given in the references below. Before going into this, I will say a few words about conditioning random walks to stay positive.
There are two ways to condition a random walk to be positive. The first way is to condition on $\{\tau>n\}$, where $\tau:=\inf\{n\ge 1: S_n\le 0\}$. The second way is to ''condition'' on $\{\tau=\infty\}$. Here one cannot condition directly, since $\mathbf P(\tau=\infty)=0$ (for a zero-mean, finite-variance random walk). The way to proceed is to use Doob's $h$-transform. As a result one obtains a Markov chain (a Lamperti Markov chain, in fact) which is a discrete-time analogue of the Bessel process.
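For the simple walk this $h$-transform is explicit: $h(x)=x$ is harmonic for the walk killed on $\{S\le 0\}$, and the transformed kernel $\hat p(x,y)=p(x,y)\,h(y)/h(x)$ gives the discrete analogue of the Bessel(3) process. A minimal simulation sketch (my own code, under these assumptions):

```python
import random

def bessel3_step(x, rng):
    """One step of the h-transformed simple walk with h(x) = x:
    p(x, x+1) = (x+1)/(2x),  p(x, x-1) = (x-1)/(2x).
    From x = 1 the chain moves to 2 with probability 1, so it never hits 0."""
    return x + 1 if rng.random() < (x + 1) / (2 * x) else x - 1

rng = random.Random(0)
path = [1]
for _ in range(10_000):
    path.append(bessel3_step(path[-1], rng))
```

Note that the two transition probabilities sum to $1$ and push the chain away from the origin, which is the qualitative signature of the conditioning on $\{\tau=\infty\}$.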
Now I can move on to approximations as $n\to\infty$.
a) When $a$ is of order $\sqrt n$, you need the corresponding Functional Central Limit Theorem (FCLT); a functional limit theorem is required because the maximum depends on the whole trajectory of the random walk.
The way to use functional limit theorems is described in Billingsley's book cited above; see Chapter 2.11 for an example with the unconditioned random walk.
For the first type of conditioning, the FCLT is proved in Iglehart, 1974, Functional Central Limit Theorems for Random Walks Conditioned to Stay Positive (doi:10.1214/aop/1176996607); see also Bolthausen, 1976, On a Functional Central Limit Theorem for Random Walks Conditioned to Stay Positive. Conditioning of the second type was considered in Bertoin and Doney (1994), On Conditioning a Random Walk to Stay Nonnegative (doi:10.1214/aop/1176988497). An FCLT for the second type of conditioning can be found in Caravenna and Chaumont (2008), Invariance principles for random walks conditioned to stay positive (doi:10.1214/07-AIHP119). Finally, in higher dimensions you can use the FCLT from http://arxiv.org/abs/1110.1254.
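To see the FCLT scaling at work, one can estimate $\mathbf P(M_n < A\sqrt n \mid \tau>n)$ for the simple walk by rejection sampling and observe that the estimates stabilize as $n$ grows, consistent with convergence to a Brownian meander functional (a Monte Carlo sketch; the function name and parameter choices are mine):

```python
import random

def cond_max_estimate(n, A, trials, seed=0):
    """Monte Carlo estimate of P(M_n < A*sqrt(n) | tau > n) for the simple walk,
    keeping only trajectories that stay strictly positive up to time n."""
    rng = random.Random(seed)
    ceiling = A * n ** 0.5
    survived = below = 0
    for _ in range(trials):
        s, m = 0, 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            if s <= 0:
                break          # tau <= n: reject this trajectory
            m = max(m, s)
        else:                  # the walk stayed positive, i.e. tau > n
            survived += 1
            below += m < ceiling
    return below / survived if survived else float("nan")
```

Rejection sampling is workable here because $\mathbf P(\tau>n)$ decays only like $n^{-1/2}$.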
b) When $a$ is much larger than $\sqrt n$, then $\mathbf P(M_n<a\mid\tau>n)\to 1$.
c) When $a\ll\sqrt n$, you are dealing with a small-deviation probability.
If you write $\mathbf P(M_n<a\mid \tau>n)=\mathbf P(M_n<a,\ \tau>n)/\mathbf P(\tau>n)$, then the denominator satisfies $\mathbf P(\tau>n)\sim Cn^{-1/2}$ as $n\to\infty$, while
$$
\mathbf P(\tau>n,\ M_n<a)=\mathbf P(\mbox{random walk $S_k$ stays in $(0,a)$ for all } 0<k\le n).
$$
Now, if $a\sim An^{1/2-\delta}$ for some $\delta\in(0,1/2)$, you can split the trajectory into $n/a^2$ parts of length $a^2$ and use independence of the increments to obtain the estimate
$$
\mathbf P(\tau>n, M_n<a)\le \left(\mathbf P(\mbox{random walk $S_k$ stays in (0,2a) for all } 0<k\le a^2)\right)^{n/a^2}
$$
Now, the probability inside the parentheses is on the right scale, and you can apply the FCLT to get
$$
\mathbf P(\mbox{random walk $S_k$ stays in (0,2a) for all } 0<k\le a^2)\to C(A)\in(0,1).
$$
This results in the following upper bound
$$
\mathbf P(\tau>n, M_n<a) \le C(A)^{n/a^2}\approx C(A)^{n^{2\delta}}.
$$
As $C(A)\in(0,1)$, this probability is quite small.
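This geometric decay can be checked directly for the simple walk, where the one-step decay rate for confinement to $(0,a)$ is $\cos(\pi/a)$, the top eigenvalue of the walk restricted to the strip (a classical eigenvalue computation; the script is my own check):

```python
import math

def stay_in_strip(n, a):
    """P(simple +/-1 walk from 0 stays strictly inside (0, a) for steps 1..n)."""
    f = {0: 1.0}  # probability mass of surviving paths, by current position
    for _ in range(n):
        g = {}
        for x, mass in f.items():
            for y in (x - 1, x + 1):
                if 0 < y < a:  # touching 0 or a kills the path
                    g[y] = g.get(y, 0.0) + 0.5 * mass
        f = g
    return sum(f.values())

# Compare the two-step decay ratio (two steps, to avoid parity effects)
# with cos(pi/a)^2, here for a strip of width a = 6.
a = 6
ratio = stay_in_strip(82, a) / stay_in_strip(80, a)
```

For $n$ much larger than $a^2$ the ratio is governed by the top eigenvalue alone, which is exactly the mechanism behind the bound $C(A)^{n/a^2}$ above.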
For any $\beta>0$, $$\mathbb{E}B(n,p)^k\leq k!\beta^{-k}\mathbb{E}e^{\beta B(n,p)}= k!\beta^{-k}(1-p+pe^{\beta})^n.$$
Now you can plug in various values of $\beta$; e.g. $\beta=\frac{k}{np}$ yields
$$\mathbb{E}B(n,p)^k\leq (np)^k k!k^{-k}\left((1-p)+pe^{\frac{k}{pn}}\right)^n.$$
I assume that you got your estimate by elaborating on this expression, although in the regime $1\ll k\ll n$ the above is slightly better.
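As a quick numerical check (my own script), one can compare the exact moment, computed by direct summation over the binomial pmf, with the $\beta=k/(np)$ bound:

```python
import math

def binom_moment(n, p, k):
    """Exact E[B(n,p)^k], summing j^k against the binomial pmf."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) * j**k
               for j in range(n + 1))

def moment_bound(n, p, k):
    """The bound (np)^k k! k^{-k} (1 - p + p e^{k/(np)})^n, i.e. beta = k/(np)."""
    return ((n * p)**k * math.factorial(k) / k**k
            * (1 - p + p * math.exp(k / (n * p)))**n)
```

For $k=1$ the exact moment is of course $np$, which the bound dominates.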
It feels unlikely that you can get much better universal bounds. Indeed, you can write
$$
\mathbb{E}B(n,p)^k=\frac{k!}{2\pi i}\oint\frac{(1-p+pe^z)^n}{z^{k+1}}\,dz,
$$
the integral being over a contour around zero. When you deform it to pass through a saddle point, you are essentially solving the same optimization problem over $\beta$ as above, hence the maximum of the integrand is of the same order. In the limit, Laplace's method gives you at best some polynomial corrections. Another route to small improvements would be a more careful choice of $\beta$; e.g. for $1\ll k\ll n$ you can find a second-order approximation to the extremum.
Best Answer
One can use the Birnbaum–Marshall inequality:
Theorem (Theorem 2.1 in [1]). If $\left(S_k,k\geqslant 1\right)$ is a non-negative sub-martingale and $(c_k,k\geqslant 1)$ a non-decreasing sequence of positive numbers, then for each $p\geqslant 1$: $$\mathbb P\left\{\max_{1\leqslant k\leqslant n}\frac{S_k}{c_k}\geqslant 1\right\}\leqslant \frac{\mathbb E\left[S_n^p\right]}{c_n^p}+\sum_{i=1}^{n-1}\left(\frac 1{c_i^p}-\frac 1{c_{i+1}^p}\right)\mathbb E\left[S_i^p\right].$$
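A numerical sanity check (my own setup, not from [1]): take $S_k=|X_1+\cdots+X_k|$ with $X_i=\pm 1$, a non-negative sub-martingale with $\mathbb E[S_k^2]=k$, together with $c_k=3\sqrt k$ and $p=2$:

```python
import math
import random

def bm_bound(second_moments, c):
    """Right-hand side of the Birnbaum-Marshall inequality with p = 2."""
    n = len(c)
    total = second_moments[-1] / c[-1] ** 2
    for i in range(n - 1):
        total += (1 / c[i] ** 2 - 1 / c[i + 1] ** 2) * second_moments[i]
    return total

n = 100
c = [3 * math.sqrt(k) for k in range(1, n + 1)]  # c_k = 3 sqrt(k), non-decreasing
m2 = list(range(1, n + 1))                       # E[S_k^2] = k for the +/-1 walk
bound = bm_bound(m2, c)

# Monte Carlo estimate of P(max_k S_k / c_k >= 1)
rng = random.Random(1)
trials = 5000
hits = 0
for _ in range(trials):
    s = 0
    for k in range(n):
        s += 1 if rng.random() < 0.5 else -1
        if abs(s) >= c[k]:   # S_{k+1} / c_{k+1} >= 1
            hits += 1
            break
freq = hits / trials
```

With this choice of $c_k$ the bound telescopes to $H_n/9<1$ (with $H_n$ the harmonic number), so it is non-trivial, and the empirical frequency sits well below it.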
Reference:
[1] Z. W. Birnbaum and Albert W. Marshall, Some Multivariate Chebyshev Inequalities with Extensions to Continuous Parameter Processes, Ann. Math. Statist. 32 (1961), no. 3, 687–703.