Without loss of generality, assume that $\lambda>1$. Then $\mathbb{H}/\Gamma$ is biholomorphic to the annulus
$$A(\lambda):=\{z\in\mathbb{C}\mid e^{-\frac{2\pi^2}{\log\lambda}}<|z|<1\}.$$
To see this, let us fix a branch of the logarithm function on $\mathbb{H}$ as follows:
$$\log(re^{i\theta})=\log r+i\theta,\quad~\forall r>0,~ \theta\in(0,\pi)~.$$
Define
$$f:\mathbb{H}\to \mathbb{C},\quad z\mapsto e^{\frac{2\pi i\log z}{\log \lambda}}.$$
By definition, $f$ is holomorphic. It is easy to verify that $f(\mathbb{H})=A(\lambda)$ and $f:\mathbb{H}\to A(\lambda)$ is a covering map with deck transformation group $\Gamma$. Since $\mathbb{H}$ is simply connected, the conclusion follows.
Remark: On the one hand, the generator $d_\lambda\in Aut(\mathbb{H})$ of $\Gamma$ is hyperbolic. When considered as an element in $PSL(2,\mathbb{R})=SL(2,\mathbb{R})/\{\pm I\}$, the conjugate class of $d_\lambda$ is uniquely determined by $(\mathrm{tr}~d_\lambda)^2$. On the other hand, up to biholomorphic equivalence, a hyperbolic Riemann surface with fundamental group $\mathbb{Z}$ is uniquely determined by its modulus. For $\mathbb{H}/\langle d_\lambda\rangle\cong A(\lambda)$, there is a $1$-$1$ correspondence between $(\mathrm{tr}~d_\lambda)^2$ and the modulus of $A(\lambda)$.
The definition of a modular form seems extremely unmotivated, and as @AndreaMori has pointed out, whilst the complex analytic approach gives us the quickest route to a definition, it also clouds some of what is really going on.
A good place to start is with the theory of elliptic curves, which have long been objects of geometric and arithmetic interest. One definition of an elliptic curve (over $\mathbb C$) is a quotient of $\mathbb C$ by a lattice $\Lambda = \mathbb Z\tau_1\oplus\mathbb Z\tau_2$, where $\tau_1,\tau_2\in\mathbb C$ are linearly independent over $\mathbb R$ ($\mathbb C$ and $\Lambda$ are viewed as additive groups): i.e.
$$E\cong \mathbb C/\Lambda.$$
In this viewpoint, one can study elliptic curves by studying lattices $\Lambda\subset\mathbb C$. Modular forms will correspond to certain functions of lattices, and by extension, to certain functions of elliptic curves.
Why the upper half plane?
For simplicity, since $\mathbb Z\tau_1 = \mathbb Z(-\tau_1)$, there's no harm in assuming that $\frac{\tau_1}{\tau_2}\in \mathbb H$.
What about $\mathrm{SL}_2(\mathbb Z)$?
When do $(\tau_1,\tau_2)$ and $(\tau_1',\tau_2')$ define the same lattice? Exactly when
$$(\tau_1',\tau_2')=(a\tau_1+b\tau_2,c\tau_1+d\tau_2)$$where $\begin{pmatrix}a&b\\c&d\end{pmatrix}\in\mathrm{SL}_2(\mathbb Z)$. Hence, if we want to consider functions on lattices, they had better be invariant under $\mathrm{SL}_2(\mathbb Z)$.
Functions on lattices:
Suppose we have a function $$F:\{\text{Lattices}\}\to\mathbb C.$$ First observe that multiplying a lattice by a non-zero scalar (i.e. $\lambda\Lambda$ for $\lambda\in\mathbb C^\times$) amounts to rotating and rescaling the lattice. So our function shouldn't do anything crazy to rescaled lattices.
In fact, since we really care about elliptic curves, and $\mathbb C/\Lambda\cong\mathbb C/\lambda\Lambda$ under the isomorphism $z\mapsto \lambda z$, $F$ should be completely invariant under such rescalings - i.e. we should insist that
$$F(\lambda \Lambda) = F(\Lambda).$$
However, if we define $F$ like this, we are forced to insist that $F$ has no poles. This is needlessly restrictive. So what we do instead is require that
$$F(\lambda\Lambda) = \lambda^{-k}F(\Lambda)$$
for some integer $k$; the quotient $F/G$ of two weight $k$ functions gives a fully invariant function, this time with poles allowed.
Where do modular forms come in?
If $\Lambda = \mathbb Z\tau\oplus\mathbb Z$ with $\tau\in\mathbb H$, define a function $f:\mathbb H\to\mathbb C$ by $f(\tau)=F(\Lambda)$. For a general lattice, we have
$$\begin{align}F(\mathbb Z\tau_1\oplus\mathbb Z\tau_2)&=F\left(\tau_2(\mathbb Z({\tau_1}/{\tau_2})\oplus\mathbb Z)\right)\\
&=\tau_2^{-k}f({\tau_1}/{\tau_2})
\end{align}$$
and in particular,
$$\begin{align}f(\tau) &= F(\mathbb Z\tau\oplus\mathbb Z) \\&=F(\mathbb Z(a\tau+b)\oplus\mathbb Z(c\tau+d)) &\text{by }\mathrm{SL}_2(\mathbb Z)\text{ invariance}\\&= (c\tau+d)^{-k} f\left(\frac{a\tau+b}{c\tau+d}\right).\end{align}$$
This answers your first two questions.
At this point, there's no reason to assume that condition (3) holds, and one can study such functions without assuming condition (3). However, imposing cusp conditions is a useful thing to do, as it ensures that the space of weight $k$ modular forms is finite dimensional.
To answer your fourth question, yes, and this is exactly the viewpoint taken in most research done on modular forms and their generalisations, where one considers automorphic representations.
Best Answer
I will address the 2nd question in your post. The setting is that you have two discrete subgroups $\Gamma_1, \Gamma_2< PSL(2,{\mathbb R})$ acting on the hyperbolic plane ${\mathbb H}$ (the upper half-plane model) so that the quotients have finite area. For instance, finite index subgroups of $PSL(2, {\mathbb Z})$ satisfy this assumption. You are interested in (noncostant, I assume) holomorphic maps $f: {\mathbb H}\to {\mathbb H}$ which are equivariant with respect to some homomorphisms $\phi: \Gamma_1\to \Gamma_2$. For instance, if $|\Gamma_2: \Gamma_1|<\infty$, such $\phi$ is a virtual endomorphism of $\Gamma_2$ in your terminology. You are asking if one can find a certain (finite) subset $\{\gamma_1,...,\gamma_n\}\subset \Gamma_1$ such that the equivariance conditions ("functional equations") $$ \phi(\gamma_i) \circ f= f\circ \gamma_i, i=1,...,n, $$ completely determine $f$. The answer to this is positive. What is important here is not how big is $n$, but, rather, that $\gamma_1,...,\gamma_n$ generate the group $\Gamma_1$. Then the values $\phi(\gamma_i), i=1,...,n$ completely determine the homomorphism $\phi$. The question, thus, reduces to:
Theorem. Suppose that $f_1, f_2$ are two nonconstant holomorphic functions as above, equivariant with respect to the same homomorphism $\phi$. Then $f_1=f_2$.
Proof. Fir of all, observe that a $\phi$-equivariant holomorphic map $f$ descends to a holomorphic map $F: \Sigma_1\to \Sigma_2$ of the quotient Riemann surfaces (actually, orbifolds), $\Sigma_k={\mathbb H}/\Gamma_k, k=1,2$. With a careful choice of base-points $x_k$ in the surfaces, the map $F$ induces the homomorphism $$ \pi_1(\Sigma_1, x_1)\to \pi_1(\Sigma_2, x_2) $$ which equals to $\phi$ under the isomorphism of the respective orbifold fundamental groups and the groups $\Gamma_1, \Gamma_2$. I am not sure if you are familiar with stacks/orbifolds, but there is a trick which allows one to work with ordinary Riemann surfaces: There are finite index torsion-free subgroups $\Gamma_k'< \Gamma_k, k=1, 2$ such that $\phi(\Gamma'_1)< \Gamma'_2$. Then we can work with the pair of subgroups $\Gamma_1', \Gamma_2'$ instead of the original groups and avoid dealing with stacks/orbifolds. Thus, let me assume from the beginning that the groups $\Gamma_1, \Gamma_2$ are torsion-free. Then we can simply use the standard tools of algebraic topology and talks about ordinary fundamental groups.
Remark. It is not important at this point, but choosing deeper finite index subgroups, we can ensure that both $\Sigma_1, \Sigma_2$ have genus $\ge 2$.
What we have, now are two nonconstant holomorphic maps $F_1, F_2: \Sigma_1\to \Sigma_2$ inducing the same homomorphism of fundamental groups. I claim that the maps $F_1, F_2$ are equal. This is very classical in the case when surfaces $\Sigma_1, \Sigma_2$ are compact, of genus $\ge 2$: The reference I know is
Martens, Henrik H., Observations on morphisms of closed Riemann surfaces, Bull. Lond. Math. Soc. 10, 209-212 (1978). ZBL0384.14008.
although, surely, this was known before. Martens notes that even the induced homomorphism of 1st homology groups determines the holomorphic map. Our surfaces need not be compact. However, each finite area hyperbolic Riemann surface $\Sigma$ admits a completion $\Sigma\hookrightarrow \hat\Sigma$ to a compact Riemann surface $\hat\Sigma$ by "filling in the punctures". More formally, $\hat\Sigma$ is a compact Riemann surface, $\Sigma\hookrightarrow \hat\Sigma$ is a holomorphic embedding whose image is dense in $\hat\Sigma$: the complement $\hat\Sigma \setminus \Sigma$ is finite.
One proves, using the Riemann extension theorem,
Lemma. Suppose that $\Sigma_1, \Sigma_2$ are hyperbolic Riemann surfaces and $\Sigma_1$ has finite area. Suppose that $F: \Sigma_1\to \Sigma_2$ is a nonconstant holomorphic map. Then the surface $\Sigma_2$ also has finite area and $F$ admits a holomorphic extension $\hat{F}: \hat\Sigma_1\to \hat\Sigma_2$. (This, of course, is false for non-hyperbolic Riemann surfaces.)
Now, back to our setting: We have two holomorphic maps $F_1, F_2: \Sigma_1\to \Sigma_2$, where $\Sigma_1, \Sigma_2$ are hyperbolic Riemann surfaces of finite area and genus $\ge 2$, such that $F_1, F_2$ induce the same homomorphism of fundamental groups. One next observes that the holomorphic extensions $$ \hat{F}_k: \hat\Sigma_1\to \hat\Sigma_2, k=1,2$$ still induce the same homomorphism of fundamental groups. Moreover, in view of the genus condition, the compact Riemann surfaces $\hat\Sigma_1, \hat\Sigma_2$ are of hyperbolic type. Hence, as noted above, $\hat{F}_1=\hat{F}_2$, implying that $F_1=F_2$.
We are almost done, as we need to prove the equality $f_1=f_2$. What we get from the above is that there exists $\gamma\in \Gamma_2$ such that $f_2=\gamma\circ f_1$. The fact that $f_1, f_2$ are both $\phi$-equivariant, implies that $$ Inn_\gamma\circ \phi=\phi, $$ where $$ Inn_\gamma: \alpha\mapsto \gamma \alpha \gamma^{-1}, $$ the inner automorphism of $\Gamma_2$ defined by $\gamma$. It then follows that $\gamma$ commutes with all elements of $\Lambda=\phi(\Gamma_1)$. However, the centralizer of a discrete subgroup $\Lambda$ of $PSL(2, {\mathbb R})$ is trivial unless $\Lambda$ is cyclic. In our case, $\Lambda$ cannot be cyclic for the following reason: The nonconstant holomorphic map $h=f_1$ descends to a nonconstant holomorphic map $H: {\mathbb H}/\Gamma_1\to {\mathbb H}/\Lambda$. If $\Lambda$ were cyclic, the area of ${\mathbb H}/\Lambda$ would be finite. But this contradicts Lemma above. Thus, we conclude that $\gamma=1$ and, hence, $f_1=f_2$.
Lastly, there is a vast area of complex analysis that studies the following pairs $(f,\phi)$: $\Gamma< PSL(2, {\mathbb R})$ is a discrete subgroup, $\phi: \Gamma\to PSL(2, {\mathbb C})$ is a homomorphism and $f: {\mathbb H}\to CP^1$ is a $\phi$-equivariant holomorphic map. Take a look
Hejhal, Dennis A., Monodromy groups and linearly polymorphic functions, Acta Math. 135, 1-55 (1975). ZBL0333.34002.
This paper and its bibliography are dated, but it gives you some idea about the area. Here is one classical result (from 1920s), related to Theorem above:
Suppose, in the above setting, that the surface $\Sigma={\mathbb H}/\Gamma$ is compact and the map $f: {\mathbb H}\to CP^1$ is locally univalent (i.e. has nowhere vanishing derivative). Then $\phi$ completely determines $f$.
I am quite sure that Theorem proven earlier is also classically known, but it is easier to give a proof than to find a reference.