For a Markov process $(X_t)_{t \geq 0}$ we define the generator $A$ by
$$Af(x) := \lim_{t \downarrow 0} \frac{\mathbb{E}^x(f(X_t))-f(x)}{t} = \lim_{t \downarrow 0} \frac{P_tf(x)-f(x)}{t}$$
whenever the limit exists in $(C_{\infty},\|\cdot\|_{\infty})$. Here $P_tf(x) := \mathbb{E}^xf(X_t)$ denotes the semigroup of $(X_t)_{t \geq 0}$.
By Taylor's formula this means that
$$\mathbb{E}^xf(X_t) \approx f(x)+t Af(x)$$
for small $t \geq 0$. So, basically, the generator describes the movement of the process in an infinitesimal time interval. One can show that
$$\frac{d}{dt} P_t f(x) = A P_tf(x), \tag{1}$$
i.e. the generator is the time derivative of the mapping $t \mapsto P_tf(x)=\mathbb{E}^x(f(X_t))$. Reading $(1)$ as a (partial) differential equation we see that $u(t,x) := P_t f(x)$ is a solution to the PDE
$$\frac{\partial}{\partial t} u(t,x) = Au(t,x) \qquad u(0,x)=f(x).$$
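A brief sketch of why $(1)$ holds, assuming $f$ lies in the domain of $A$ and using only the semigroup property $P_{t+h} = P_h P_t = P_t P_h$ (a consequence of the Markov property). For $h > 0$,
$$\frac{P_{t+h}f-P_tf}{h} = P_t \left( \frac{P_hf-f}{h} \right) \xrightarrow[h \downarrow 0]{} P_t Af, \qquad \frac{P_{t+h}f-P_tf}{h} = \frac{P_h(P_tf)-P_tf}{h}.$$
The first identity shows that the limit exists, since $P_t$ is a contraction on $(C_{\infty},\|\cdot\|_{\infty})$; the second identifies this limit as $A(P_tf)$. So $P_tf$ is again in the domain of $A$ and $A P_t f = P_t Af$. This argument only gives the right-hand derivative; the left-hand derivative can be handled in a similar way.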
This is one important reason why generators are of interest. Another, more probabilistic, reason is that the process
$$M_t^f := f(X_t) - f(X_0)- \int_0^t Af(X_s) \, ds, \qquad t \geq 0 \tag{2}$$
is a martingale for every $f$ in the domain of $A$. This means that we can associate a whole family of martingales with $(X_t)_{t \geq 0}$. This martingale property comes in handy very often, for example whenever we deal with expectations of the form $\mathbb{E}^x(f(X_t))$. This leads to Dynkin's formula.
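Written out for a fixed time $t$ and $f$ in the domain of $A$: taking expectations in $(2)$ and using $\mathbb{E}^x(M_t^f) = \mathbb{E}^x(M_0^f) = 0$ gives
$$\mathbb{E}^x f(X_t) = f(x) + \mathbb{E}^x \left( \int_0^t Af(X_s) \, ds \right).$$
By optional stopping, the same identity holds with $t$ replaced by a stopping time $\tau$ satisfying $\mathbb{E}^x \tau < \infty$.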
Generators are also connected with the martingale problem, which in turn can be used to characterize (weak) solutions of stochastic differential equations. Furthermore, generators of stochastic processes are strongly related to Dirichlet forms and carré du champ operators; they are extremely helpful for carrying results over from probability theory to analysis (and vice versa). One important application is heat-kernel estimates.
Example: Brownian motion

In the case of (one-dimensional) Brownian motion $(B_t)_{t \geq 0}$, we see that
$$\mathbb{E}^x(f(B_t)) \approx f(x)+ \frac{t}{2} f''(x)$$
for small $t$. This formula can be motivated by Taylor's formula: Indeed,
$$\mathbb{E}^x(f(B_t)) \approx \mathbb{E}^x \left[f(x)+f'(x)(B_t-x)+\frac{1}{2} f''(x)(B_t-x)^2 \right]= f(x)+0+\frac{t}{2} f''(x)$$
using that $\mathbb{E}^x(B_t-x)=0$ and $\mathbb{E}^x((B_t-x)^2)=t$.
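The approximation can also be checked numerically. The following is a minimal sketch, not a definitive implementation: the test function $f(x) = e^{-x^2}$, the point $x = 0.3$, the time step $t = 10^{-4}$ and the helper names are all illustrative assumptions. It computes $\mathbb{E}^x f(B_t)$ by a trapezoidal rule against the $N(x,t)$ density and compares the difference quotient with $\frac{1}{2} f''(x)$.

```python
import math

def gaussian_expectation(f, mean, var, n=2001, width=8.0):
    """Approximate E[f(Y)] for Y ~ N(mean, var) with the trapezoidal rule.

    The integral is taken over mean +/- width standard deviations.
    """
    sd = math.sqrt(var)
    h = 2.0 * width / (n - 1)  # step size in units of one standard deviation
    total = 0.0
    for i in range(n):
        z = -width + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * f(mean + sd * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def bm_semigroup(f, t, x):
    """P_t f(x) = E^x f(B_t), using that B_t ~ N(x, t) under P^x."""
    return gaussian_expectation(f, x, t)

def f(x):
    # Illustrative test function (an assumption); it vanishes at infinity.
    return math.exp(-x * x)

def f_second(x):
    # Second derivative of f, computed by hand.
    return (4.0 * x * x - 2.0) * math.exp(-x * x)

x, t = 0.3, 1e-4
quotient = (bm_semigroup(f, t, x) - f(x)) / t  # (P_t f(x) - f(x)) / t
generator = 0.5 * f_second(x)                  # A f(x) = f''(x) / 2
print(quotient, generator)
```

The two printed numbers should agree up to an error of order $t$, which is the next term in the Taylor expansion.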
From $(1)$ we see that $u(t,x) := \mathbb{E}^x(f(B_t))$ is the (unique) solution of the heat equation
$$\partial_t u(t,x) = \frac{1}{2}\partial_x^2 u(t,x) \qquad u(0,x)=f(x).$$
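As a sanity check, here is a small sketch for one concrete initial condition. The choice $f(x) = e^{-x^2}$ is an assumption made for illustration; for it a Gaussian integral gives the closed form $\mathbb{E}^x f(B_t) = (1+2t)^{-1/2} \exp(-x^2/(1+2t))$. The code verifies by central finite differences that this function satisfies the heat equation at a sample point.

```python
import math

def u(t, x):
    """Closed form of E^x f(B_t) for the illustrative choice f(y) = exp(-y^2)."""
    return math.exp(-x * x / (1.0 + 2.0 * t)) / math.sqrt(1.0 + 2.0 * t)

def du_dt(t, x, h=1e-5):
    """Central difference approximation of the time derivative."""
    return (u(t + h, x) - u(t - h, x)) / (2.0 * h)

def d2u_dx2(t, x, h=1e-3):
    """Central difference approximation of the second space derivative."""
    return (u(t, x + h) - 2.0 * u(t, x) + u(t, x - h)) / (h * h)

t, x = 0.5, 0.3                       # arbitrary sample point
lhs = du_dt(t, x)                     # d/dt u
rhs = 0.5 * d2u_dx2(t, x)             # (1/2) d^2/dx^2 u
residual = lhs - rhs
print(lhs, rhs, residual)
```

The residual is not exactly zero because of the finite-difference error, but it should be several orders of magnitude smaller than either side.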
Moreover, one can show that the solution of the Dirichlet problem is also related to the Brownian motion. Furthermore, $(2)$ yields that
$$M_t^f := f(B_t)-f(B_0) - \frac{1}{2} \int_0^t f''(B_s) \, ds$$
is a martingale. Having Itô's formula in mind, this is not surprising since
$$f(B_t)-f(B_0) = \int_0^t f'(B_s) \, dB_s+ \frac{1}{2} \int_0^t f''(B_s) \,ds = M_t^f + \frac{1}{2} \int_0^t f''(B_s) \,ds.$$
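A martingale started at $0$ has mean zero at every time, which can be tested by simulation. The following Monte Carlo sketch is only an illustration under stated assumptions: the test function $f(x) = e^{-x^2}$, the seed, the number of paths and the step count are arbitrary choices, and the time integral is replaced by a left-endpoint Riemann sum.

```python
import math
import random

def f(x):
    # Illustrative test function (an assumption).
    return math.exp(-x * x)

def f_second(x):
    # Second derivative of f, computed by hand.
    return (4.0 * x * x - 2.0) * math.exp(-x * x)

def sample_M(t, n_steps, x0, rng):
    """One sample of M_t^f for Brownian motion started at x0.

    The Brownian increments are exact Gaussians; only the time integral
    of f''(B_s)/2 is discretised (left-endpoint Riemann sum).
    """
    dt = t / n_steps
    sd = math.sqrt(dt)
    b = x0
    integral = 0.0
    for _ in range(n_steps):
        integral += 0.5 * f_second(b) * dt
        b += rng.gauss(0.0, sd)
    return f(b) - f(x0) - integral

rng = random.Random(12345)            # fixed seed for reproducibility
n_paths = 4000
samples = [sample_M(1.0, 100, 0.0, rng) for _ in range(n_paths)]
mean_M = sum(samples) / n_paths
var_M = sum((s - mean_M) ** 2 for s in samples) / (n_paths - 1)
std_err = math.sqrt(var_M / n_paths)  # standard error of the sample mean
print(mean_M, std_err)
```

The sample mean should lie within a few standard errors of zero, in line with $\mathbb{E}^x(M_t^f) = 0$.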
The above-mentioned results (and proofs thereof) can be found in the monograph Brownian Motion - An Introduction to Stochastic Processes by René L. Schilling & Lothar Partzsch.
You can look at your process $X_{t}$ as one component of a two-dimensional stochastic process
$$Y_{t}=\left[\begin{array}{c}X_{t}\\ \eta_{t}\end{array}\right].$$
Then
$$dY_{t}=\left[\begin{array}{c}dX_{t}\\ d\eta_{t}\end{array}\right]=\left[\begin{array}{c}b(X_t)+\lambda\eta_{t}\sigma(X_{t})\\ \lambda\eta_{t}\end{array}\right]dt+\left[\begin{array}{cc}\alpha\sigma(X_t)&0\\ 0&\alpha\end{array}\right]\left[\begin{array}{c}dW_{t}\\ dW_{t}\end{array}\right]$$
and the infinitesimal generator is of the form
$$\begin{aligned} LV(y)=LV(x,\eta) &= \left(b(x)+\lambda\eta \sigma(x)\right)V'_{x}(x,\eta)+\lambda\eta V'_{\eta}(x,\eta) \\ &\quad +\frac{1}{2}\alpha^{2}\sigma^{2}(x)V''_{xx}(x,\eta)+\alpha^{2}\sigma(x)V''_{x\eta}(x,\eta)+\frac{1}{2}\alpha^{2}V''_{\eta\eta}(x,\eta). \end{aligned}$$
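To see where the second-order coefficients come from, here is a short sketch, assuming (as the equation for $dY_t$ indicates) that both components are driven by the same Brownian motion $W$. The noise term is then $\Sigma(x) \, dW_t$ with the column vector $\Sigma$ below, and the second-order part of the generator is $\frac{1}{2} \sum_{i,j} a_{ij} \partial_i \partial_j V$ with $a = \Sigma \Sigma^{\top}$:
$$\Sigma(x) = \left[\begin{array}{c}\alpha\sigma(x)\\ \alpha\end{array}\right], \qquad a(x) = \Sigma(x)\Sigma(x)^{\top} = \left[\begin{array}{cc}\alpha^{2}\sigma^{2}(x) & \alpha^{2}\sigma(x)\\ \alpha^{2}\sigma(x) & \alpha^{2}\end{array}\right].$$
The off-diagonal entry appears twice in the sum, once as $a_{12}$ and once as $a_{21}$, which is why the mixed derivative $V''_{x\eta}$ carries the coefficient $\alpha^{2}\sigma(x)$ rather than $\frac{1}{2}\alpha^{2}\sigma(x)$.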
By the way, the infinitesimal generator of an Ornstein-Uhlenbeck process of the form
$$d\eta_{t} = \lambda\eta_{t} dt + \alpha dW_{t}$$
is
$$LV(\eta)=\lambda \eta V'(\eta) + \frac{\alpha^2}{2}V''(\eta)$$
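This formula can be checked numerically. The sketch below relies on the Gaussian transition law of this linear SDE for $\lambda \neq 0$: given $\eta_0 = \eta$, the variable $\eta_t$ is normal with mean $\eta e^{\lambda t}$ and variance $\alpha^2 (e^{2\lambda t}-1)/(2\lambda)$. The parameter values and the test function $V(\eta) = e^{-\eta^2}$ are illustrative assumptions; for this $V$ the expectation under $N(m,v)$ equals $(1+2v)^{-1/2} \exp(-m^2/(1+2v))$.

```python
import math

lam, alpha = -0.7, 0.5   # hypothetical parameters; lam < 0 gives mean reversion

def V(y):
    # Illustrative test function (an assumption).
    return math.exp(-y * y)

def V_first(y):
    return -2.0 * y * math.exp(-y * y)

def V_second(y):
    return (4.0 * y * y - 2.0) * math.exp(-y * y)

def ou_semigroup(t, eta):
    """E[V(eta_t) | eta_0 = eta], using the Gaussian law of eta_t."""
    m = eta * math.exp(lam * t)                            # conditional mean
    v = alpha ** 2 * math.expm1(2.0 * lam * t) / (2.0 * lam)  # conditional variance
    return math.exp(-m * m / (1.0 + 2.0 * v)) / math.sqrt(1.0 + 2.0 * v)

def ou_generator(eta):
    """L V(eta) = lam * eta * V'(eta) + (alpha^2 / 2) * V''(eta)."""
    return lam * eta * V_first(eta) + 0.5 * alpha ** 2 * V_second(eta)

eta, t = 0.4, 1e-6
quotient = (ou_semigroup(t, eta) - V(eta)) / t
print(quotient, ou_generator(eta))
```

For small $t$ the difference quotient should agree with $LV(\eta)$ up to an error of order $t$.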
(1) The infinitesimal generator is an operator defined on a subspace of the space $C_0$ of continuous functions that vanish at infinity. More precisely, the subspace on which it is defined is
$$D(A) = \left \{f \in C_0: \lim_{t \downarrow 0} \frac{\mathbb{E}^x(f(X_t))-f(x)}{t} \text{ exists} \right \}$$
and then $A:D(A) \to C_0$. This means that if $f \in D(A)$ then $Af$ is a function that vanishes at infinity.
$A$ is the generator of the process $X$, but this does not mean that $Af(X) = AX$ for $f \in D(A)$. Indeed, you have no reason to believe that $X \in D(A)$ in the first place, since in many situations $A$ will turn out to be a differential operator and $X$ need not have differentiable paths.
Finally, if $X$ is your process then $f(X)$ is the process that at a given time $t$ and for a given $\omega$ in your probability space is defined by $f(X)_t(\omega) = f(X_t(\omega))$.
(2) $P_t f(x) = \mathbb{E}^x[f(X_t)]$. This is simply the expected value of $f(X_t)$ if your process is started at $x$ at time $0$.
(3) $f(x)$ is deterministic and so $\mathbb{E}^x[f(x)] = f(x) \, \mathbb{E}^x[1] = f(x)$.