Perhaps the following will be of use. In "Lévy Processes and Stochastic Calculus" by David Applebaum, Theorem 6.7.4, the following is argued: consider an SDE of the form
$$
dY_t = b(Y_{t-})dt + \sigma(Y_{t-})dW_t
+ \int_{|x|<c} F(Y_{t-},x) \tilde{N}(dt,dx)
+ \int_{|x|\ge c} G(Y_{t-},x) N(dt,dx),$$
where $W$ is an $n$-dimensional Brownian motion, $N$ an $n$-dimensional Poisson random measure with intensity measure $\nu$ and compensated measure $\tilde{N}$, independent of $W$, and the coefficients satisfy appropriate regularity conditions. Then this SDE has a unique solution, which is a Feller process with generator $A$ whose value on $C_c^2(\mathbb{R}^n)$ is
$$
Af(y) = \sum_{i=1}^n b_i(y)\frac{\partial}{\partial y_i}f(y)
+ \frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n (\sigma(y)\sigma(y)^t)_{ij}\frac{\partial^2}{\partial y_i\partial y_j}f(y)\\
+\int_{|x|<c} \left[ f(y+F(y,x))-f(y) - \sum_{i=1}^n F_i(y,x)\frac{\partial}{\partial y_i}f(y) \right] \nu(dx)\\
+\int_{|x|\ge c} \left[ f(y+G(y,x)) - f(y) \right] \nu(dx).
$$
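As a concrete special case (a sketch; the specific coefficient choices below are mine, not Applebaum's): taking $n=1$, $b(y)=\mu y$, $\sigma(y)=\sigma y$, no small-jump term ($F \equiv 0$), $G(y,x)=uy$, and $\nu$ with total mass $\lambda$, the formula collapses to the familiar one-dimensional jump-diffusion generator:

```latex
% n = 1, b(y) = \mu y, \sigma(y) = \sigma y, F \equiv 0,
% G(y,x) = u y, total mass \nu(\mathbb{R}) = \lambda:
Af(y) = \mu y\, f'(y) + \frac{1}{2}\sigma^2 y^2 f''(y)
        + \lambda\bigl[f\bigl((1+u)y\bigr) - f(y)\bigr]
```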
The only difference from your set-up appears to be that you have a nondeterministic intensity measure $\nu$.
For a Markov process $(X_t)_{t \geq 0}$ we define the generator $A$ by
$$Af(x) := \lim_{t \downarrow 0} \frac{\mathbb{E}^x(f(X_t))-f(x)}{t} = \lim_{t \downarrow 0} \frac{P_tf(x)-f(x)}{t}$$
whenever the limit exists in $(C_{\infty},\|\cdot\|_{\infty})$. Here $P_tf(x) := \mathbb{E}^xf(X_t)$ denotes the semigroup of $(X_t)_{t \geq 0}$.
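As a quick sanity check of this limit definition (a sketch of mine, not from the references above), consider a rate-$\lambda$ Poisson process, whose generator is known to be $Af(x) = \lambda(f(x+1)-f(x))$. Here the semigroup can be computed exactly by truncating the Poisson sum, so the difference quotient can be evaluated for shrinking $t$:

```python
import math

def poisson_semigroup(f, x, t, lam, kmax=80):
    """P_t f(x) = E^x[f(x + N_t)] for a rate-lam Poisson process,
    computed by truncating the Poisson series at kmax terms."""
    return sum(math.exp(-lam * t) * (lam * t) ** k / math.factorial(k) * f(x + k)
               for k in range(kmax))

f = lambda y: y ** 2
lam, x = 3.0, 1.0

# difference quotient (P_t f(x) - f(x)) / t for shrinking t ...
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(t, (poisson_semigroup(f, x, t, lam) - f(x)) / t)

# ... approaches the known generator value A f(x) = lam * (f(x+1) - f(x))
print(lam * (f(x + 1) - f(x)))  # -> 9.0
```

For $f(y)=y^2$ the exact quotient is $3\lambda + \lambda^2 t$, so the printed values decrease toward $9$ as $t \downarrow 0$.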
By Taylor's formula this means that
$$\mathbb{E}^xf(X_t) \approx f(x)+t Af(x)$$
for small $t \geq 0$. So, basically, the generator describes the movement of the process in an infinitesimal time interval. One can show that
$$\frac{d}{dt} P_t f(x) = A P_tf(x), \tag{1}$$
i.e. the generator is the time derivative of the mapping $t \mapsto P_tf(x)=\mathbb{E}^x(f(X_t))$. Reading $(1)$ as a (partial) differential equation we see that $u(t,x) := P_t f(x)$ is a solution to the PDE
$$\frac{\partial}{\partial t} u(t,x) = Au(t,x) \qquad u(0,x)=f(x).$$
This is one important reason why generators are of interest. Another, more probabilistic, reason is that the process
$$M_t^f := f(X_t) - f(X_0)- \int_0^t Af(X_s) \, ds, \qquad t \geq 0 \tag{2}$$
is a martingale. This means that we can associate with $(X_t)_{t \geq 0}$ a whole family of martingales, and this martingale property comes in handy very often, for example whenever we deal with expectations of the form $\mathbb{E}^x(f(X_t))$. This leads to Dynkin's formula.
Generators are also connected with the martingale problem, which in turn can be used to characterize (weak) solutions of stochastic differential equations. Furthermore, generators of stochastic processes are strongly related to Dirichlet forms and carré du champ operators; it turns out that they are extremely helpful for carrying over results from probability theory to analysis (and vice versa). One important application is heat-kernel estimates.
Example: Brownian motion
In the case of (one-dimensional) Brownian motion $(B_t)_{t \geq 0}$, we see that
$$\mathbb{E}^x(f(B_t)) \approx f(x)+ \frac{t}{2} f''(x)$$
for small $t$. This formula can be motivated by Taylor's formula: Indeed,
$$\mathbb{E}^x(f(B_t)) \approx \mathbb{E}^x \left[f(x)+f'(x)(B_t-x)+\frac{1}{2} f''(x)(B_t-x)^2 \right]= f(x)+0+\frac{t}{2} f''(x)$$
using that $\mathbb{E}^x(B_t-x)=0$ and $\mathbb{E}^x((B_t-x)^2)=t$.
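This approximation is easy to test numerically (a sketch of mine, assuming nothing beyond NumPy): $\mathbb{E}^x f(B_t)$ is a Gaussian expectation, which Gauss–Hermite quadrature evaluates to high accuracy, and for $f = \sin$ the limit of the difference quotient should be $\frac{1}{2} f''(x) = -\frac{1}{2}\sin x$:

```python
import numpy as np

def E_f_Bt(f, x, t, deg=80):
    """E^x[f(B_t)] = E[f(x + sqrt(t) Z)], Z ~ N(0,1),
    via Gauss-Hermite quadrature (weight exp(-z^2))."""
    nodes, weights = np.polynomial.hermite.hermgauss(deg)
    # substitution z -> sqrt(2) * node turns the Hermite weight into N(0,1)
    return np.sum(weights * f(x + np.sqrt(2.0 * t) * nodes)) / np.sqrt(np.pi)

x = 0.7
# difference quotient (E^x[sin(B_t)] - sin(x)) / t for shrinking t ...
for t in (1e-1, 1e-2, 1e-3):
    print(t, (E_f_Bt(np.sin, x, t) - np.sin(x)) / t)

# ... approaches the generator value (1/2) f''(x) = -sin(x)/2
print(-0.5 * np.sin(x))
```

(Here the check is exact up to quadrature error, since $\mathbb{E}^x \sin(B_t) = e^{-t/2}\sin x$.)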
From $(1)$ we see that $u(t,x) := \mathbb{E}^x(f(B_t))$ is the (unique) solution of the heat equation
$$\partial_t u(t,x) = \frac{1}{2}\partial_x^2 u(t,x) \qquad u(0,x)=f(x).$$
Moreover, one can show that the solution of the Dirichlet problem is also related to Brownian motion. Furthermore, $(2)$ yields that
$$M_t^f := f(B_t)-f(B_0) - \frac{1}{2} \int_0^t f''(B_s) \, ds$$
is a martingale. Having Itô's formula in mind, this is not surprising since
$$f(B_t)-f(B_0) = \int_0^t f'(B_s) \, dB_s+ \frac{1}{2} \int_0^t f''(B_s) \,ds = M_t^f + \frac{1}{2} \int_0^t f''(B_s) \,ds.$$
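The martingale property can also be checked numerically. The following Monte Carlo sketch (all parameter values are illustrative choices of mine) simulates Brownian paths started at $x_0 = 1$ and verifies that $\mathbb{E}[M_t^f] \approx 0$ for $f(x) = x^4$:

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative parameters: 100k Brownian paths on [0, 1], started at x0 = 1
n_paths, n_steps, t, x0 = 100_000, 200, 1.0, 1.0
dt = t / n_steps

f = lambda x: x ** 4
f2 = lambda x: 12.0 * x ** 2        # second derivative f''

B = np.full(n_paths, x0)
integral = np.zeros(n_paths)
for _ in range(n_steps):
    integral += 0.5 * f2(B) * dt    # left-point Riemann sum of (1/2) ∫ f''(B_s) ds
    B += rng.normal(0.0, np.sqrt(dt), size=n_paths)

M = f(B) - f(x0) - integral
print(M.mean())                     # should be close to 0
```

The sample mean is close to $0$ up to Monte Carlo noise and time-discretization bias, consistent with $M_t^f$ being a mean-zero martingale.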
The above-mentioned results (and proofs thereof) can be found in the monograph Brownian Motion - An Introduction to Stochastic Processes by René L. Schilling & Lothar Partzsch.
I have looked over Oksendal and Sulem. I don't see a derivative with respect to $x$ in the jump component (that is multiplied by $\lambda$). Moreover, it is not difficult to conjecture the result using a Taylor series expansion of $E[f(t,X(t)) - f(0,X(0))]$, and I do not see why we would get the additional derivative term multiplied by $\lambda$.
I believe there is an error in your answer. The source of the discrepancy: the model in Theorem 1.22 of Oksendal and Sulem uses a compensated (centered, mean-zero) Poisson process. Without the centering, you won't have the additional first-order derivative term.
So, if I understand the question, here is the answer: $$ \mathcal{A}f(t,x) = \frac{\partial f}{\partial t}(t,x) + \mu x\frac{\partial f}{\partial x}(t,x) + \frac{1}{2}\sigma^2 x^2 \frac{\partial^{2} f}{\partial x^{2}}(t,x) + \lambda [f(t,(1+u) x) - f(t,x) ] $$
where $\lambda$ denotes the rate parameter for the Poisson process.
Intuition: with rate $\lambda$, the process jumps from $X_{t-}$ to $(1+u)X_{t-}$.
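A Monte Carlo sketch (the parameter values are illustrative assumptions of mine) can support this formula: simulate one exact step of the jump-diffusion $dX_t = \mu X_t\,dt + \sigma X_t\,dW_t + u X_{t-}\,dN_t$ over a small horizon $h$ and compare the difference quotient with $\mathcal{A}f$ for a time-independent $f$:

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma, lam, u = 0.05, 0.2, 0.5, 0.1   # illustrative parameters
x0, h, n = 1.0, 0.01, 1_000_000

f = lambda x: x ** 2

# exact one-step simulation: geometric BM between jumps, each jump x -> (1+u) x
W = rng.normal(0.0, np.sqrt(h), size=n)
N = rng.poisson(lam * h, size=n)
X_h = x0 * np.exp((mu - 0.5 * sigma ** 2) * h + sigma * W) * (1.0 + u) ** N

quotient = (f(X_h).mean() - f(x0)) / h

# generator applied to f(x) = x^2 at x0:
#   A f = mu x f' + (1/2) sigma^2 x^2 f'' + lam [f((1+u)x) - f(x)]
Af = (mu * x0 * 2.0 * x0
      + 0.5 * sigma ** 2 * x0 ** 2 * 2.0
      + lam * (f((1.0 + u) * x0) - f(x0)))
print(quotient, Af)   # the two values should nearly agree
```

With these parameters $\mathcal{A}f(x_0) = 0.1 + 0.04 + 0.105 = 0.245$, and the simulated quotient matches it up to $O(h)$ bias and Monte Carlo noise.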