Let's take for granted the Gaussian integration formula, which holds for both bosonic and fermionic integrals, if they are properly interpreted:
Theorem (Gauss, Wick): Let $X$ be a vector space with a chosen volume form ${\rm d}x$, $f: X \to \mathbb C$ a polynomial, and $a \in (X^\vee)^{\otimes 2}$ a symmetric bilinear form with inverse $a^{-1}\in X^{\otimes 2}$ such that the Gaussian measure $\exp(-\frac12 a\cdot x^{\otimes 2}){\rm d}x$ is defined. (So for example $X$ can be bosonic and finite-dimensional and $a$ can have positive-definite real part; or $a$ can be invertible pure-imaginary and all integrals can be taken as conditionally convergent; or $X$ can have even-dimensional fermionic parts and the integral can be defined à la Berezin.) Then we have
$$ \int_X \sum_{n\geq 0} f^{(n)} \cdot \frac{x^{\otimes n}}{n!} \exp \left(-\frac12 a\cdot x^{\otimes 2}\right){\rm d}x = \sqrt{\det(2\pi a^{-1})} \sum_{k\geq 0} f^{(2k)} \cdot \frac{(a^{-1})^{\otimes k}}{2^kk!} $$
Or, anyway, when $X$ is finite-dimensional and bosonic this is correct. In the fermionic case it's off by some factors of $\sqrt{-2\pi}$, and $\det$ is Berezin's superdeterminant. Here $f^{(n)} \in (X^\vee)^{\otimes n}$ is the $n$th Taylor coefficient of $f$ at $0$; it's a symmetric tensor. In the fermionic case some care must be taken with words like "symmetric", but I will ignore this subtlety.
Proof: Integrate by parts. $\Box$
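In one bosonic dimension the theorem is easy to check numerically. Here is a quick sanity check of my own (not from the text): for $f(x) = x^{2k}$ we have $f^{(2k)} = (2k)!$, and the formula reduces to $\int x^{2k} e^{-ax^2/2}\,{\rm d}x = \sqrt{2\pi/a}\,\frac{(2k)!}{2^k k!}\,a^{-k}$.

```python
# 1-D sanity check of the Gauss/Wick formula (a toy verification, not from the text):
#   integral of x^{2k} exp(-a x^2/2) dx  ==  sqrt(2*pi/a) * (2k)!/(2^k k!) * a^{-k}
import math
import numpy as np

def gaussian_moment(a, k, half_width=20.0, num=200001):
    """Left-hand side, integrated numerically on a wide uniform grid."""
    x = np.linspace(-half_width, half_width, num)
    dx = x[1] - x[0]
    return float(np.sum(x ** (2 * k) * np.exp(-a * x ** 2 / 2)) * dx)

def wick_value(a, k):
    """Right-hand side: k self-loops valued a^{-1}, weighted by 1/(2^k k!)."""
    return (math.sqrt(2 * math.pi / a)
            * math.factorial(2 * k) / (2 ** k * math.factorial(k) * a ** k))

for a, k in [(1.0, 1), (2.0, 2), (0.5, 3)]:
    assert abs(gaussian_moment(a, k) / wick_value(a, k) - 1) < 1e-8
```

The uniform-grid sum is exponentially accurate here because the integrand is analytic and decays like a Gaussian.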
Now the trick is to interpret the RHS combinatorially: you should draw each summand as a graph with one vertex, labeled $f^{(2k)}$, and $k$ self-loops, labeled $a^{-1}$; then if you count automorphisms correctly, the denominator of the summand is the number of automorphisms of the graph, and the numerator is the "evaluation" of the graph as a picture of tensor contractions. You can also draw the left-hand summands combinatorially: a vertex with $n$ incoming strands corresponds to $f^{(n)}$, and the $n!$ counts the symmetries.
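The combinatorics here is just Wick pairings: contracting $f^{(2k)}$ with copies of $a^{-1}$ in all possible ways means pairing up $2k$ strands, and there are $(2k-1)!! = (2k)!/(2^k k!)$ perfect pairings -- the same factorial ratio that appears in the theorem. A small enumeration of my own to illustrate:

```python
import itertools, math

def perfect_matchings(elems):
    """Recursively enumerate all ways to pair up an even-sized list."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for m in perfect_matchings(remaining):
            yield [(first, partner)] + m

for k in range(1, 6):
    n_pairings = sum(1 for _ in perfect_matchings(list(range(2 * k))))
    # (2k-1)!! = (2k)!/(2^k k!) -- the ratio appearing in the Wick formula
    assert n_pairings == math.factorial(2 * k) // (2 ** k * math.factorial(k))
```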
What would have happened if you had not a single polynomial but a product? You can draw $\frac1{m!} f^{(m)} \otimes \frac1{n!} g^{(n)}$ as two vertices, one labeled $f$ with $m$ incoming strands and the other labeled $g$ with $n$ incoming strands, and the symmetry factors come out right. Using $\frac1{m!} f^{(m)} \otimes \frac1{n!} g^{(n)} = \frac{1}{(m+n)!}\binom{m+n}{m}\, f^{(m)}\otimes g^{(n)}$ and the above theorem, the RHS can now be taken as a sum over all graphs (possibly disconnected) with two vertices, one labeled $f$ and the other labeled $g$, where the edges are still valued $a^{-1}$ and a labeled graph is weighted by the reciprocal of the order of its automorphism group.
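A quick sanity check of my own on the coefficient bookkeeping: the identity $\frac{1}{m!\,n!} = \frac{1}{(m+n)!}\binom{m+n}{m}$ is exactly what lets the two-vertex sum be reorganized as a single sum over graphs.

```python
# Verify 1/(m! n!) == binom(m+n, m)/(m+n)! exactly, for small m, n.
import math
from fractions import Fraction

for m in range(8):
    for n in range(8):
        lhs = Fraction(1, math.factorial(m) * math.factorial(n))
        rhs = Fraction(math.comb(m + n, m), math.factorial(m + n))
        assert lhs == rhs
```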
Ok, so now let's try to calculate asymptotics of integrals, and I will henceforth ignore the "determinant" prefactors. My domain of integration will always be a "small" neighborhood of $0$. I'm interested in the $\hbar \to 0$ asymptotics of
$$ \int_X \exp \frac1\hbar\left( - a\cdot \frac{x^{\otimes 2}}2 + \sum_{n\geq 3} b^{(n)}\cdot \frac{x^{\otimes n}}{n!} \right) {\rm d}x $$
First rescale $x \mapsto \sqrt\hbar\, x$; this just rescales the overall integral by $\hbar^{\dim X/2}$, and I'm dropping those factors. Then the integrand is $\exp\bigl( - a\cdot \frac{x^{\otimes 2}}2 + O(\hbar^{1/2}) \bigr)$, so keep the $O(1)$ part in the exponent but expand the small part out:
$$ = \int_X {\rm d}x \exp \left(-\frac12 a\cdot x^{\otimes 2}\right) \times \sum_{m\geq 0} \frac1{m!} \left( \sum_{n\geq 3} b^{(n)}\cdot \frac{x^{\otimes n}}{n!} \right)^m$$
Expanding the sum further, the $b$s still look like vertices with $n\geq 3$ incoming strands (and $n!$ symmetries), but now I get to have $m$ many of them, weighted by the $m!$ symmetries from permuting the vertices. So the sum is over all collections of trivalent-and-higher vertices, counted with symmetry, and an $n$-valent vertex is valued $\hbar^{\frac n 2 - 1}b^{(n)}$.
Then we integrate by connecting them up. All together, we get:
$$ = \sum_{\text{graphs } \Gamma } \frac{ \hbar^{\#} \operatorname{ev} (\Gamma) }{ \lvert \operatorname{Aut} \Gamma \rvert } $$
Graphs can be disconnected, empty, have parallel edges and self-loops, etc. We compute $\operatorname{ev}(\Gamma)$ by assigning $a^{-1}$ to each edge, $b^{(n)}$ to an $n$-valent vertex, and contracting tensors. The power on $\hbar$ is $-1$ for each vertex, and $\frac12$ for each half-edge (each part of an edge that arrives at a vertex); i.e. it's $-1$ for each vertex and $+1$ for each edge, i.e. it's the negative of the Euler characteristic of the graph.
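Here is a numerical illustration of my own (not from the text), with the sign of the quartic term chosen so the integral converges: for $Z(\hbar) = \int \exp\bigl((-x^2/2 - \lambda x^4/24)/\hbar\bigr)\,{\rm d}x$, the only graph contributing at order $\hbar$ is the figure-eight (one 4-valent vertex, two self-loops, $\lvert\operatorname{Aut}\rvert = 8$), so $Z(\hbar)/Z_0(\hbar) = 1 - \lambda\hbar/8 + O(\hbar^2)$, where $Z_0$ is the pure Gaussian.

```python
# Check the leading Feynman-diagram correction in a 0-dimensional "QFT"
# (toy example of my own): Z(hbar)/Z0(hbar) = 1 - lam*hbar/8 + O(hbar^2),
# where 1/8 = 1/|Aut| of the figure-eight graph.
import numpy as np

def partition_function(hbar, lam, half_width=2.0, num=40001):
    """Integrate exp((-x^2/2 - lam*x^4/24)/hbar) on a uniform grid."""
    x = np.linspace(-half_width, half_width, num)
    dx = x[1] - x[0]
    return float(np.sum(np.exp((-x ** 2 / 2 - lam * x ** 4 / 24) / hbar)) * dx)

hbar, lam = 0.01, 1.0
ratio = partition_function(hbar, lam) / partition_function(hbar, 0.0)
assert abs(ratio - (1 - lam * hbar / 8)) < 1e-4
```

The leftover discrepancy is the $O(\hbar^2)$ contribution of the two-loop graphs.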
Finally, let me get to the version from Getzler-Kapranov. Above, I used the fact that if $\star$ is a sum of connected things (counted with symmetry), then $\exp(\star)$ is a sum (counted with symmetry) over all possible collections of disjoint copies of those things. We've ended up with a sum-with-symmetry over disjoint things; taking $\log$ gives the sum over connected things. For a connected graph, the negative Euler characteristic is precisely one less than the first Betti number.
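Concretely, $b_1 = E - V + c$ for a graph with $c$ connected components, so for connected graphs $-\chi = E - V = b_1 - 1$. A small check on explicit multigraphs (examples of my own), computing $b_1$ via union-find:

```python
# Check -chi(Gamma) = b_1(Gamma) - 1 on explicit connected multigraphs.
def betti_1(n_vertices, edges):
    """First Betti number b_1 = E - (V - c), with c counted by union-find."""
    parent = list(range(n_vertices))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    components = n_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - (n_vertices - components)

theta = (2, [(0, 1), (0, 1), (0, 1)])  # two 3-valent vertices, b_1 = 2
figure8 = (1, [(0, 0), (0, 0)])        # one 4-valent vertex, b_1 = 2
for n, edges in (theta, figure8):
    assert len(edges) - n == betti_1(n, edges) - 1
```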
Exercise: Redo the above calculations with $O(\hbar)$ corrections to the exponent of the initial integral, to end up with the precise Getzler-Kapranov result.
Finally, I should say one more thing about Feynman diagrams. Feynman and Dyson disagreed about the meaning of Feynman's diagrams: Feynman thought of them as pictures of particles interacting, and Dyson thought of them the way I do above: as graphs computing the asymptotics of integrals.
The point is the following. The most important integrals of the above sort that physicists care about come from particularly nice quantum field theories, where $X$ is a space of sections of some vector bundle (with some extra structure) on a Riemannian manifold, $a$ is the Laplacian, and the $b^{(n)}$ are all "local". In this situation, $a^{-1}$ computes the "heat flow" for the Riemannian manifold, which is nothing more nor less than an integral over all paths connecting two endpoints of some "amplitude for a free particle to travel along this path". Then $\operatorname{ev}(\Gamma)$ can be interpreted as an integral over all embeddings of $\Gamma$ into your manifold of: the amplitude for your particle to travel along the edges, times an amplitude for an "interaction" at the vertices.
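To make the "heat flow" remark concrete (a standard fact, my addition): the Green's function inverting the Laplacian is the time-integral of the heat kernel, and the heat kernel in turn has a path-integral (Feynman-Kac) representation:
$$ a^{-1}(x,y) \;=\; \int_0^\infty K_t(x,y)\,{\rm d}t, \qquad K_t(x,y) \;=\; e^{-t\Delta}(x,y) \;=\; \int_{\gamma(0)=x,\ \gamma(t)=y} e^{-\frac12\int_0^t |\dot\gamma|^2}\,\mathcal D\gamma, $$
modulo the usual caveats (zero modes of $\Delta$, normalization of the path measure).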
One good reference is the first chapter or two of Costello's recent book on QFT.
The Dedekind eta function shows up in three-dimensional quantum gravity: http://arxiv.org/abs/0712.0155 (Alexander Maloney, Edward Witten, Quantum Gravity Partition Functions in Three Dimensions). On page 17 a basic partition function $Z_{0,1}$ of the theory is calculated as $$Z_{0,1}(\tau)=\frac{1}{|\eta(\tau)|^2}|\bar q q|^{-(k-1/24)}|1-q|^2.$$ It also appears in the calculation of supergravity partition functions in sec.7.
The Dedekind eta function also enters in (supersymmetric) physics through mock modular forms: http://arxiv.org/abs/1208.4074 (Atish Dabholkar, Sameer Murthy, Don Zagier, Quantum Black Holes, Wall Crossing, and Mock Modular Forms).
A good review of mock modular forms is http://mathcs.emory.edu/~ono/publications-cv/pdfs/114.pdf (Ken Ono, Unearthing the visions of a master: harmonic Maass forms and number theory).
I first heard about the Dedekind eta function in a physics context via Freeman Dyson's lovely essay "Missed opportunities": http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.bams/1183533964
Vertex algebras precisely model the structure of "holomorphic one-dimensional algebra" -- in other words, the algebraic structure that you get if you try to formalize the idea of operators (elements of your algebra) living at points of a Riemann surface, which get multiplied when they collide.
Our geometric understanding of how to formalize this idea has I think improved dramatically over the years with crucial steps being given by the point of view of "factorization algebras" by Beilinson and Drinfeld, which is explained (among other places :-) ) in the last chapter of my book with Edward Frenkel, "Vertex algebras and algebraic curves" (second edition only). This formalism gives a great way to understand the algebraic structure of local operators in general quantum field theories -- as is seen in the recent work of Kevin Costello -- or in topological field theory, where it appears eg in the work of Jacob Lurie (in particular the notion of "topological chiral homology").
In fact I now think the best way to understand a vertex algebra is to first really understand its topological analog, the structure of local operators in 2d topological field theory. If you check out any article about topological field theory it will explain that in a 2d TFT, we assign a vector space to the circle, it obtains a multiplication given by the pair of pants, and this multiplication is commutative and associative (and in fact a commutative Frobenius algebra, but I'll ignore that aspect here). It's very helpful to picture the pair of pants not traditionally but as a big disc with two small discs cut out -- that way you can see the commutativity easily, and also that if you think of those discs as small (after all everything is topologically invariant) you realize you're really describing operators labeled by points (local operators in physics, which we insert around a point) and the multiplication is given by their collision (ie zoom out the picture, the two small discs blend and look like one disc, so you've started with two operators and gotten a third).
Now you say: come on, commutative algebras are SO much simpler than vertex algebras, how is this a useful toy model? Well, think about where else you've seen the same picture -- more precisely, let's change the discs to squares. Then you realize this is precisely the picture given in any topology book for why $\pi_2$ of a space is commutative (move the squares around each other). So you get a great model for a 2d TFT by thinking about some pointed topological space $X$: to every disc I'll assign maps from that disc to $X$ which send the boundary to the basepoint (i.e. the double based loops of $X$), and multiplication is composition of loops -- i.e. $\Omega^2 X$ has a multiplication which is homotopy commutative (on homotopy groups you get the abelian group structure of $\pi_2$). In homotopy theory this algebraic structure on two-fold loops is called an $E_2$ structure.
My claim is that thinking about $E_2$ algebras is a wonderful toy model for vertex algebras that captures all the key structures. If we take just the mildest generalization of our TFT story, assign a GRADED vector space to the circle, and keep track of homotopies (i.e. think of passing from $\Omega^2 X$ to its chains), we find not just a commutative multiplication of degree zero, but also a Lie bracket of degree one, coming from $H^1$ of the space of pairs of discs inside a bigger disc (i.e. from taking a "residue" as one operator circles another). This is in fact what's called a Gerstenhaber algebra (aka $E_2$ algebra in graded vector spaces). Now all of a sudden you see why people say you can think of vertex algebras as analogs of either commutative or Lie algebras (they have a "Jacobi identity") -- the same structure is there already in topological field theory, where we require everything in sight to depend only on the topology of our surfaces, not on the more subtle conformal geometry.
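For reference, here is the standard definition being invoked (in one common sign convention; my addition): a Gerstenhaber algebra is a graded vector space with a degree-0 graded-commutative associative product and a bracket of degree $+1$ satisfying
$$ ab = (-1)^{|a||b|}\,ba, \qquad [a,b] = -(-1)^{(|a|+1)(|b|+1)}\,[b,a], $$
$$ [a,bc] = [a,b]\,c + (-1)^{(|a|+1)|b|}\,b\,[a,c], $$
together with the graded Jacobi identity for the shifted bracket.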
Anyway this is getting long - to summarize, a vertex algebra is the holomorphic refinement of an $E_2$ algebra, aka a "vector space with the algebraic structure inherent in a double loop space", where we allow holomorphic (rather than locally constant or up-to-homotopy) dependence on coordinates.
AND we get perhaps the most important example of a vertex algebra: take $X$ in the above story to be $BG$, the classifying space of a group $G$. Then $\Omega^2 X = \Omega G$ is the "affine Grassmannian" for $G$, which we now realize "is" a vertex algebra: by linearizing this space (taking delta functions supported at the identity) we recover the Kac-Moody vertex algebra (as is explained again in my book with Frenkel).