Consider $\phi^4$ theory:
$$
\mathcal L=\frac12 Z_1(\partial\phi)^2-\frac12 Z_m m^2\phi^2-\frac{1}{4!}\lambda_0\phi^4
$$
There are two approaches to perturbation theory:
First
The propagator is given by
$$
\Delta=\frac{1}{Z_1p^2-Z_m m^2}
$$
and there is one type of vertex, with value
$$
-i\lambda_0
$$
Second
The propagator is given by
$$
\Delta=\frac{1}{p^2-m^2}
$$
and there are two types of vertices, with value
$$
-i((Z_1-1)p^2-(Z_m-1)m^2),\qquad -i\lambda_0
$$
The two approaches are completely equivalent, and they give rise to the exact same expression for a given scattering process.
Note that the coefficients $Z_1,Z_m$ depend on the expansion parameter $\lambda$. This means that the first approach is more cumbersome because it is in general not clear which diagrams contribute to a given order in perturbation theory, inasmuch as both the vertices and the propagators contain powers of $\lambda$. On the other hand, the second approach leads to more diagrams (because there is one more vertex) but it is more convenient (because the propagators are independent of $\lambda$).
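The order-counting problem in the first approach can be made explicit with a short worked expansion. This is a sketch with all factors of $i$ stripped (their placement is a matter of convention), and the shorthand $c$ is introduced here only for illustration:
$$
\frac{1}{Z_1p^2-Z_m m^2}=\frac{1}{(p^2-m^2)+c}=\Delta\sum_{n=0}^{\infty}(-c\,\Delta)^n,\qquad c\equiv(Z_1-1)p^2-(Z_m-1)m^2,\quad \Delta=\frac{1}{p^2-m^2}
$$
Since $Z_1-1$ and $Z_m-1$ start at first order in the coupling, $c$ does too, so the $n$-th term is of order $n$ or higher. A single propagator of the first approach therefore contains every order at once, whereas the second approach produces the same series term by term, as chains of free propagators joined by the two-point vertex.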
Feynman rules by functional derivatives
It is not so in general; it just happens to work out that way for polynomials in the fields, without any derivatives or other complications. Properly, one takes functional derivatives until no field is left.
For example, schematically,
$$-i\frac{\delta^4}{\delta \phi^4} \frac{\lambda}{4!} \phi^4 \to -i\lambda$$
which is the entire reason for the factor of $4!$: it is a convenient convention, not a necessary coefficient, and the physics remains the same without it. A more involved example is
$$-i\frac{\delta^2}{\delta \phi^2} \frac{\delta}{\delta A_\mu}g\phi A^\nu \partial_\nu \phi \to -ig(p_1^\mu + p_2^\mu)$$
which describes a vector coupling to a scalar, where $p_1$ and $p_2$ would be momenta labelling two of the legs attached to the $A\phi\phi$ vertex.
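To spell out the counting in the $\phi^4$ example, here is the same schematic manipulation done one derivative at a time, treating $\delta\phi/\delta\phi \to 1$ and suppressing all position or momentum labels:
$$\frac{\delta}{\delta\phi}\phi^4 = 4\phi^3,\quad \frac{\delta^2}{\delta\phi^2}\phi^4 = 4\cdot 3\,\phi^2,\quad \frac{\delta^3}{\delta\phi^3}\phi^4 = 4\cdot 3\cdot 2\,\phi,\quad \frac{\delta^4}{\delta\phi^4}\phi^4 = 4!$$
so that $-i\frac{\lambda}{4!}\cdot 4! = -i\lambda$. With a coupling normalised as $\lambda'\phi^4$ instead, the same steps would give a vertex $-i\,4!\,\lambda'$, which is the same physics with $\lambda = 4!\,\lambda'$.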
Quadratic terms
The kinetic and mass terms are
$$\mathcal L = \frac12 (\partial \phi)^2 - \frac12 m^2 \phi^2.$$
In Fourier space, $\partial_\mu \phi \to ip_\mu \phi$, so $(\partial \phi)^2$ goes as $p^2 \phi^2$ and the quadratic terms together go as $\frac12 (p^2 - m^2)\phi^2$. Interpreting the inverse in Fourier space as the multiplicative inverse, the propagator then goes as
$$\Delta \sim \frac{1}{p^2-m^2}.$$
Note that for the counterterm Lagrangian (for when you move on to renormalisation), the kinetic and mass counterterms are typically interpreted as interactions: functional derivatives are taken, but no inversion is performed. However, this is a matter of choice; one could absorb the coefficients into the propagator instead, and both choices lead to the same physics.
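The claim that both choices give the same physics can be checked numerically for a single line. The following is a minimal sketch, assuming factors of $i$ are stripped and using made-up sample values for $p^2$, $m^2$, $Z_1$ and $Z_m$; the function names are hypothetical and chosen only for this illustration. It compares the propagator with the coefficients absorbed against the sum of free propagators with repeated counterterm insertions.

```python
def free_propagator(p2, m2):
    """Free propagator 1/(p^2 - m^2), with factors of i stripped."""
    return 1.0 / (p2 - m2)


def absorbed_propagator(p2, m2, z1, zm):
    """Propagator with Z_1 and Z_m absorbed: 1/(Z_1 p^2 - Z_m m^2)."""
    return 1.0 / (z1 * p2 - zm * m2)


def inserted_propagator(p2, m2, z1, zm, n_max):
    """Sum of chains with 0..n_max counterterm insertions on a free line."""
    delta = free_propagator(p2, m2)
    # Two-point counterterm vertex, treated as an interaction.
    c = (z1 - 1.0) * p2 - (zm - 1.0) * m2
    # Each insertion contributes one factor of (-c * delta).
    return sum(delta * (-c * delta) ** n for n in range(n_max + 1))


if __name__ == "__main__":
    # Illustrative sample values only (assumed, not taken from any calculation).
    p2, m2, z1, zm = 2.0, 1.0, 1.05, 1.02
    exact = absorbed_propagator(p2, m2, z1, zm)
    for n_max in (0, 1, 2, 5, 40):
        approx = inserted_propagator(p2, m2, z1, zm, n_max)
        print(n_max, approx - exact)
```

The difference shrinks as more insertions are included, provided the counterterm is small compared with $p^2-m^2$ so that the series converges. In perturbation theory the series is truncated at the working order rather than summed.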
Green's function
Note that a propagator, other than being the inverse of the quadratic terms, can also be interpreted as the Green's function of the equations of motion. This is a function which can be used to solve the equations via
$$\phi(x) = \int \mathrm dx' \, G(x,x') f(x')$$
where $(\square + m^2)\phi(x) = f(x)$. Conceptually, in the same way that we can think of a function as being built out of delta functions, we can think of a solution as being built out of Green's functions, since
$$(\square + m^2)G(x) \sim \delta^{(n)}(x).$$
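As a quick check that the integral formula does solve the equation, one can act with the operator under the integral. This sketch assumes the normalisation is chosen so that the relation above is an exact equality, and that $G(x,x')$ depends only on $x-x'$:
$$(\square + m^2)\phi(x) = \int \mathrm dx' \, (\square_x + m^2)G(x-x')\, f(x') = \int \mathrm dx' \, \delta^{(n)}(x-x')\, f(x') = f(x).$$
In Fourier space the operator $\square + m^2$ acts by multiplication, so $G$ is its multiplicative inverse, which is how the Green's function ties back to the propagator as the inverse of the quadratic terms (up to convention-dependent signs and factors of $i$).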
Best Answer
The diverging terms for the propagator come from the renormalization of the self-energy $\Sigma$, defined by $G^{-1}=G_0^{-1}-\Sigma$, where $G_0$ is the propagator defined by the Lagrangian (i.e. bare propagator + counterterms):
$G^{-1}_0=(1+\delta Z)p^2+(m^2_0+\delta m^2)$.
One chooses the counterterms to cancel the divergences coming from $\Sigma$ order by order. If the theory is perturbatively renormalizable, these two counterterms alone are sufficient at every order in perturbation theory.
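The definition of $\Sigma$ is equivalent to resumming repeated self-energy insertions on the line, which can be written as a geometric series:
$$G = G_0 + G_0\,\Sigma\,G_0 + G_0\,\Sigma\,G_0\,\Sigma\,G_0 + \dots = \frac{1}{G_0^{-1}-\Sigma}.$$
As a purely illustrative sketch of the cancellation, suppose that at some order the divergent part of the self-energy has the form $\Sigma_{\text{div}} = a\,p^2 + b$, where $a$ and $b$ are hypothetical divergent constants introduced only for this example. Then
$$G^{-1} = (1+\delta Z - a)\,p^2 + (m_0^2+\delta m^2 - b) - \Sigma_{\text{finite}},$$
and choosing $\delta Z = a$ and $\delta m^2 = b$ at that order (up to finite parts fixed by the renormalization scheme) leaves a finite $G^{-1}$.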