Could someone please convince me that there is something natural about
the choice of the Lagrangian formulation...
If I ask a high school physics student, "I am swinging a ball on a string around my head in a circle. The string is cut. Which way does the ball go?", they will probably tell me that the ball goes straight out - along the direction the string was pointing when it was cut. This is not right; the ball actually goes along a tangent to the circle, not a radius. But the beginning student will probably think this is not natural. How do they lose this instinct? Probably not by one super-awesome explanation. Instead, it's by analyzing more problems, seeing the principles applied in new situations, learning to apply those principles themselves, and gradually, over the course of months or years, building what an undergraduate student considers to be common intuition.
So my guess is no, no one can convince you that the Lagrangian formulation is natural. You will be convinced of that as you continue to study more physics, and if you expect to be convinced of it all at once, you are going to be disappointed. It is enough for now that you understand what you've been taught, and it's good that you're thinking about it. But I doubt anyone can quickly change your mind. You'll have to change it for yourself over time.
That being said, I think the most intuitive way to approach action principles is through the principle of least (i.e. stationary) time in optics. Try Feynman's QED, which gives a good reason to believe that the principle of stationary time is quite natural.
You can go further mathematically by learning the path integral formulation of nonrelativistic quantum mechanics and seeing how it leads to high probability for paths of stationary action.
More importantly, just use Lagrangian mechanics as much as possible, and not just finding equations of motion for twenty different systems. Use it to do interesting things. Learn how to see the relationship between symmetries and conservation laws in the Lagrangian approach. Learn about relativity. Learn how to derive electromagnetism from an action principle - first by studying the Lagrangian for a particle in an electromagnetic field, then by studying the electromagnetic field itself as described by a Lagrange density. Try to explain it to someone - their questions will sharpen your understanding. Check out Leonard Susskind's lectures on YouTube (series 1 and 3 especially). They are the most intuitive source I know for this material.
Read some of the many questions here in the Lagrangian or Noether tags. See if you can figure out their answers, then read the answers people have provided to compare.
If you thought that the Lagrangian approach was wrong, then you might want someone to convince you otherwise. But if you just don't feel comfortable with it yet, you'd be robbing yourself of a great pleasure by not taking the time to learn its intricacies.
Finally, your question is very similar to this one, so check out the answers there as well.
Perhaps a simple example is in order. Consider a simple harmonic oscillator (SHO)
$$ S~=~\int_{t_i}^{t_f} \! dt~L, \qquad
L~=~\frac{m}{2}\dot{x}^2 - \frac{k}{2}x^2, \tag{1}$$
with characteristic frequency
$$ \frac{2\pi}{T}~=~\omega~=~\sqrt{\frac{k}{m}}, \tag{2}$$
and Dirichlet boundary conditions
$$ x(t_i)~=~x_i \quad\text{and}\quad x(t_f)~=~x_f. \tag{3}$$
It can be shown that the classical path is only a minimum for the action (1) if the time period
$$ \Delta t~:=~t_f-t_i~\leq~ \frac{T}{2}\tag{4}$$
is smaller than a characteristic time scale $\frac{T}{2}$ of the problem. (If $\Delta t=\frac{T}{2}$ there is a zeromode.) For $\Delta t>\frac{T}{2}$ the classical path is no longer a minimum for the action (1), but only a saddle point. If we consider bigger and bigger $\Delta t$, a new negative mode/direction develops/appears each time $\Delta t$ crosses a multiple of $\frac{T}{2}$.
It is such examples that Ref. 1. has in mind when saying that the principle of least action is actually a principle of stationary action. The above phenomenon is quite general, and related to conjugated points/turning points and Morse theory. In semiclassical expansion of quantum mechanics, this behaviour affects the metaplectic correction/Maslov index. See e.g. Ref. 2 for further details.
A similar phenomenon takes place in geometrical optics, where it is straightforward to construct examples of light paths that do not minimize the time, cf. Fermat's principle of least time.
References:
Landau and Lifshitz, Vol.2, The Classical Theory of Fields, p. 24.
W. Dittrich and M. Reuter, Classical and Quantum Dynamics, 1992, Chapter 3.
Best Answer
I) OP's questions about the variational principle (e.g. the derivation to and from Newton's laws) are good and interesting questions, but most have been asks before, see e.g. this and this Phys.SE posts and links therein. Lagrangians not of the form Kinetic Energy minus Potential Energy are discussed in this Phys.SE post. The question of whether (or not) a set of equations of motion has an action principle is a duplicate of this Phys.SE post and links therein. Another possible helpful Phys.SE post concerning calculus of variation is this one.
II) The only new ingredient in OP's question seems to be the finite-difference method. Gelfand's use of the finite-difference method seems mostly to be an alternative way to gain insight into the Euler-Lagrange formula for the functional derivative via discretization. It doesn't add anything new to the continuum theory.
For instance, let us for simplicity consider point mechanics (as opposed to field theory). A continuum formulation for point mechanics here refers to a continuous time variable $t$ (as opposed to a discrete time variable). Let there be given a continuum Lagrangian $L(q,v,t)$ that depends on three arguments: position $q$, velocity $v$ and time $t$. Let $\frac{\partial L}{\partial q}$ and $\frac{\partial L}{\partial v}$ denote the derivative wrt. the first and second argument, respectively, and so forth.
Let us approximate the continuum action functional
$$\tag{1} S[q]~:=~\int_{0}^{T} \!\! dt~L(q(t),\dot{q}(t),t)$$
via the following discretized expression
$$\tag{2} S(q_0, \ldots q_{N})~:=~\Delta t\sum_{n=0}^{N-1} L(q_n,v_n,t_n), $$
where
$$\tag{3} \Delta t~:=~ \frac{T}{N}, \qquad t_n ~:= n \Delta t, \qquad q_n~:=~q(t_n),\qquad v_n~:=~\frac{q_{n+1}-q_n}{\Delta t}.$$
In eq. (3) we have chosen the forward difference for $v_n$. Obviously other choices are possible. Then
$$\frac{1}{\Delta t}\frac{\partial S(q_0, \ldots q_{N})}{\partial q_n} $$ $$\tag{4}~=~ \frac{\partial L(q_n,v_n,t_n)}{\partial q} -\frac{1}{\Delta t}\left[\frac{\partial L(q_n,v_n,t_n)}{\partial v}-\frac{\partial L(q_{n-1},v_{n-1},t_{n-1})}{\partial v} \right], $$
where $n\in\{1,2, \ldots,N-2, N-1\}$. Eq. (4) is a discretized version of the Euler-Lagrange formula
$$\tag{5} \frac{\delta S[q]}{\delta q (t)}~=~\frac{\partial L(q(t),\dot{q}(t),t)}{\partial q(t)} - \frac{d}{dt} \frac{\partial L(q(t),\dot{q}(t),t)}{\partial \dot{q}(t)} $$
for the functional derivative$^1$. The functional derivative (5) satisfies by definition the following infinitesimal variation formula
$$\tag{6} \delta S[q] ~:=~S[q+\delta q]- S[q]~=~ \int_{0}^{T} \!\! dt~\frac{\delta S[q]}{\delta q (t)}\delta q(t). $$
--
$^1$ The existence of functional derivative depends on the appropriately imposed boundary conditions, which we have not discussed. In plain English: We need to integrate by part at some point.