I) OP's questions about the variational principle (e.g. the derivation to and from Newton's laws) are good and interesting questions, but most have been asks before, see e.g. this and this Phys.SE posts and links therein. Lagrangians not of the form Kinetic Energy minus Potential Energy are discussed in this Phys.SE post. The question of whether (or not) a set of equations of motion has an action principle is a duplicate of this Phys.SE post and links therein. Another possible helpful Phys.SE post concerning calculus of variation is this one.
II) The only new ingredient in OP's question seems to be the finite-difference method. Gelfand's use of the finite-difference method seems mostly to be an alternative way to gain insight into the Euler-Lagrange formula for the functional derivative via discretization. It doesn't add anything new to the continuum theory.
For instance, let us for simplicity consider point mechanics (as opposed to field theory). A continuum formulation for point mechanics here refers to a continuous time variable $t$ (as opposed to a discrete time variable). Let there be given a continuum Lagrangian $L(q,v,t)$ that depends on three arguments: position $q$, velocity $v$ and time $t$. Let $\frac{\partial L}{\partial q}$ and $\frac{\partial L}{\partial v}$ denote the derivative wrt. the first and second argument, respectively, and so forth.
Let us approximate the continuum action functional
$$\tag{1} S[q]~:=~\int_{0}^{T} \!\! dt~L(q(t),\dot{q}(t),t)$$
via the following discretized expression
$$\tag{2} S(q_0, \ldots q_{N})~:=~\Delta t\sum_{n=0}^{N-1} L(q_n,v_n,t_n), $$
where
$$\tag{3} \Delta t~:=~ \frac{T}{N}, \qquad t_n ~:= n \Delta t, \qquad q_n~:=~q(t_n),\qquad v_n~:=~\frac{q_{n+1}-q_n}{\Delta t}.$$
In eq. (3) we have chosen the forward difference for $v_n$. Obviously other choices are possible. Then
$$\frac{1}{\Delta t}\frac{\partial S(q_0, \ldots q_{N})}{\partial q_n} $$ $$\tag{4}~=~ \frac{\partial L(q_n,v_n,t_n)}{\partial q} -\frac{1}{\Delta t}\left[\frac{\partial L(q_n,v_n,t_n)}{\partial v}-\frac{\partial L(q_{n-1},v_{n-1},t_{n-1})}{\partial v} \right], $$
where $n\in\{1,2, \ldots,N-2, N-1\}$. Eq. (4) is a discretized version of the Euler-Lagrange formula
$$\tag{5} \frac{\delta S[q]}{\delta q (t)}~=~\frac{\partial L(q(t),\dot{q}(t),t)}{\partial q(t)} - \frac{d}{dt} \frac{\partial L(q(t),\dot{q}(t),t)}{\partial \dot{q}(t)} $$
for the functional derivative$^1$. The functional derivative (5) satisfies by definition the following infinitesimal variation formula
$$\tag{6} \delta S[q] ~:=~S[q+\delta q]- S[q]~=~ \int_{0}^{T} \!\! dt~\frac{\delta S[q]}{\delta q (t)}\delta q(t). $$
--
$^1$ The existence of functional derivative depends on the appropriately imposed boundary conditions, which we have not discussed. In plain English: We need to integrate by part at some point.
The principle of Least (Stationary) Action (aka Hamilton's Principle) is derived from Newton's axioms plus D'Alembert's principle of virtual displacements.
Because D'Alembert's principle allows to account for the (reactions of the) bonds between the components of a system in a transparent way, the Lagrangian and Hamiltonian formulations are possible.
Note1: Newton's axioms, as given, cannot derive neither the Lagrangian form nor the Hamiltonian as they would need the reactions of the bonds to be added literally inside the formalism, thus resulting in different dimensionality and equations for the same problem where the (reactions of the) constraints would appear as extra unknowns.
Note2: D'Alembert's principle is more general than the Lagrangian or Hamiltonian formalisms, as it can account also for non-holonomic bonds (in a slight generalisation).
UPDATE1:
When the forces are conservative, meaning derived from a potential $V(q_i)$ i.e $Q_i = -\frac{\partial V}{\partial q_i}$, and the potential is not depending on velocities $\dot{q_j}$ i.e $\partial V / \partial \dot{q_j} = 0$ (or the potential $V(q_i, \dot{q_i})$ can depend on velocities in a specific way i.e $Q_i = \frac{d}{dt} \left( \frac{\partial V}{\partial \dot{q_i}} \right) - \frac{\partial V}{\partial q_i}$, refered to as generalised potetial, like in the case of Electromagnetism), then the equations of motion become:
$$\frac{d}{dt} \left( \frac{\partial T}{\partial \dot{q_i}} \right) - \frac{\partial T}{\partial q_i}-Q_i = \frac{d}{dt} \left( \frac{\partial L}{\partial \dot{q_i}} \right) - \frac{\partial L}{\partial q_i}$$
where $L=T-V$ is the Lagrangian.
(ref: Theoretical Mechanics, Vol II, J. Hatzidimitriou, in Greek)
UPDATE2:
One can infact formulate D'Alembert's principle as an "action principle" but this "action" is in general very different from the known Hamiltonian/Lagrangian action.
Variational principles of classical mechanics
Variational Principles Cheat Sheet
THE GENERALIZED D' ALEMBERT-LAGRANGE EQUATION
1.2 Prehistory of the Lagrangian Approach
GENERALIZED LAGRANGE–D’ALEMBERT PRINCIPLE
For a further generalisation of d'Alembert-Lagrange-Gauss principle to non-linear (non-ideal) constraints see the work of Udwadia Firdaus (for example New General Principle of Mechanics and Its Application
to General Nonideal Nonholonomic Systems)
Best Answer
I) A Lagrangian variational principle for Euler's equations for a rigid body
$$ \tag{1} (DL)_i ~=~M_i, \qquad i\in\{1,2,3\}, $$
is e.g. explained in Ref. 1. Here the angular momentum $L_i$, $i\in\{1,2,3\}$, along the three principal axes of inertia is tied to the angular velocity $\omega_i$, $i\in\{1,2,3\}$, by the formula
$$\tag{2} L_i~:=~I_i \omega_i, \qquad i\in\{1,2,3\}, \qquad (\text{no sum over }i).$$
The covariant time-derivative $D$ of a vector $\eta_i$, $i\in\{1,2,3\}$, is defined as
$$\tag{3} (D\eta)_i ~:=~ \dot{\eta}_i+(\omega\times\eta)_i, \qquad i\in\{1,2,3\}. $$
The angular velocity vector $\omega$ plays the role of a non-Abelian gauge connection/potential.
II) To see the $so(3)$ Lie algebra, we map an infinitesimal rotation vector $\alpha$ into an antisymmetric real $3\times3$ matrix $r(\alpha)\in so(3)$ as
$$\tag{4} \alpha_i \quad\longrightarrow\quad r(\alpha)_{jk}~:=~\sum_{i=1}^3\alpha_i \varepsilon_{ijk}. $$
The $so(3)$ Lie-bracket is given by (minus) the vector cross product
$$\tag{5} [r(\alpha),r(\beta)]~=~r(\beta\times \alpha). $$
Similarly, for the corresponding $SO(3)$ Lie group, a finite rotation vector $\alpha$ maps into an orthogonal $3\times3$ rotation matrix $R(\alpha)\in SO(3)$ as explained in this Phys.SE post. Infinitesimally, for an infinitesimal rotation $|\delta\alpha| \ll 1$, the correspondence is
$$\tag{6} R(\delta\alpha)_{jk} ~=~\delta_{jk} + r(\delta\alpha)_{jk} + {\cal O}(\delta\alpha^2). $$
III) A finite non-Abelian gauge transformation $\omega\longrightarrow\omega^{\alpha}$ takes the form
$$\tag{7} r(\omega^{\alpha})~=~R(-\alpha) \left(\frac{d}{dt}-r(\omega)\right)R(\alpha), \qquad \alpha\in \mathbb{R}^3.$$
An infinitesimal non-Abelian gauge transformation $\delta$ takes the form
$$\tag{8} r(\delta\omega)~=~\frac{d}{dt}r(\delta\alpha)-[r(\omega),r(\delta\alpha)],$$
or equivalently
$$\tag{9} \delta\omega_i~=~(D\delta\alpha)_i, \qquad i\in\{1,2,3\}, $$
where $\delta\alpha$ denotes an infinitesimal rotation vector corresponding to an $so(3)$ Lie algebra element $r(\delta\alpha)$.
We call (7)-(9) gauge transformations for semantic reasons, because of their familiar form, but note that (most of) them are not unphysical/spurious transformations. We stress that the angular velocity $\omega$ is a physical variable.
IV) Finally we are ready to discuss the action principle. The finite rotation vector $\alpha(t)\in \mathbb{R}^3$ plays the role of independent dynamical variables for the action principle. One may think of virtual rotation paths $\alpha:[t_i,t_f]\to \mathbb{R}^3$ as a reparametrization of virtual angular velocity paths $\omega:[t_i,t_f]\to \mathbb{R}^3$. The action reads
$$\tag{10} S[\alpha,\omega]~=~\int_{t_i}^{t_f} \! dt ~L $$
with Lagrangian
$$\tag{11} L~=~\frac{1}{2} L^{\alpha}\cdot \omega^{\alpha} + M\cdot \alpha , $$
where
$$\tag{12} L^{\alpha}_i~:=~I_i \omega^{\alpha}_i, \qquad i\in\{1,2,3\}, \qquad (\text{no sum over }i).$$
The Lagrangian (11) consists of rotational kinetic energy plus a source term from the torque $M$. Here $\omega^{\alpha}$ is the actual angular velocity vector, while $\omega$ here (in contrast to above) is a fixed non-dynamical reference vector, which is not varied. It is sort of a gauge-fixing choice. Infinitesimal variation yields
$$ \tag{13}\delta L ~\stackrel{(11)}{=}~ L^{\alpha}\cdot \delta\omega^{\alpha} + M\cdot \delta\alpha ~\stackrel{(9)}{=}~ L^{\alpha}\cdot \left(\frac{d}{dt}\delta\alpha + (\omega^{\alpha}\times\delta\alpha)\right) + M\cdot \delta\alpha ,$$
which (after integration by parts and appropriate boundary conditions) leads to Euler's equations (1) for the angular velocity vector $\omega^{\alpha}$.
References: