You really should split your question. I will answer the part where you do not understand how the counting of degrees of freedom works.
Basically, we count the number of propagating (physical) degrees of freedom per point of spacetime. Of course, the total number of degrees of freedom is infinite because spacetime is continuous and has an infinite number of points, but asking for the number of degrees of freedom per spacetime point is reasonable. Bear in mind that we only care about physical degrees of freedom, by which we mean those that can be properly normalized.
You correctly state that photons can be off-shell, but only those involved in internal processes can be. External photons are always on-shell. Moreover, gauge invariance is a physical requirement: the external fields you measure in your laboratory must not depend on your choice of gauge. In other words, the S-matrix should be gauge-invariant. On the other hand, nothing stops me from having gauge-dependent internal processes, as long as the S-matrix ultimately comes out gauge-invariant. Therefore, the word "physical" should almost always bring to mind external, on-shell, gauge-invariant quantities.
So yes, gauge redundancy kills one degree of freedom, and when we are talking about propagating physical degrees of freedom, one more is killed on-shell. You have to understand how that happens. It is not the case that every equation of motion kills a degree of freedom. Killing degrees of freedom requires an elaborate process of imposing constraints on the equations of motion, known as gauge-fixing, and this has to be done on a case-by-case basis.
For example, consider the four equations of motion (separated into a temporal and a spatial set) for the massless photon $A_\mu = (\phi, \vec A)$, whose four components naively describe four degrees of freedom:
\begin{align*}
-\Delta \phi + \partial_t \vec\nabla\cdot\vec A = 0\,,\\
\square \vec A - \vec\nabla(\partial_t\phi-\vec\nabla\cdot\vec A) = 0\,.\\
\end{align*}
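As a sanity check on the signs, here is a minimal Python sketch (standard library only) that verifies, in momentum space, that the two equations above are unchanged by $A_\mu \to A_\mu + \partial_\mu\alpha$ with $A_\mu = (\phi, \vec A)$ and $\partial_\mu = (\partial_t, \vec\nabla)$. It assumes plane waves $\propto e^{-i\omega t + i\vec k\cdot\vec x}$, so that $\partial_t \to -i\omega$, $\vec\nabla \to i\vec k$, $\Delta \to -|\vec k|^2$ and $\square = \partial_t^2 - \Delta \to |\vec k|^2 - \omega^2$; all helper names are hypothetical. Since every coefficient in the resulting linear map is real, testing with random rational amplitudes is enough.

```python
from fractions import Fraction
import random

# Assumed convention: fields ~ exp(-i*omega*t + i*k.x), so that
#   d_t -> -i*omega, grad -> i*k, Laplacian -> -|k|^2, box = d_t^2 - Laplacian -> |k|^2 - omega^2.


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def residuals(omega, k, phi, a):
    """Left-hand sides of the temporal and spatial field equations in momentum space."""
    k2 = dot(k, k)
    # -Laplacian(phi) + d_t div(A)     ->  k^2 phi + omega (k.a)
    r_time = k2 * phi + omega * dot(k, a)
    # box(A) - grad(d_t phi - div(A))  ->  (k^2 - omega^2) a - k (omega phi + k.a)
    s = omega * phi + dot(k, a)
    r_space = [(k2 - omega**2) * ai - ki * s for ai, ki in zip(a, k)]
    return r_time, r_space


def gauge_shift(omega, k, phi, a, mu):
    """phi -> phi + d_t alpha, A -> A + grad alpha, with mu = i * (Fourier amplitude of alpha)."""
    return phi - omega * mu, [ai + ki * mu for ai, ki in zip(a, k)]


def is_gauge_invariant(trials=200, seed=0):
    """Compare the residuals before and after random gauge shifts, in exact arithmetic."""
    rng = random.Random(seed)

    def r():
        return Fraction(rng.randint(-9, 9), rng.randint(1, 9))

    for _ in range(trials):
        omega, phi, mu = r(), r(), r()
        k, a = [r() for _ in range(3)], [r() for _ in range(3)]
        shifted = gauge_shift(omega, k, phi, a, mu)
        if residuals(omega, k, phi, a) != residuals(omega, k, *shifted):
            return False
    return True


print(is_gauge_invariant())  # True
```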
Since the field equations above exhibit a gauge symmetry $A_\mu \to A'_\mu := A_\mu + \partial_\mu \alpha_1(x)$, we can try to fix the gauge by choosing $\alpha_1$ such that, for instance, it is a solution of $\Delta \alpha_1 = -\vec\nabla\cdot\vec A$ (so that $\vec\nabla\cdot\vec A' = \vec\nabla\cdot\vec A + \Delta\alpha_1 = 0$), giving us
\begin{align*}
\Delta \phi' = 0\,, \\
\square \vec A' - \vec\nabla\partial_t\phi' = 0\,. \\
\\
\vec\nabla\cdot\vec{A}'=0\,.\\
\end{align*}
We have selected a divergence-free field, the so-called Coulomb gauge. Under this choice, the electric potential becomes non-propagating, that is, there are no kinetic terms for it in the Lagrangian (observe that $\Delta \phi' = 0$ does not contain any time derivatives).
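In momentum space, the Coulomb-gauge transformation amounts to projecting out the longitudinal part of $\vec A$. The Python sketch below assumes plane waves $\propto e^{-i\omega t + i\vec k\cdot\vec x}$ with $\vec k \ne 0$, writes $\vec a$ for the Fourier amplitude of $\vec A$, and uses a hypothetical helper `to_coulomb_gauge`. Solving $\Delta\alpha_1 = -\vec\nabla\cdot\vec A$ gives $\hat\alpha_1 = i(\vec k\cdot\vec a)/|\vec k|^2$, hence $\vec a' = \vec a - \vec k(\vec k\cdot\vec a)/|\vec k|^2$ and $\phi' = \phi + \omega(\vec k\cdot\vec a)/|\vec k|^2$. The amplitudes in the usage lines are made up, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def to_coulomb_gauge(omega, k, phi, a):
    """Apply A_mu -> A_mu + d_mu alpha_1 with Laplacian(alpha_1) = -div(A).

    Assumes fields ~ exp(-i*omega*t + i*k.x) and k != 0 (hypothetical helper).
    """
    k2 = dot(k, k)
    if k2 == 0:
        raise ValueError("the k = 0 mode is not fixed by this condition")
    mu = -Fraction(dot(k, a)) / k2                # mu = i * (Fourier amplitude of alpha_1)
    phi_c = phi - omega * mu                      # phi' = phi + d_t alpha_1
    a_c = [ai + ki * mu for ai, ki in zip(a, k)]  # a' = transverse part of a
    return phi_c, a_c


omega, k = 3, [1, 2, 2]
phi_c, a_c = to_coulomb_gauge(omega, k, phi=1, a=[4, 0, 1])
print(dot(k, a_c))  # 0: the new vector potential is divergence-free
# In this gauge the temporal equation reads |k|^2 phi' = 0: omega has dropped out,
# i.e. no time derivative acts on phi'. These made-up amplitudes are off-shell:
print(dot(k, k) * phi_c)  # 27, the same temporal residual as before the gauge change
```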
In momentum space, the Coulomb gauge condition reads $\vec p \cdot \vec \epsilon = 0$, where $\vec \epsilon$ is the polarisation vector (the Fourier amplitude of the vector, or magnetic, potential). This constraint has three linearly independent solutions $\epsilon^\mu$. Choosing a frame in which $p^\mu = (E,0,0,E)$, we find that the three polarisation vectors are
$$ \epsilon^\mu_1 = (0,1,0,0), \qquad \epsilon_2^\mu=(0,0,1,0), \qquad \epsilon_t^\mu = (1,0,0,0) $$
The third polarisation is time-like and therefore cannot be normalized. It is unphysical, and we have to get rid of it. Luckily, the gauge symmetry is not exhausted: there remain gauge transformations that preserve the Coulomb gauge $\vec p \cdot \vec \epsilon = 0$. For example, we can go from $A'_\mu$ to $A_\mu := A'_\mu + \partial_\mu \alpha_2(x)$ with $\Delta \alpha_2 = 0$ and $\partial_t \alpha_2 = - \phi'$; this leaves the divergence unchanged (since $\vec\nabla\cdot\vec\nabla\alpha_2 = \Delta\alpha_2 = 0$) and sets $\phi = 0$.
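The claims about the three polarisation vectors can be checked directly. This Python sketch assumes the metric signature $(+,-,-,-)$, which the text does not state, sets $E = 1$, and uses hypothetical helper names. It confirms that each vector satisfies $\vec p\cdot\vec\epsilon = 0$ and classifies it by the sign of $\epsilon\cdot\epsilon$. It only shows which vector is time-like; the statement that the time-like mode cannot be normalized concerns the norm of the corresponding quantum states, which this code does not compute.

```python
def minkowski(u, v):
    """Inner product with the assumed signature (+, -, -, -)."""
    return u[0] * v[0] - u[1] * v[1] - u[2] * v[2] - u[3] * v[3]


def coulomb_ok(p, eps):
    """Spatial condition p_vec . eps_vec = 0 (the time component is ignored)."""
    return sum(p[i] * eps[i] for i in (1, 2, 3)) == 0


def classify(eps):
    sq = minkowski(eps, eps)
    return "time-like" if sq > 0 else "space-like" if sq < 0 else "null"


E = 1
p = (E, 0, 0, E)
assert minkowski(p, p) == 0  # massless: p^2 = 0
pols = {"eps_1": (0, 1, 0, 0), "eps_2": (0, 0, 1, 0), "eps_t": (1, 0, 0, 0)}
for name, eps in pols.items():
    print(name, coulomb_ok(p, eps), classify(eps))
# eps_1 True space-like
# eps_2 True space-like
# eps_t True time-like
```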
Note that this time we have to make sure that the transformation generated by $\alpha_2$ happens on-shell, namely where $\Delta \phi' = 0$ holds. Otherwise this gauge-fixing is inconsistent, because $\Delta \alpha_2 = 0$ implies $0 = \Delta \partial_t\alpha_2 = - \Delta\phi'$, which is nonzero off-shell. In other words, requiring $\phi = 0$, or equivalently $\epsilon^0 = 0$, in order to get rid of unphysical degrees of freedom requires us to be on-shell.
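The consistency condition can be made concrete mode by mode. A minimal Python sketch, assuming fields $\propto e^{-i\omega t + i\vec k\cdot\vec x}$ with $\omega \ne 0$ (the function name is hypothetical): $\partial_t\alpha_2 = -\phi'$ fixes $\hat\alpha_2 = \phi'/(i\omega)$, and then $\Delta\alpha_2 = 0$ requires $|\vec k|^2\phi' = 0$, which is exactly the temporal equation $\Delta\phi' = 0$.

```python
def residual_gauge_exists(omega, k, phi_c):
    """Can alpha_2 satisfy both Laplacian(alpha_2) = 0 and d_t alpha_2 = -phi'?

    Works on a single Fourier mode; assumes omega != 0.
    """
    if omega == 0:
        raise ValueError("this sketch assumes omega != 0")
    k2 = sum(x * x for x in k)
    # d_t alpha_2 = -phi'     ->  -i*omega*alpha_hat = -phi'  ->  alpha_hat = phi' / (i*omega)
    # Laplacian(alpha_2) = 0  ->  -k2 * alpha_hat = 0        ->  requires k2 * phi' = 0
    return k2 * phi_c == 0


print(residual_gauge_exists(3, [1, 2, 2], phi_c=0))  # True: on-shell, Delta phi' = 0
print(residual_gauge_exists(3, [1, 2, 2], phi_c=3))  # False: off-shell
```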
To summarize: we made an off-shell gauge choice $\vec p \cdot \vec \epsilon = 0$ and an on-shell gauge choice $\epsilon^0 = 0$, and our equation of motion became $p^2 = 0$. Having exhausted our gauge freedom, we are left with only two physical polarisation modes, i.e. two degrees of freedom.
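The counting in this summary is linear algebra: each gauge condition is a linear constraint on the four components of $\epsilon^\mu$, so the number of surviving polarisations is $4$ minus the rank of the constraint rows. A minimal Python sketch for $p^\mu = (E,0,0,E)$ with $E = 1$; the `rank` helper is hypothetical, and $p^2 = 0$ constrains $p$ rather than $\epsilon$, so it does not appear as a row.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a small matrix by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rnk = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        piv = next((i for i in range(rnk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rnk], m[piv] = m[piv], m[rnk]
        for i in range(len(m)):
            if i != rnk and m[i][c] != 0:
                f = m[i][c] / m[rnk][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[rnk])]
        rnk += 1
    return rnk


E = 1
coulomb = [0, 0, 0, E]   # p_vec . eps_vec = 0 for p = (E, 0, 0, E)  (off-shell choice)
temporal = [1, 0, 0, 0]  # eps^0 = 0                                 (on-shell choice)
print(4 - rank([coulomb]))            # 3 polarisations left after the Coulomb gauge
print(4 - rank([coulomb, temporal]))  # 2 physical polarisations
```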
Now, you understand that merely having an equation of motion does not eat up a degree of freedom. To find the correct number of degrees of freedom, keep on making gauge choices (producing independent constraint equations), some off-shell and some on-shell, until you exhaust your gauge freedom. Then check how many degrees of freedom you are left with. If you notice any unphysical guy showing up, most likely you haven't used up all your gauge freedom and you still have enough flex to shoot this guy dead. Then, count all that you are left with. That's your answer.
Best Answer
Degrees of freedom can be defined as the number of independent ways in which the space configuration of a mechanical system may change.
Suppose I place an ant on a table with the restriction that it can move only through a tube lying along the $x$-axis. Then the ant has only one degree of freedom in three-dimensional space.
However, if I allow the ant to move freely on the table, it can be at any point of the surface at any time $t$, and both its $x$ and $y$ coordinates can change as time evolves. Then the ant has two degrees of freedom. Thus the independent coordinates define the configuration, and the number of degrees of freedom is the number of such coordinates.
Finally, if the ant has wings, it can travel independently in the $x$, $y$ and $z$ directions, and at any instant $t$ its position is a point $P(x,y,z)$. It now has three degrees of freedom, since it is located by three independent variables.
To sum up:
A material particle confined to a line in space can be displaced only along the line, and therefore has one degree of freedom.
A particle confined to a surface can be displaced in two perpendicular directions and accordingly has two degrees of freedom.
A free particle in physical space has three degrees of freedom corresponding to three possible perpendicular displacements.
Now suppose I have two winged ants. Each has three coordinates, so the system is located by six independent variables. In general, the number of degrees of freedom is $3 \times$ (number of particles) $= 3N$.
A system composed of two free particles has six degrees of freedom, and one composed of $N$ free particles has $3N$ degrees.
If a system of two particles is subject to a requirement that the particles remain a constant distance apart, the number of degrees of freedom becomes five.
Imagine those two ants tied by a string so that their distance apart is a constant $L$. This condition is expressed by an equation of the type $F(x,y,z,x',y',z',t) = L$, here $\sqrt{(x-x')^2 + (y-y')^2 + (z-z')^2} = L$. Such an equation is called an equation of constraint, and each independent constraint reduces the number of degrees of freedom by one. Therefore a constrained system of $N$ particles has $3N - m$ degrees of freedom, where $m$ is the number of constraint equations acting on the system.
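The rule $3N - m$ is simple arithmetic. A minimal Python sketch with hypothetical helper names, assuming the $m$ constraint equations are independent (and taking the table top as $z = 0$), reproduces the ant examples and writes the string constraint as $F - L$ with $F$ the Euclidean distance:

```python
import math


def degrees_of_freedom(n_particles, n_constraints, dim=3):
    """dim * N - m, assuming the m constraint equations are independent."""
    dof = dim * n_particles - n_constraints
    if dof < 0:
        raise ValueError("more independent constraints than coordinates")
    return dof


def string_constraint(r1, r2, length):
    """F(x, y, z, x', y', z') - L for two ants a fixed distance L apart; zero when satisfied."""
    return math.dist(r1, r2) - length


print(degrees_of_freedom(1, 2))  # 1: ant in a tube (y = 0, z = 0)
print(degrees_of_freedom(1, 1))  # 2: ant on the table (z = 0)
print(degrees_of_freedom(1, 0))  # 3: winged ant
print(degrees_of_freedom(2, 0))  # 6: two winged ants
print(degrees_of_freedom(2, 1))  # 5: two ants tied by a string
print(string_constraint((0, 0, 0), (3, 4, 0), 5))  # 0.0
```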
Any requirement which diminishes by one the degrees of freedom of a system is called a holonomic constraint.
Each such constraint is expressible by an equation of condition which relates the system's coordinates to a constant, and may also involve the time.
When applied to systems of particles, a holonomic constraint frequently has the geometrical significance of confining a particle to a specified surface, which may be time-dependent.
Constraints are defined as restrictions on the natural degrees of freedom of a system.
If $n$ and $k$ are the numbers of the natural and actual degrees of freedom, the difference $n - k$ is the number of constraints.
A pendulum bob, as a free body, has 3 degrees of freedom. When the bob hangs from a string of length $l$ and swings in a vertical plane, it must satisfy $x^2 + y^2 = l^2$ (taking the pivot as the origin). This is one constraint equation; the other is that $z$ is constant, $z - c = 0$. The bob now has only $3 - 2 = 1$ degree of freedom and can be described by a single coordinate, say $\theta$, the angle the string makes with the vertical.
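The single generalised coordinate $\theta$ automatically satisfies both constraints. A short Python sketch, assuming the pivot at the origin, the swing in the plane $z = c$ and the $y$-axis pointing up, so that $x = l\sin\theta$ and $y = -l\cos\theta$ (the function names and numbers are illustrative):

```python
import math


def bob_position(theta, l=1.0, c=0.0):
    """Cartesian position of the bob from the generalised coordinate theta."""
    return l * math.sin(theta), -l * math.cos(theta), c


def constraints(pos, l=1.0, c=0.0):
    """Both values vanish on allowed configurations: x^2 + y^2 - l^2 and z - c."""
    x, y, z = pos
    return x**2 + y**2 - l**2, z - c


for theta in (0.0, 0.3, 1.2, -2.5):
    g1, g2 = constraints(bob_position(theta, l=2.0, c=0.5), l=2.0, c=0.5)
    print(f"theta={theta:+.1f}  x^2+y^2-l^2={g1:+.1e}  z-c={g2:+.1e}")
```

Any value of $\theta$ gives an allowed configuration (up to floating-point rounding in the first constraint), which is what makes $\theta$ a good generalised coordinate.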
So the simple pendulum has only one degree of freedom. The advantage of the above description is that one can go to an independent set of coordinates to describe the motion of the system. Such sets are called generalized coordinates.
Describing the system in terms of 'generalised coordinates' and momenta leads to the further development of Lagrangian mechanics: using the 'principle of virtual work', one derives a set of equations of motion in the generalised coordinates and velocities. Because these coordinates form an independent set, the equations are free of the constraint forces and involve only the kinetic and potential energies.
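For the simple pendulum this program fits in a few lines. Assuming the pivot at the origin and $y$ pointing up, $T = \tfrac12 m l^2\dot\theta^2$ and $V = -mgl\cos\theta$, so the Euler-Lagrange equation for $L = T - V$ is $\ddot\theta = -(g/l)\sin\theta$, and the string tension never enters. The Python sketch below integrates it with velocity Verlet (an integrator chosen here for illustration, not one the text prescribes) and checks that the energy $T + V$ stays nearly constant; the parameter values are illustrative.

```python
import math


def simulate_pendulum(theta0, omega0, g=9.81, l=1.0, dt=1e-3, steps=10_000):
    """Integrate theta'' = -(g/l) sin(theta), the Euler-Lagrange equation of
    L = (1/2) m l^2 theta'^2 + m g l cos(theta), with velocity Verlet.
    Only T and V enter; the constraint force (string tension) never appears."""

    def acc(th):
        return -(g / l) * math.sin(th)

    theta, omega = theta0, omega0
    for _ in range(steps):
        omega_half = omega + 0.5 * dt * acc(theta)
        theta += dt * omega_half
        omega = omega_half + 0.5 * dt * acc(theta)
    return theta, omega


def energy_per_mass(theta, omega, g=9.81, l=1.0):
    """(T + V) / m with V = -m g l cos(theta), pivot at height zero."""
    return 0.5 * (l * omega) ** 2 - g * l * math.cos(theta)


E0 = energy_per_mass(1.0, 0.0)
th, om = simulate_pendulum(1.0, 0.0)
print(abs(energy_per_mass(th, om) - E0))  # small energy drift over 10 s of motion
```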
One can check: