I'm no expert on the historical development of the subject, but I can offer a derivation.
Consider two frames of reference $S$ and $S'$, and suppose that $S'$ moves with velocity $\textbf v$ with respect to $S$. Coordinates in $S$ and $S'$ are related by a Galilean transformation:
$$\begin{cases} t' = t \\ \textbf x' = \textbf x-\textbf vt\end{cases}$$
To find how the fields transform, we note that a Lorentz transformation reduces to a Galilean transformation in the limit $c \to \infty$. Under a Lorentz transformation the fields transform as:
$$ \begin{cases}
\textbf E' = \gamma (\textbf E + \textbf v \times \textbf B) - (\gamma-1) (\textbf E \cdot \hat{\textbf{v}}) \hat{\textbf{v}}\\
\textbf B' = \gamma \left(\textbf B - \frac{1}{c^2}\textbf v \times \textbf E \right) - (\gamma-1) (\textbf B \cdot \hat{\textbf{v}}) \hat{\textbf{v}}\\
\end{cases}$$
Taking the limit $c\to \infty$ so that $\gamma\to 1$, we obtain the Galilean transformations of the fields:
$$ \begin{cases}
\textbf E' = \textbf E + \textbf v \times \textbf B\\
\textbf B' = \textbf B\\
\end{cases}$$
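The limit can also be checked numerically. The following is a minimal sketch, not part of the original derivation: it treats $c$ as a free parameter, and the function names and field values are illustrative assumptions. It evaluates the Lorentz formulas above and compares them with the Galilean result as $c$ grows.

```python
import numpy as np

def lorentz_fields(E, B, v, c):
    """Fields in S' from the Lorentz transformation formulas quoted above."""
    E, B, v = (np.asarray(a, dtype=float) for a in (E, B, v))
    speed = np.linalg.norm(v)
    if speed == 0.0:
        return E.copy(), B.copy()
    gamma = 1.0 / np.sqrt(1.0 - (speed / c) ** 2)
    v_hat = v / speed
    E_prime = gamma * (E + np.cross(v, B)) - (gamma - 1.0) * np.dot(E, v_hat) * v_hat
    B_prime = gamma * (B - np.cross(v, E) / c**2) - (gamma - 1.0) * np.dot(B, v_hat) * v_hat
    return E_prime, B_prime

def galilean_fields(E, B, v):
    """Galilean limit: E' = E + v x B, B' = B."""
    E, B, v = (np.asarray(a, dtype=float) for a in (E, B, v))
    return E + np.cross(v, B), B.copy()

# Arbitrary illustrative values (assumptions, in units where c is adjustable).
E = np.array([1.0, 2.0, -0.5])
B = np.array([0.3, -0.1, 0.7])
v = np.array([0.2, 0.0, 0.1])

for c in (1.0, 10.0, 1e3, 1e6):
    E_l, B_l = lorentz_fields(E, B, v, c)
    E_g, B_g = galilean_fields(E, B, v)
    deviation = max(np.abs(E_l - E_g).max(), np.abs(B_l - B_g).max())
    print(f"c = {c:g}: max deviation from Galilean result = {deviation:.3e}")
```

The printed deviation should shrink roughly like $1/c^2$, which is the sense in which the Galilean rule is the $c\to\infty$ limit.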
We can then invert the transformation by sending $\textbf v \to -\textbf v$:
$$ \begin{cases}
\textbf E = \textbf E' - \textbf v \times \textbf B'\\
\textbf B = \textbf B'\\
\end{cases}$$
By the same reasoning, we can obtain the Galilean transformation of the sources:
$$ \begin{cases}
\textbf J = \textbf J' + \rho' \textbf v\\
\rho = \rho'\\
\end{cases}$$
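For completeness, here is a sketch of that reasoning, using the standard fact (not spelled out above) that $(c\rho, \textbf J)$ transforms as a four-vector under a Lorentz boost:
$$ \begin{cases}
\rho' = \gamma \left(\rho - \frac{1}{c^2}\textbf v \cdot \textbf J \right)\\
\textbf J' = \textbf J - \gamma \rho \textbf v + (\gamma-1) (\textbf J \cdot \hat{\textbf{v}}) \hat{\textbf{v}}\\
\end{cases}$$
Taking $c\to\infty$, so that $\gamma \to 1$, gives $\rho' = \rho$ and $\textbf J' = \textbf J - \rho \textbf v$. Sending $\textbf v \to -\textbf v$ to invert then reproduces $\textbf J = \textbf J' + \rho' \textbf v$ and $\rho = \rho'$.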
We know that the fields and sources satisfy Maxwell's equations in $S$:
$$ \begin{cases}
\nabla \cdot \textbf E = \rho/\epsilon_0\\
\nabla \cdot \textbf B = 0\\
\nabla \times \textbf E = -\frac{\partial \textbf B}{\partial t}\\
\nabla \times \textbf B = \mu_0 \left(\textbf J +\epsilon_0 \frac{\partial \textbf E}{\partial t} \right)\\
\end{cases}$$
Replacing the fields and sources in $S$ with those in $S'$ we obtain:
$$ \begin{cases}
\nabla \cdot (\textbf E' - \textbf v \times \textbf B') = \rho'/\epsilon_0\\
\nabla \cdot \textbf B' = 0\\
\nabla \times (\textbf E' - \textbf v \times \textbf B') = -\frac{\partial \textbf B'}{\partial t}\\
\nabla \times \textbf B' = \mu_0 \left(\textbf J' + \rho' \textbf v +\epsilon_0 \frac{\partial (\textbf E' - \textbf v \times \textbf B')}{\partial t} \right)\\
\end{cases}$$
As a last step, we need to replace derivatives in $S$ with derivatives in $S'$. We have:
$$\begin{cases} \nabla = \nabla' \\ \frac{\partial }{\partial t} = \frac{\partial }{\partial t'} - \textbf v \cdot \nabla\end{cases}$$
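These relations follow from the chain rule applied to $t' = t$, $\textbf x' = \textbf x - \textbf v t$. As a short worked step (written here in components, with the index notation being an addition of this sketch):
$$\frac{\partial}{\partial t} = \frac{\partial t'}{\partial t}\frac{\partial}{\partial t'} + \sum_i \frac{\partial x'_i}{\partial t}\frac{\partial}{\partial x'_i} = \frac{\partial}{\partial t'} - \sum_i v_i \frac{\partial}{\partial x'_i}$$
$$\frac{\partial}{\partial x_j} = \frac{\partial t'}{\partial x_j}\frac{\partial}{\partial t'} + \sum_i \frac{\partial x'_i}{\partial x_j}\frac{\partial}{\partial x'_i} = \frac{\partial}{\partial x'_j}$$
Since $\nabla = \nabla'$, the term $\textbf v \cdot \nabla'$ can equally be written as $\textbf v \cdot \nabla$.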
Substituting and removing the primes and using vector calculus, we obtain:
$$ \begin{cases}
\nabla \cdot \textbf E + \textbf v \cdot (\nabla \times \textbf B) = \rho/\epsilon_0\\
\nabla \cdot \textbf B = 0\\
\nabla \times \textbf E = -\frac{\partial \textbf B}{\partial t}\\
\nabla \times \textbf B = \mu_0 \left(\textbf J + \rho \textbf v +\epsilon_0 \frac{\partial}{\partial t}( \textbf E - \textbf v \times \textbf B) - \epsilon_0 (\textbf v \cdot \nabla) (\textbf E - \textbf v \times \textbf B) \right)\\
\end{cases}$$
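The vector calculus used in this step is, as far as I can reconstruct it, the following pair of identities for a constant vector $\textbf v$:
$$\nabla \cdot (\textbf v \times \textbf B) = \textbf B \cdot (\nabla \times \textbf v) - \textbf v \cdot (\nabla \times \textbf B) = -\textbf v \cdot (\nabla \times \textbf B)$$
$$\nabla \times (\textbf v \times \textbf B) = \textbf v (\nabla \cdot \textbf B) - \textbf B (\nabla \cdot \textbf v) + (\textbf B \cdot \nabla) \textbf v - (\textbf v \cdot \nabla) \textbf B = -(\textbf v \cdot \nabla) \textbf B$$
where the second line also uses $\nabla \cdot \textbf B = 0$. In the third equation, the term $(\textbf v \cdot \nabla)\textbf B$ produced by the curl cancels against the one produced by the time derivative, which is why Faraday's law keeps its form.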
In a vacuum, we can take the curl of the fourth equation to obtain:
$$c^2\nabla^2 \textbf B = \frac{\partial^2 \textbf B}{\partial t^2} + (\textbf v \cdot \nabla)^2 \textbf B - 2 \textbf v \cdot \nabla \left(\frac{\partial \textbf B}{\partial t}\right)$$
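A sketch of the intermediate steps, with the shorthand $D = \frac{\partial}{\partial t} - \textbf v \cdot \nabla$ introduced here for brevity: in a vacuum ($\rho = 0$, $\textbf J = 0$) and with $\mu_0 \epsilon_0 = 1/c^2$, the fourth equation reads $\nabla \times \textbf B = \frac{1}{c^2} D (\textbf E - \textbf v \times \textbf B)$. Taking the curl and using $\nabla \cdot \textbf B = 0$, $\nabla \times (\textbf v \times \textbf B) = -(\textbf v \cdot \nabla) \textbf B$ and $\nabla \times \textbf E = -\frac{\partial \textbf B}{\partial t}$:
$$-\nabla^2 \textbf B = \frac{1}{c^2} D \left( \nabla \times \textbf E + (\textbf v \cdot \nabla) \textbf B \right) = \frac{1}{c^2} D \left( -\frac{\partial \textbf B}{\partial t} + (\textbf v \cdot \nabla) \textbf B \right) = -\frac{1}{c^2} D^2 \textbf B$$
Expanding $D^2$ gives the three terms on the right-hand side of the wave equation above.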
Substituting a wave solution of the form $\textbf B \sim \exp[i(\textbf k \cdot \textbf x -\omega t)]$, we obtain an equation for $\omega$, which we can solve to obtain:
$$\omega = -\textbf v \cdot \textbf k \pm c |\textbf k|$$
Therefore the velocity of propagation is the group velocity:
$$\frac{\partial \omega}{\partial \textbf k} = -\textbf v \pm c \hat{\textbf{ k}}$$
which gives you the expected $c\pm v$ with an appropriate choice of $\textbf v$ and $\textbf k$.
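As a quick numerical illustration, the group velocity can be estimated by finite differences on the dispersion relation. This is a minimal sketch: the function names, the step size `h`, and the sample values of $\textbf v$ and $\textbf k$ are assumptions made for the example.

```python
import numpy as np

def omega(k, v, c, sign=+1):
    """Dispersion relation: omega = -v.k +/- c|k|."""
    k = np.asarray(k, dtype=float)
    v = np.asarray(v, dtype=float)
    return -np.dot(v, k) + sign * c * np.linalg.norm(k)

def group_velocity(k, v, c, sign=+1, h=1e-6):
    """Central-difference estimate of d(omega)/dk, component by component."""
    k = np.asarray(k, dtype=float)
    grad = np.zeros(3)
    for i in range(3):
        dk = np.zeros(3)
        dk[i] = h
        grad[i] = (omega(k + dk, v, c, sign) - omega(k - dk, v, c, sign)) / (2.0 * h)
    return grad

c = 1.0                              # units with c = 1 (illustrative choice)
v = np.array([0.1, 0.0, 0.0])        # frame velocity along +x
k_along = np.array([2.0, 0.0, 0.0])  # wave travelling along +x
k_against = -k_along                 # wave travelling along -x

for label, k in (("along v", k_along), ("against v", k_against)):
    vg = group_velocity(k, v, c)
    print(f"k {label}: group velocity = {vg}, speed = {np.linalg.norm(vg):.6f}")
```

With these choices the two speeds should come out close to $c - v$ and $c + v$ respectively, matching $-\textbf v \pm c \hat{\textbf k}$.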
For a basic treatment of the Michelson-Morley experiment please see 1. You don't need to know the technical details of the experiment to answer your questions, though. The only relevant thing is the result. Let me put it in basic terms, since you seem to struggle with the "physics slang":
The total velocity of a ball thrown from a truck is the sum of the velocity of the ball relative to the truck and the velocity of the truck relative to the observer. The velocity of a light beam emitted from the truck is not. Rather, the velocity of the light beam seems to be completely independent of the velocity of the truck.
Michelson and Morley didn't have a truck; they had the Earth orbiting the Sun.
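To make the truck comparison concrete, here is a small sketch that contrasts plain addition of velocities with the standard relativistic velocity-addition formula $u = (u' + v)/(1 + u'v/c^2)$. The formula is not stated above and is brought in here as an assumption of the example, as are the truck and ball speeds.

```python
C = 299_792_458.0  # speed of light in m/s (exact SI value)

def galilean_sum(u_rel, v_frame):
    """Everyday rule: velocities simply add."""
    return u_rel + v_frame

def einstein_sum(u_rel, v_frame, c=C):
    """Relativistic velocity addition for collinear motion."""
    return (u_rel + v_frame) / (1.0 + u_rel * v_frame / c**2)

truck = 30.0  # hypothetical truck speed in m/s
ball = 20.0   # hypothetical ball speed relative to the truck in m/s

print("ball :", galilean_sum(ball, truck), einstein_sum(ball, truck))
print("light:", galilean_sum(C, truck), einstein_sum(C, truck))
```

For the ball the two rules agree to far better than any measurement could tell. For the light beam, simple addition predicts a speed above `C`, while the relativistic rule returns `C` (up to rounding) whatever the truck's speed is.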
Please make it clear to yourself that this experimental fact can be explained by stating that the speed of light is constant. If I say to you the speed of light is constant in every frame of reference, then the above result isn't surprising at all to you.
But you want more. You want me to prove to you that the speed of light is universally constant. I cannot. There will never be an experiment that shows that this axiom is universally true. How should one ever construct such an experiment, how should one, for example, test the theory in the Andromeda galaxy? It's impossible, but it doesn't matter: Why not just stick with the axiom, as long as we can explain everything we see around us with it?
As you already said, there's an interesting connection between the invariance of the speed of light and Maxwell's equations. One can indeed prove that the speed of light has to be constant; otherwise, Maxwell's theory can't be true in all inertial frames. But this is no proof that can convince you either, since accepting Maxwell's equations is no different from accepting the invariance of the speed of light. Furthermore, the basis of Einstein's theory is not the invariance of the speed of light, but the invariance of the speed of action, which cannot be concluded from Maxwell's theory, even though it's a reasonable guess.
Physical theories are not provable. But as long as they comply with reality, we accept them as truths.
Addendum: I recommend this short lecture for laymen by R. Feynman on the topic. Feynman and I present a very similar line of reasoning.
Lefaroundabout's comment is important. While we are typically taught that we use science to know things, that is not actually a correct statement. Science is a very powerful tool for creating models that can be used to create educated predictions about how a system will behave, and it is founded on the idea of falsifiable hypotheses, but that doesn't mean we're never wrong. It just means it's possible to disprove our hypotheses.
Your example of making the velocities add is a great example. It's terribly intuitive that velocities add together. If I'm on a train and I throw a baseball, an observer on the ground sees the baseball hurtling through the air at the train's speed plus the speed of my throw. It would be very natural to assume that light behaves the same way. In fact, I think most people believe this is how light works until they are told otherwise by a science teacher.
Now let's bring in Maxwell's equations. Maxwell's equations do a remarkably good job of predicting how electricity and magnetism behave. You can try to falsify them by building oddly designed experiments to isolate magnetic monopoles and so forth, but we found his laws simply hold up well (at least all the way up to Quantum Mechanics, which is its own beast, and its own story). After a lot of testing, the scientific community came to a consensus that Maxwell's equations are pretty darn reliable. I can't say "they knew his equations were true," because that would be an overstatement, but their confidence was very high.
However, there's a quirk. Maxwell's equations predict a "speed of light." But if you go back to our baseball example, we see that the baseball is going at different speeds in different inertial frames. While I ride on the train at a constant velocity, I am viewing the world from an inertial frame, and I see the ball at one speed. While you are on the ground, standing still, you are viewing the world from an inertial frame, and you see the ball at a different speed. Maxwell's equations simply don't have any room for that. They just say "light has a fixed speed," leaving scientists to ponder what's up with that.
One intuitive approach is to assume the light is traveling through a medium, and the speed of light is with respect to that medium. This is intuitive when you look at effects like drag on a baseball. The drag forces on a baseball don't depend on how fast it's traveling with respect to me or you, but on how fast it's traveling with respect to the wind. It was theorized that light might travel in a so-called "luminiferous aether," just like our baseball travels through the air. This solves the conundrum of Maxwell's equations: the "speed of light" is the speed of light with respect to the aether.
So this was a reasonable hypothesis. Just like your "velocities add" hypothesis, it led to natural ways of thinking about light. Of course, this being a scientific hypothesis, it was designed to be falsifiable. If one could demonstrate that light's movement did not act like there was some privileged reference frame (the frame of the aether), then one would be able to refute this hypothesis. And they did.
The most famous experiment falsifying the aether theories was the Michelson–Morley experiment. Through clever use of interferometry, they were able to compare the speeds of light going in the direction of the Earth's orbit around the Sun versus going across it. Their goal was to determine if the aether was stationary, or if it was somehow "dragged" along by massive objects like the Earth (like how air forms drafts behind a large vehicle). They found, curiously enough, that there was no detectable difference in the speed of light in the two directions. If indeed the aether existed (which they believed at the time), it was so tied to the movement of the Earth that we couldn't discern it. It's as if you were drafting behind a large vehicle and, instead of feeling the wind pull you forward, you felt encased in concrete and dragged forcefully along!
Many other experiments also found results like this, which made aether theories start to seem very unreliable. They just called for too much "hand waving." From this, we developed the Lorentz boosts, which were modifications to Maxwell's equations that were very effective at predicting the results of experiments like these, but made the equations terribly ugly. The beauty of Maxwell's equations vanished under the Lorentz transformations.
So now enter Einstein, making his assumption that the speed of light must be the same in all reference frames. I agree with your original opinion that it's a strange thing to just assume. But it was brilliant. When he was done with the math, the ugly Lorentz boosts that defiled Maxwell's equations were neatly tucked away into this assumption that the speed of light was the same in all reference frames. It did a very good job of cleaning up a lot of ugliness in the theories. People liked it.
More than being liked, it was scientific: it was falsifiable. If we ever found two inertial frames which had different speeds of light, or if we found out that time dilation did not occur, it would have falsified Einstein's theories, and we probably wouldn't revere him as we do today. However, in hundreds (if not thousands) of experiments, we have found that Einstein's theory is extraordinarily good at predicting some really awkward and unintuitive effects.
Thus, we justify his assumption that the speed of light is the same in all inertial frames after the fact. We have found that the results of this assumption are tremendously useful and effective. At the time, the justification was that it was an elegant solution to a very difficult problem, and it produced new falsifiable hypotheses to test (like any good scientific theory does).