$\DeclareMathOperator{\gal}{Gal}$
Here's a comment which one can make to differential geometers which at least explains what etale cohomology "does". Given an algebraic variety over the reals, say a smooth one, its complex points are a complex manifold but with a little extra structure: the complex points admit an automorphism coming from complex conjugation. Hence the singular cohomology groups inherit an induced automorphism, which is extra information that is sometimes worth carrying around. In short: the cohomology of an algebraic variety defined over the reals inherits an action of $\gal(\mathbb{C}/\mathbb{R})$.
The great thing about etale cohomology is that a number theorist can now do the same trick with algebraic varieties defined over $\mathbb{Q}$. The etale cohomology groups of such a variety will have the same dimension as the singular cohomology groups (and are indeed isomorphic to them via a comparison theorem, once the coefficient ring is big enough) but the advantage is that they inherit an action of the amazingly rich and complicated group $\gal(\bar{\mathbb{Q}}/\mathbb{Q})$. I've often found that this comment sees off differential geometers, with the thought "well at least I sort-of know the point of it now". A differential geometer probably doesn't want to study $\gal(\bar{\mathbb{Q}}/\mathbb{Q})$ though.
If I put my Langlands-philosophy hat on though, I can see a huge motivation for etale cohomology: Langlands says that automorphic forms should give rise to representations of Galois groups, and etale cohomology is a very powerful machine for constructing representations of Galois groups, so that's why I might be interested in it even if I'm not an algebraic geometer.
Finally, I guess a much simpler motivation for etale cohomology is that geometry is definitely facilitated when you have cohomology theories around. That much is clear. But if you're doing algebraic geometry over a field that isn't $\mathbb C$ or $\mathbb R$ then classical cohomology theories aren't going to cut it, and the Zariski topology is so awful that you can't use it alone to do geometry---you're going to need some help. Hence etale cohomology, which gives the right answers: e.g. a smooth projective curve over any field has a genus $g$, and etale cohomology is a theory which assigns to it an $H^1$ of dimension $2g$ (<pedant> at least if you use $\ell$-adic cohomology for $\ell$ not zero in the field <\pedant>).
I can tell you how they are related.
Before Riemann people would say, for example, the complex square root function (for $z\neq 0$) is two valued, but for any small region of (non-zero) complex numbers you can make it single valued by picking one branch. Riemann had a vastly better idea: there is a two-sheeted covering surface for the complex plane (ramified at 0) with square root a single-valued function on that cover.
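To make this concrete, here is a standard sketch (the names $X$, $p$ and $w$ are my notation, not part of the account above). The covering surface can be taken to be the curve

$$X=\{(z,w)\in\mathbb{C}^2 : w^2=z\},\qquad p\colon X\to\mathbb{C},\quad p(z,w)=z.$$

Over each $z\neq 0$ the fibre $p^{-1}(z)$ consists of two points $(z,w)$ and $(z,-w)$, one for each square root of $z$, while over $z=0$ there is only the single point $(0,0)$; that is the ramification. The coordinate function $w$ is single valued on $X$ and satisfies $w^2=p$, so on the cover it plays the role of $\sqrt{z}$.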
Serre, who was well aware of the connection to Riemann, found a theory of 1-dimensional cohomology that worked correctly for the Weil conjectures, using not sheaves but fiber bundles, where a fiber bundle is considered locally trivial (and called "isotrivial"), not when it restricts to product bundles on small enough parts, but if it can be made into a product bundle by pulling it back along such a cover.
Well, Serre also saw how he could state the algebraic conditions needed to make this work, not only over the complex numbers, but over any field. Those conditions are now taken as the definition of a finite etale map. Grothendieck, with Artin and others, including Serre, made it work in all dimensions and for that purpose preferred to drop the requirement that the map be finite.
As to why this works for the Weil Conjectures, let me add a bit on why Serre first thought his "unramified maps" (which later gave way to the slightly different etale maps) were the way to such a cohomology, and why Grothendieck then decided this was exactly the way. You should combine this with Peter Dalakov's concise modern statement of the facts in his comment, and Will Sawin's beautiful account of what a cohomology theory for those conjectures would have to be like.
No one who was interested in the Weil Conjectures when they first appeared believed fields in finite characteristic would support any close analogue to the analytic topology on complex numbers. In hindsight people today pretty much agree with that, but at the time most considered this a decisive obstacle to any cohomological proof of the Weil Conjectures. And no one before Serre's FAC saw how to use Zariski topology to prove any very serious results. Serre's FAC immediately persuaded a lot of people that algebraic geometry over arbitrary fields could, and in fact must, use the Zariski topology.
But many structures which intuitively ought to be "locally trivial" are clearly not so if "locally" means "on small enough Zariski open sets." Zariski open sets just never are small -- they are dense on any connected component. Serre wrestled with precisely this problem for several years. And then in 1958, with Riemann's original works explicitly in mind, Serre said let us allow "local trivialization" of fiber bundles just the way Riemann "trivialized" multiple valued functions into single valued ones-- let us trivialize them by pullback along unramified Riemann surface covers -- except using a purely algebraic definition of "unramified" so it works over any field, and indeed for varieties of any dimension. A strikingly plausible idea once you think of it. But does it work?
By the kind of deep, detailed skill that Serre typically conjoins to his insights, he got it to work for dimension one cohomology (of varieties of any dimension). It works in the precise sense that it delivers the $H^1$ part of the long exact cohomology sequences you would want for the Weil Conjectures.
Serre knew well how hard he had to work to get these $H^1$s. So he was skeptical when Grothendieck first announced this had to work for cohomology in all dimensions. But Grothendieck had utter faith in his general theory of derived functor cohomology: once Serre identified the correct basics, they had to deliver the whole theory.
Well it turned out to take a lot more specific work, and there is the long and on-going story of the standard conjectures which were meant to make the cohomological proof much simpler than it yet is, but Grothendieck's faith was essentially justified.
As to the history I would slightly modify what Will Sawin says. He puts the key issues very well. But Weil did not believe there could be an actual cohomology theory for varieties in finite characteristic. I believe he believed there would be some more direct comparison theorem between varieties in finite characteristic, and their lifts to characteristic zero, which would make the conjectures follow from simplicial cohomology. And he did not especially believe that such a comparison would be the way to prove the conjectures. He probably leaned to the idea that the relation to simplicial cohomology of complex manifolds would be an enlightening corollary to some other kind of proof.
Best Answer
It's inherently difficult to give a negative answer to a question like this, but here's a technical fact that pushes in that direction:
Let ZFC$_n$ be the subtheory of ZFC gotten by restricting Separation and Replacement to $\Sigma_n$ formulas. By the reflection principle,$^1$ for each $n$ the theory ZFC proves that there is an ordinal $\alpha_n$ such that $V_{\alpha_n}\models$ ZFC$_n$. That is: $$\mbox{For each $n\in\mathbb{N}$, ZFC proves Con(ZFC$_n$).}$$
We can think of the $V_{\alpha_n}$s as "approximate universes" which behave like universes for all "sufficiently simple" formulas, the point being that if you specify a complexity level ahead of time you can always assume you have an approximate universe appropriate to that complexity level.
The compactness theorem now naively suggests that - since we can only ever use finitely many sentences in a given proof - any argument with universes whatsoever can be replaced with one involving just approximate universes, and hence a proof in ZFC. This is of course false, but counterexamples have to be "global" as opposed to "local" - they need to at some point refer to the whole of the universe in question as a single completed object.
For example, the way ZFC + universes proves the consistency of ZFC is by showing that a universe $U$ is a model of ZFC. The statement "$U\models$ ZFC" is expressed in the language of set theory by talking about Skolem functions over $U$ (or something morally equivalent), and this takes place in the context of the powerset of $U$. But this sort of thing isn't to my knowledge how universes are applied in algebraic geometry - they instead use a universe to argue that a "sufficiently closed" object exists in that universe, and this "local" argument is exactly the sort of thing that the reflection principle tells us can generally be reduced to ZFC alone.
That said, there is an obvious place to look for such counterexamples: arguments using two (or $n$) universes. The larger universe does see the smaller universe as a completed object, so the coarse heuristic above suggests that we can replace only the larger universe with an approximate universe - that is, that arguments which are quickly phrased in terms of two universes can be directly translated to arguments involving only one universe. Now we can't cheat anymore - we need actual arguments about algebraic geometry. My understanding is that we're still in a situation where universes are an unnecessary convenience, but now I'm far outside of my area of competence. Still, the above should give an indication of why a real essential use of universes in a concrete result (which will certainly only involve reference to a small fragment of the cumulative hierarchy) would be very surprising.
$^1$OK fine, the reflection principle is usually phrased for finite subtheories of ZFC. But $(i)$ that's not really any different as far as the heuristic is concerned, just more annoying to work with; and $(ii)$ the stronger version of reflection I've stated is also true (the point being that for each $n$, the schemes of $\Sigma_n$-Separation and -Replacement can be expressed in the language of set theory by a single sentence, which in turn can be proved from finitely many of the ZFC axioms which we can bash with the usual reflection hammer).
And on that note, it's worth pointing out two facts about reflection which help flesh out the picture:
First, given that ZFC proves the compactness theorem, we seem to be in tension with Godel's incompleteness theorem. What saves us is that "$\forall n$" and "ZFC proves" don't commute (unless ZFC is inconsistent of course): while ZFC does prove each specific instance of reflection, it can't prove the full version (unless, again, it's inconsistent).
Second, it's worth noting that a similar result holds for (first-order) Peano arithmetic (as does the analogous version of the previous point), although of course we need to talk about mere consistent Henkinized complete theories as opposed to canonical-ish models. As a cute consequence, Kripke used this fact to give a purely model-theoretic proof of Godel's incompleteness theorem (in the absence of reflection, his argument would require the soundness of PA, similarly to how Godel's original argument assumed $\omega$-consistency rather than mere consistency).
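In symbols, and only as a sketch of the quantifier order at stake in the first of the two facts above: what reflection gives is the external statement

$$\mbox{for each $n\in\mathbb{N}$:}\quad \mbox{ZFC}\vdash\mbox{Con(ZFC$_n$)},$$

whereas the internalized version would be the single claim

$$\mbox{ZFC}\vdash\forall n\,\mbox{Con(ZFC$_n$)}.$$

The second cannot hold if ZFC is consistent. Arguing inside ZFC, any proof of a contradiction from ZFC uses only finitely many axioms, all of which lie in ZFC$_n$ for some $n$, so $\forall n\,$Con(ZFC$_n$) implies Con(ZFC); ZFC would then prove its own consistency, against the second incompleteness theorem.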