Electroweak theory told us where to look for the $W$ and $Z$ gauge bosons. For the Higgs, the mass is a free parameter, so we did not know where to look for it. Once you start looking for a particle in many places, you also have to factor in the look-elsewhere effect: the more places you look, the higher the chance that an apparent signal is only a statistical fluctuation.
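The look-elsewhere effect can be made concrete with a toy calculation. This is only a sketch assuming $N$ independent search bins, which is a simplification of the real trial-factor treatment:

```python
# Toy look-elsewhere calculation: a local p-value p_local observed in any one
# of n_bins independent places corresponds to a global p-value
#   p_global = 1 - (1 - p_local)**n_bins
# (independence of bins is an assumption made for illustration).

p_local = 0.00135  # roughly a 3-sigma local excess

for n_bins in (1, 10, 100):
    p_global = 1 - (1 - p_local) ** n_bins
    print(f"{n_bins:4d} bins: global p-value = {p_global:.4f}")
```

With 100 places to look, a 3-sigma local excess becomes a global p-value of about 0.13, i.e. nothing unusual at all.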

**LEP**

Today we know the Higgs mass is roughly $125$ GeV. The LEP collider at CERN reached a center-of-mass energy $\sqrt{s}$ of $209$ GeV. This would have been enough to produce the Higgs. But $e^+ e^-$ colliders do not have the highest Higgs production cross section. Your statement:

The Higgs also couples to just about everything, so it's not hard to make.

is not correct. The Higgs couples only very lightly to light particles. So you need to produce heavy particles first in order to have a good chance to also produce a Higgs.

At LEP, the Higgs could only have been produced via vector boson fusion, but not at a rate high enough for us to clearly distinguish it from the background.

Associated production ($e^+ e^- \to ZH$) is also conceivable, but the cross section is very small, since $m_H + m_Z > \sqrt{s}_{\text{LEP}}$.
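The kinematic obstruction is just arithmetic; a quick sanity check with rounded PDG masses:

```python
# Why on-shell associated production e+ e- -> Z H was out of reach at LEP.
# Masses and energies in GeV (rounded values).
m_H = 125.0         # Higgs boson mass
m_Z = 91.2          # Z boson mass
sqrt_s_LEP = 209.0  # maximum LEP center-of-mass energy

threshold = m_H + m_Z               # about 216 GeV
print(threshold > sqrt_s_LEP)       # True: ZH production needs more energy than LEP had
```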

Even smaller would be the cross section for $t\bar{t}H$ production.

Long story short: at LEP, the center-of-mass energy was simply not high enough. It is not sufficient to produce the Higgs alone; you have to produce heavy particles first, which then couple to the Higgs.

**Tevatron**

What about the Tevatron? Surely the Higgs must have been produced copiously at the Tevatron. Why didn't they see it? The answer is that they *did* see it. Both CDF and DØ reported an excess. In the combined publication, they even reported "evidence for the presence of a new particle consistent with the standard model Higgs boson". The combination observed a global significance of $3.1\,\sigma$.

But the question remains: why "only" a $3.1\,\sigma$ excess after years of data taking? The answer is two-fold. First, the instantaneous luminosity of the Tevatron was relatively small, at least compared to the LHC. While a proton-antiproton collider helps you by raising the cross sections, the production of antiprotons is notoriously slow, and you cannot easily reach high instantaneous luminosities.

The second reason, and also the general answer to why finding the Higgs was not so easy, lies in the way the Higgs decays.

The branching-fraction plot shows the decay channels of the Higgs boson as a function of its mass. At $125$ GeV, it mostly decays to a $b\bar{b}$ pair. The $b$ quarks hadronize, and what you see in the detector are two "jets". A two-jet event is not a particularly clean signature: at a hadron collider, the background is overwhelming. As far as I know, even today the observation of the $H \to b\bar{b}$ signal has not reached the $3\,\sigma$ level in any LHC experiment.

There are two ways around this: either look for a Higgs produced together with another particle, or rely on the Higgs decaying to other particles. In associated production, the $b\bar{b}$ pair is accompanied by a $W/Z$ boson, and the sensitivity is much higher, since the background is much smaller. This is what the experiments at the Tevatron observed.

A Higgs decaying to $ZZ$ or to $\gamma\gamma$ gives particularly clean channels. But the branching fraction for $H \to ZZ$ is only $2.67$ %, and for $H \to \gamma\gamma$ a mere $0.228$ %, so you need a large dataset to see these decays. Such a dataset was not available in time at the Tevatron.
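How the dataset size enters can be sketched with the rule of thumb $N = \sigma \cdot L \cdot \mathrm{BR}$. The cross section and integrated luminosity below are round placeholder values for illustration, not official Tevatron numbers; the branching fractions are the ones quoted above (with $\mathrm{BR}(b\bar{b}) \approx 58$ %):

```python
# Expected signal counts: N = sigma * L * BR.
# sigma and L are ASSUMED round numbers, purely illustrative.
sigma_pb = 1.0              # assumed total Higgs production cross section, in pb
lumi_fb_inv = 10.0          # assumed integrated luminosity, in fb^-1
lumi_pb_inv = lumi_fb_inv * 1000.0  # 1 fb^-1 = 1000 pb^-1

for channel, br in [("b bbar", 0.58), ("ZZ", 0.0267), ("gamma gamma", 0.00228)]:
    n_expected = sigma_pb * lumi_pb_inv * br
    print(f"{channel:12s}: ~{n_expected:.0f} events before cuts and efficiencies")
```

Even before detector efficiencies and selection cuts, the clean channels start with orders of magnitude fewer events than $b\bar{b}$, which is why they need a large dataset.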

It is interesting to note that had the Higgs been lighter, it would have been produced copiously at LEP, and we would have found it a long time ago. Had it been heavier, the branching fraction to $b\bar{b}$ would have been considerably smaller, and we would have observed it at the Tevatron. A mass of $125$ GeV simply turned out to yield a signal that was hard to distinguish from background, for the various reasons mentioned above. That is why it took so long to find the Higgs.

## Best Answer

Let me add an experimentalist's opinion. From the plots shown by Lubos above, one sees that two distributions are being subtracted in order to bring out the signal: a Monte Carlo background from expected interactions, with numbers like 500 events per 8 GeV bin, and a similar number for the experimental data. Since the Monte Carlo background has no error bars, I presume its statistics are much higher and the histogram is simply normalized to the number of events in the data.

The statistical error on the number of events in each data bin is about $\sqrt{500} \approx 22.4$ events. If I measure the error in the difference plot shown above, it is not larger than this number for any bin. This means they have not included systematic errors in their error estimates. One such error is the one produced by the shift in energy, as discussed by Lubos: the Monte Carlo background should be varied according to the 1-sigma error on the jet energy, and the resulting variation added to the error bars. There are other systematic errors one can think of when subtracting data from Monte Carlo events and when making cut decisions. The effect of each cut should enter as a systematic error: each variable used in the cuts (including the 8 GeV binning above) should be varied in the Monte Carlo within its error bars, and the variation included in the error estimate.

Note that systematic errors are added linearly, not in quadrature.
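The numbers above can be reproduced in a few lines. The individual systematic uncertainties below are illustrative placeholders, not values from the CDF analysis; the point is only how the combination rules differ:

```python
import math

# Poisson statistical error on a 500-event bin, plus two ASSUMED systematic
# uncertainties, combined both linearly (as the answer advocates) and in
# quadrature, to show how much the choice matters.
n_events = 500
stat_err = math.sqrt(n_events)     # ~22.4 events, pure counting statistics
syst_errs = [10.0, 5.0]            # illustrative systematic uncertainties

total_linear = stat_err + sum(syst_errs)                        # linear addition
total_quad = math.sqrt(stat_err**2 + sum(e**2 for e in syst_errs))  # quadrature

print(f"statistical only : {stat_err:.1f}")   # 22.4
print(f"linear addition  : {total_linear:.1f}")
print(f"in quadrature    : {total_quad:.1f}")
```

With these placeholder numbers, linear addition inflates the error bar far more than quadrature does, so underestimating systematics (or combining them too optimistically) can easily turn an ordinary fluctuation into an apparent bump.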

If this bump is not a statistical fluctuation but the result of an underestimation of systematic errors, the problem will remain even if CDF doubles its statistics. Only independent experiments will tell us whether it is a statistical fluctuation, an analysis artifact, or a true signal.