Particle Physics – Utilizing Monte-Carlo Simulation in High-Energy Physics for Data Analysis

data-analysis, experimental-physics, particle-physics, simulations, statistics

I've been doing some research into the analysis used in particle physics when determining the significance of a finding (e.g. the recent Higgs candidate was announced as a boson in the 125-126 GeV/$c^{2}$ mass range with a $5\sigma$ significance).

I believe this confidence level is determined by estimating the Standard Model background cross-section that would be observed if every known process except Higgs production occurred, and then comparing the observed cross-section with that prediction.

I am interested in how they determine the background cross-section. I believe they use a Monte-Carlo simulation normalized to well-known processes such as $Z^{0}Z^{0}$ production, but how exactly does this work?
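To make the normalization idea concrete, here is a toy sketch in plain Python (all numbers are invented, not real cross-sections): scale the simulated background so that a well-measured control channel matches data, then apply the same scale factor to the background prediction in the signal region.

```python
# Toy normalization sketch -- every number below is invented for illustration.

# Monte-Carlo events generated in a control channel (e.g. a well-understood
# process like Z0 Z0 production) and in the signal region, before normalization.
mc_control = 20000      # simulated events in the control channel
mc_signal_region = 500  # simulated background events in the signal region

# Observed data in the control channel (well-understood physics).
data_control = 18600

# Normalize the simulation to the control channel.
scale = data_control / mc_control

# The same scale factor gives the expected background in the signal region.
expected_background = scale * mc_signal_region
print(f"scale factor = {scale:.3f}")                      # 0.930
print(f"expected background = {expected_background:.1f}")  # 465.0 events
```

In a real analysis the normalization is done per-channel with full systematic uncertainties, but the basic logic is this simple.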

I am aware that the tool primarily used in high-energy physics for this kind of modelling is Geant, and I would like to know more about how it works. I have looked through the source code, but it is very fragmented and thus quite hard to understand, especially since I am not 100% certain what should be occurring in the code.

Best Answer

Geant is a framework, which means that you use it to build applications that simulate the detector and physics you are interested in. The simulation can include all of the physics and the complete detector, including electronics and trigger (i.e. you can write your simulation so that it outputs a data file that looks just like the one you are going to get from the experiment[1]).[2]

The various parts of Geant are validated by being able to correctly predict the outcomes of experiments. Particular models are tuned on well-known physics early in the analysis of the data. This allows you to get simulated optical properties, detector gains and so on correctly matched to the actual instrument.
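A minimal illustration of that tuning step (toy numbers, and a hypothetical linear "gain" parameter invented for this sketch): pick the simulated detector gain that best reproduces the measured position of a well-known calibration line.

```python
# Toy tuning sketch -- all numbers are invented.

# Measured peak position of a well-known calibration line (arbitrary ADC units).
measured_peak = 511.0

def simulated_peak(gain, true_energy=500.0):
    """Toy detector model: the simulated peak scales linearly with gain."""
    return gain * true_energy

# Grid-search the gain value that minimizes the squared residual to data.
gains = [0.9 + 0.001 * i for i in range(201)]  # scan 0.900 .. 1.100
best_gain = min(gains, key=lambda g: (simulated_peak(g) - measured_peak) ** 2)
print(f"tuned gain = {best_gain:.3f}")  # 1.022, i.e. 511/500
```

Real tuning involves many correlated parameters and full fits rather than a grid search, but the principle is the same: adjust the model until it reproduces physics you already trust.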

Geant is also heavily documented. Read the introduction and the first two chapters of the User's Guide for Application Developers, which will give you the basics. After that you can delve into the hairy details in the Physics and Software references. There is far too much to cover in a Stack Exchange answer. (I mean that literally: if I tried, I'd overrun the 32k-character-per-post limit.)

It helps to know that Geant4 derives from Geant3 and earlier efforts. This thing has a history that goes back for decades and has been tested in thousands of experiments large and small.


The use in the Higgs search goes something like this:

  • We have a theory, the Standard Model, which tells us what couplings to expect for the particle we hope to detect.
  • We write (and test) a Geant physics module implementing that physics. Maybe more than one. We may need to write a new event generator, or tweak an existing one, in parallel with this effort.
  • You construct a Geant simulation of your detector. You include a simulation of the electronics, trigger and so on.[3]
  • You simulate a lot of data from the desired channel and from possible interfering channels (including detector noise and backgrounds). You're going to use a cluster or a grid for this, because it is a big problem.
  • You combine this simulated data.
  • You run your analysis on the simulated data.[4]
  • You extract from these results an "expected" signal.

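The steps above can be compressed into a toy pipeline (pure Python, all numbers invented, and the hypothesized particle mass borrowed from the question): generate "truth" events, smear them with a toy detector response, run the "analysis" selection, and read off the expected yields.

```python
import random

random.seed(1)  # reproducible toy results

def generate_events(n, true_mass, width):
    """Toy event generator: reconstructed-mass candidates (GeV/c^2)."""
    return [random.gauss(true_mass, width) for _ in range(n)]

def detector_smearing(masses, resolution=2.0):
    """Toy detector response: additional Gaussian mass resolution."""
    return [m + random.gauss(0.0, resolution) for m in masses]

def select(masses, lo=120.0, hi=130.0):
    """The 'analysis': a simple mass-window cut."""
    return [m for m in masses if lo < m < hi]

# Signal channel (hypothesized particle at 125 GeV/c^2) plus a flat background.
signal = detector_smearing(generate_events(1000, true_mass=125.0, width=1.0))
background = detector_smearing(
    [random.uniform(100.0, 150.0) for _ in range(5000)])

expected_signal = len(select(signal))
expected_background = len(select(background))
print(f"expected signal in window:     {expected_signal}")
print(f"expected background in window: {expected_background}")
```

In the real thing, each of these functions is an enormous piece of software (the generator, Geant itself, the reconstruction and analysis chain) running on a grid, but the data flow has this shape.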
Actually, you did all of the above at lower precision several times during the design and funding phases, and used those results to determine how much data you would have to collect, what instrumentation densities you needed, what data rate you had to be able to support, and so on ad nauseam.

Once you have got the data, you start by showing that:

  1. You can detect lots of well-known physics in your detector (to validate the detector and find unexpected problems)[5]

  2. Your model correctly represents the detector response to that well-known physics (to let you debug and tune your model)

Then you may need to re-run some of the "expected" processing.

Only then can you try to compare data to expectation.[6]
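As a final back-of-the-envelope illustration of the significance quoted in the question (toy numbers, and a crude counting approximation rather than the full likelihood machinery the experiments actually use): a $5\sigma$ result corresponds to a tiny one-sided p-value, and for a simple counting experiment the significance is roughly the excess over background divided by its Poisson fluctuation.

```python
import math

def p_value(z):
    """One-sided p-value for a z-sigma excess (Gaussian tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

print(f"p-value for 5 sigma: {p_value(5.0):.2e}")  # ~2.87e-07

# Crude counting-experiment significance. The event counts below are
# invented for illustration -- they are not ATLAS/CMS numbers.
observed = 1090.0
expected_background = 1000.0
z = (observed - expected_background) / math.sqrt(expected_background)
print(f"toy significance: {z:.2f} sigma")  # ~2.85 sigma
```

The real analyses use profile-likelihood test statistics with systematic uncertainties folded in, but $\sigma$-to-p-value conversion at the end works just like this.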


[1] Indeed, the data format is often thrashed out and debugged on the MC before the experiment is even built.

[2] For big, complicated experiments like those at the LHC, Geant is usually paired with one or more external event generators. In the neutrino experiments I'm currently working on, that means Genie and Cry. Not sure what the collider guys are using right now.

[3] For speed reasons we often simulate the electronics and trigger outside of Geant proper, but this decision is made on a case-by-case basis.

[4] Indeed, the analyzer is often programmed and debugged on the MC output before there is real data.

[5] This is also where most of the actual repetition of results in the particle physics world comes from. You won't get funding to repeat BigExper's measurement of the WingDing Sum Rule, but if your proposed NextGen spectrometer can do that as well as your spiffy New Physics (tm), it helps your case with the funding agencies.

[6] Many of these steps will be done by more than one person or group in the collaboration, to provide copious cross-checks and protection against embarrassing mistakes. (See also OPERA's little issue last year...)
