Power Law Fitting – Approaches to Data with Uncertainties in Statistical Analysis

Tags: fitting, power law, uncertainty

I need to fit a power law to data points, each of which carries an uncertainty.

I've been using Python, more precisely scipy.optimize.curve_fit, to get the job done, but I don't know how to handle the uncertainties with it.

I thought about doing a linear fit in log-log scale, but it seems less precise than fitting the power law directly (ignoring the uncertainties for now; I just tested both approaches on generated data with added noise).
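For reference, here is a minimal sketch of the two approaches I've been comparing. The data are generated, and the 5% per-point uncertainty is just an assumption for the example; as far as I can tell, curve_fit's sigma and absolute_sigma arguments are the standard way to feed known per-point uncertainties into the fit:

```python
import numpy as np
from scipy.optimize import curve_fit

# Generated data: y = a * x**b with an assumed 5% Gaussian uncertainty per point
rng = np.random.default_rng(0)
a_true, b_true = 2.5, -1.3
x = np.linspace(1, 10, 50)
sigma = 0.05 * a_true * x**b_true
y = a_true * x**b_true + rng.normal(0, sigma)

def power_law(x, a, b):
    return a * x**b

# Direct weighted fit: sigma= weights each residual by its uncertainty;
# absolute_sigma=True treats them as true standard deviations, so the
# returned covariance is in absolute units.
popt, pcov = curve_fit(power_law, x, y, p0=(1.0, -1.0),
                       sigma=sigma, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))   # 1-sigma uncertainties on (a, b)

# Log-log linear fit for comparison (unweighted; the noise is distorted by the log)
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(popt, perr, (np.exp(intercept), slope))
```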

I found that, but it's not really helping me.

I wouldn't mind using R to do the job if needed.

Best Answer

To clarify the answer by user777, there are two packages that my collaborators and I have developed specifically for rigorously fitting power-law distributions. Both can be found here. One is for when your data are integer or real values, while the other (which is the one that user777 linked to) is for binned data, i.e., when you only know the number of measurements that fall within each of several contiguous ranges.

In each case, the packages have four parts (see the sketch after this list):

  1. fit the power-law model to your data,
  2. estimate the uncertainty in your parameter estimates,
  3. estimate the p-value for your fitted power law, and
  4. compare your power-law model to alternative heavy-tail models.
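To make the four-step workflow concrete, here is a minimal sketch using the third-party Python powerlaw package (Alstott, Bullmore & Plenz), which implements the methods from the references below; note this is an illustration, not our own package. The data file name is hypothetical, and step 3 (the goodness-of-fit p-value) requires the semi-parametric bootstrap described in the papers rather than a single package call:

```python
import numpy as np
import powerlaw  # third-party implementation of the methods in the references

data = np.loadtxt("observations.txt")      # hypothetical data file

# Step 1: fit the power law; xmin is selected by minimizing the KS distance
fit = powerlaw.Fit(data, discrete=False)   # discrete=True for integer data
print("alpha =", fit.power_law.alpha, "xmin =", fit.xmin)

# Step 2: standard error of the fitted exponent
print("sigma_alpha =", fit.power_law.sigma)

# Step 4: likelihood-ratio test against an alternative heavy-tailed model;
# R > 0 favors the power law, p gives the significance of the sign of R
R, p = fit.distribution_compare("power_law", "lognormal")
print("R =", R, "p =", p)
```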

These methods are described exhaustively in two references, both of which are freely available on the arXiv preprint server (just search for their titles). The approach they use to fit the model is maximum likelihood, which is far more accurate than classic "curve fitting" approaches on scatter plots; the estimator itself is sketched after the references.

[integer and continuous quantities] A. Clauset, C. R. Shalizi, and M. E. J. Newman. "Power-law distributions in empirical data." SIAM Review 51, 661-703 (2009).

[binned quantities] Y. Virkar and A. Clauset. "Power-law distributions in binned empirical data." Annals of Applied Statistics 8, 89-119 (2014).
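For continuous data above a known lower cutoff xmin, the maximum-likelihood estimator in the first reference has a closed form, alpha_hat = 1 + n / sum_i ln(x_i / xmin), with approximate standard error (alpha_hat - 1) / sqrt(n). A minimal sketch of just this step (choosing xmin itself, e.g. by minimizing the KS distance, is a separate part of the procedure):

```python
import numpy as np

def fit_power_law_mle(x, xmin):
    """Continuous power-law MLE (Clauset, Shalizi & Newman 2009, eqs. 3.1-3.2)."""
    tail = np.asarray(x, dtype=float)
    tail = tail[tail >= xmin]        # the power-law model only describes the tail
    n = tail.size
    alpha_hat = 1.0 + n / np.sum(np.log(tail / xmin))
    std_err = (alpha_hat - 1.0) / np.sqrt(n)
    return alpha_hat, std_err
```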

In your case, I'm not entirely sure what you mean by each data point carrying an uncertainty. Do you mean a classic measurement uncertainty, like the kind you have when you measure the length or weight of an object (and which is thus normally distributed)?

If the variance is modest and the number of data points is large, then you can get a pretty good estimate of the power-law parameter using our methods even if you ignore the uncertainty, because the estimator takes the logarithm of each value, and normally distributed fluctuations become highly compressed under the log. If you don't have much data, or if the uncertainty is really large, then I would recommend choosing a reasonable binning scheme (powers of 2 or something) and applying the binned-data methods, for example as sketched below.
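A minimal sketch of such a powers-of-2 binning, assuming positive-valued data and a hypothetical input file; the resulting counts and bin edges are the inputs the binned-data estimator expects:

```python
import numpy as np

# Hypothetical positive-valued measurements with large uncertainty
data = np.loadtxt("noisy_observations.txt")

# Bin edges at powers of 2 spanning the data range
lo = np.floor(np.log2(data.min()))
hi = np.ceil(np.log2(data.max()))
edges = 2.0 ** np.arange(lo, hi + 1)
counts, edges = np.histogram(data, bins=edges)

# counts per bin and the bin edges feed into the binned-data method
print(counts, edges)
```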