Here is what it looks like in R.
x <- c(4, 4.5, 5, 5.5, 6, 6.5)
y1 <- c(0.000159334114311, 0.000184477307337, 0.002931979623674, 0.004321711975947,
0.006269020390557, 0.012537205790269)
y2 <- c(0.000160708687146, 0.000186102543697, 0.002956862489638, 0.004356837209873,
0.006325918592142, 0.01266703594829)
> (out1 <- nls(y1 ~ a*x^b, start=list(a=1,b=10)))
Nonlinear regression model
model: y1 ~ a * x^b
data: parent.frame()
a b
2.880e-08 6.926e+00
residual sum-of-squares: 2.446e-06
Number of iterations to convergence: 31
Achieved convergence tolerance: 1.326e-07
> (out2 <- nls(y1/86400 ~ a*x^b, start=list(a=1,b=10)))
Nonlinear regression model
model: y1/86400 ~ a * x^b
data: parent.frame()
a b
3.333e-13 6.926e+00
residual sum-of-squares: 3.277e-16
Number of iterations to convergence: 11
Achieved convergence tolerance: 2.176e-07
> (out3 <- nls(y2 ~ a*x^b, start=list(a=1,b=10)))
Nonlinear regression model
model: y2 ~ a * x^b
data: parent.frame()
a b
2.849e-08 6.938e+00
residual sum-of-squares: 2.491e-06
Number of iterations to convergence: 30
Achieved convergence tolerance: 2.456e-07
> (out4 <- nls(y2/86400 ~ a*x^b, start=list(a=1,b=10)))
Nonlinear regression model
model: y2/86400 ~ a * x^b
data: parent.frame()
a b
3.297e-13 6.938e+00
residual sum-of-squares: 3.337e-16
Number of iterations to convergence: 11
Achieved convergence tolerance: 4.397e-07
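Notice why out2/out4 agree with out1/out3 on b: dividing the response by a constant c just divides a by c and leaves the exponent alone, since y/c = (a/c) * x^b. A quick check on made-up synthetic data (numbers here are for illustration only):

```r
## Rescaling the response rescales only 'a': y/c = (a/c) * x^b.
set.seed(1)
x <- 1:20
y <- 2e-8 * x^7 * exp(rnorm(20, sd = 0.05))   # noisy power law

f1 <- nls(y ~ a * x^b,         start = list(a = 1e-8,  b = 6))
f2 <- nls(y / 86400 ~ a * x^b, start = list(a = 1e-13, b = 6))

coef(f1)[["b"]] - coef(f2)[["b"]]      # essentially zero
coef(f1)[["a"]] / coef(f2)[["a"]]      # essentially 86400
```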
plots:
plot(x, y1, type = "l")
lines(x, y2, col = "green")
lines(x, fitted(out1), col = "red")
lines(x, fitted(out3), col = "red")
Looks reasonable. Note that nls doesn't report confidence intervals by default. For a readable account of how nls works, see Modern Applied Statistics with S, Section 8.2.
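You can get intervals after the fit, though: base R profiles likelihood-based confidence intervals for an nls object via confint(), and summary() gives approximate standard errors. A minimal sketch on simulated data (the numbers are made up):

```r
set.seed(42)
x <- seq(1, 10, length.out = 40)
y <- 3e-8 * x^7 * exp(rnorm(40, sd = 0.05))   # true b = 7

fit <- nls(y ~ a * x^b, start = list(a = 1e-8, b = 6))
summary(fit)$coefficients   # estimates with approximate std. errors
confint(fit)                # profile-based 95% confidence intervals
```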
None of this answers your question, but I hope it helps.
I guess your "this is a very newbie question" refers to this one of your many questions:
"...but conceptually, would the point of a power law be violated if there
are some zeroes in the data?"
No. The concept remains valid: the same class of distributions may be applied to data
with or without zeros. You may be interested in reading more about the Tweedie class of distributions here and then here.
For example, the well-known Taylor's law says that the variance is proportional to a power of the mean. Taylor's law is mathematically identical to the variance-to-mean power law that characterizes the Tweedie distributions: for any random variable that obeys a Tweedie distribution, the variance relates to the mean by a power law. Since such a random variable can be discrete, continuous, or a combination of both, the concept of a power law
applies equally to data that are counts (Poisson), reals (normal), positive reals (gamma), or positive reals with an added point mass at zero (compound Poisson-gamma).
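To make the variance-to-mean law concrete: it states Var(Y) = phi * mu^p, where p = 1 recovers the Poisson (with phi = 1), p = 2 the gamma, and 1 < p < 2 the compound Poisson-gamma. A quick empirical check of the Poisson case in R, regressing log sample variance on log mean:

```r
## Tweedie power law Var(Y) = phi * mu^p; for Poisson data p = 1.
set.seed(7)
mu <- c(0.5, 1, 2, 5, 10, 20, 50, 100)
vs <- sapply(mu, function(m) var(rpois(1e5, m)))  # sample variances
fit <- lm(log(vs) ~ log(mu))
coef(fit)[["log(mu)"]]   # estimated power p, close to 1
```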
Given your "there are some zeros in the data" and your comment "yes, my values are counts", a simple Poisson may work. If not, e.g. if there are too few or too many zeros, you may try the Neyman Type A distribution (this R package manual mentions it in the context of the Tweedie class of distributions).
I hope some of the above helps.
Best Answer
To clarify the answer by user777, there are two packages that my collaborators and I have developed specifically for rigorously fitting power-law distributions. Both can be found here. One is for when your data are integer or real values, while the other (the one that user777 linked to) is for binned data, i.e., when you only know the number of measurements within each of several contiguous ranges.
In each case, the packages have four parts:
These methods are described exhaustively in two references, both of which are freely available on the arXiv preprint server (just search for their titles). They fit the model by maximum likelihood, which is far more accurate than classic "curve fitting" approaches on scatter plots.
[integer and continuous quantities] A Clauset, C R Shalizi, and MEJ Newman. "Power-law distributions in empirical data." SIAM Review 51, 661-703 (2009).
[binned quantities] Y Virkar and A Clauset. "Power-law distributions in binned empirical data." Annals of Applied Statistics 8, 89-119 (2014).
In your case, I'm not entirely sure what you mean by each data point carrying an uncertainty. Do you mean a classic measurement uncertainty, like the kind you have when you measure the length or weight of an object (and is thus normally distributed)? If the variance is modest and the number of data points large, then you can get a pretty good estimate of the power-law parameter using our methods even if you ignore the uncertainty (because the estimator takes the logarithm of each value, and normally distributed fluctuations become highly compressed under the log). If you don't have much data, or if the uncertainty is really large, then I would recommend choosing a reasonable binning scheme (powers of 2 or something) and applying the binned-data methods.
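For the continuous case, the maximum-likelihood estimator in the first reference has a well-known closed form, alpha_hat = 1 + n / sum(log(x_i / xmin)). Here is a sketch in R on simulated Pareto data, with xmin assumed known rather than estimated (the packages handle estimating xmin for you):

```r
## MLE for a continuous power law p(x) ~ x^(-alpha), x >= xmin:
##   alpha_hat = 1 + n / sum(log(x / xmin))
set.seed(123)
alpha <- 2.5; xmin <- 1; n <- 1e5
x <- xmin * runif(n)^(-1 / (alpha - 1))   # inverse-CDF sampling

alpha_hat <- 1 + n / sum(log(x / xmin))
alpha_hat   # close to the true alpha = 2.5
```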