Solved – Extracting power of a power law from data

computational-statistics, distributions, fitting, methodology, power-law

My question is more about methodology. Suppose that in some experiment we have measured a quantity $y$ at each time $x$, so the pairs $(x, y)$ form our data set. Moreover, we know that they are related by a power law, $y = D x^{\alpha}$, where $D$ is a constant.
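Taking logarithms of both sides (assuming $x$, $y$ and $D$ are positive) turns the power law into a straight line in log-log coordinates, which is what both methods below rely on:

$$\ln y = \ln D + \alpha \ln x, \qquad \frac{d \ln y}{d \ln x} = \alpha.$$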

To extract $\alpha$ from the data set, I know of two ways:

a) Take the logs of the data and compute the derivative of $\ln y$ with respect to $\ln x$; since $\frac{d\ln y}{d\ln x}=\alpha$, this gives the exponent. One problem is that the sampling may not have been done logarithmically, so the spacings between the logged values are uneven, which makes it numerically hard to compute this derivative accurately.
b) Alternatively, take the logs as above but fit a straight line to the logged data; its slope should give an average $\alpha$, right? Assuming this is correct, one problem is that if $\alpha$ changes across the time scales of the experiment, a single fit won't capture it. Perhaps one could perform the fits piecewise.
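A minimal sketch of both methods in plain Python, assuming strictly positive $x$ and $y$; the function names are hypothetical, not from any library. For a), the standard three-point finite-difference formula for unevenly spaced points copes with non-logarithmic sampling directly, so the practical difficulty is more that differentiating noisy data amplifies the noise. For b), ordinary least squares on the logs implicitly assumes multiplicative (roughly lognormal) errors on $y$.

```python
import math

def loglog(x, y):
    """Return (ln x, ln y); assumes every x and y is strictly positive."""
    return [math.log(t) for t in x], [math.log(t) for t in y]

def local_slopes(x, y):
    """Method a): estimate d ln y / d ln x at each interior point.

    Uses the three-point finite-difference formula for unevenly spaced
    points, so the spacing in ln x need not be uniform.
    """
    u, v = loglog(x, y)
    slopes = []
    for i in range(1, len(u) - 1):
        h1 = u[i] - u[i - 1]   # spacing to the left, in ln x
        h2 = u[i + 1] - u[i]   # spacing to the right, in ln x
        d = (-h2 / (h1 * (h1 + h2)) * v[i - 1]
             + (h2 - h1) / (h1 * h2) * v[i]
             + h1 / (h2 * (h1 + h2)) * v[i + 1])
        slopes.append(d)
    return slopes

def fit_loglog_line(x, y):
    """Method b): ordinary least squares of ln y on ln x.

    Returns (alpha, D) for the model y = D * x**alpha.
    """
    u, v = loglog(x, y)
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    sxx = sum((a - mu) ** 2 for a in u)
    sxy = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    alpha = sxy / sxx
    return alpha, math.exp(mv - alpha * mu)

xs = [1, 2, 3, 5, 8, 13, 21]          # deliberately uneven spacing
ys = [3.0 * t ** 1.7 for t in xs]     # noise-free power law
print(fit_loglog_line(xs, ys))        # approximately (1.7, 3.0)
print(local_slopes(xs, ys))           # each approximately 1.7
```

On noise-free data both methods recover the exponent exactly (up to rounding); with real, noisy data, method a) will scatter much more than method b).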

Questions:

Have I laid out the above methods correctly? (In particular, is b) correct?)
Is one method generally recommended, or does it depend on the context (e.g., in view of the difficulties above)?
Finally, please feel free to suggest other ways of extracting $\alpha$ if you know of them; I'm very curious to find out.

(If you prefer to explain your method with an example, I have created dummy data for illustration; here's the link. The first column is $x$ and the second is $y$.)

Clarifications after reading the discussion in the comments:

The aim here is only to tackle the problem of how to reasonably estimate (for instance, by fitting) power laws that describe a given bivariate data set and, more precisely, to find power laws that correspond to each region of interest [*] (i.e., subsets of the data). With this in mind, the answer proposed by user Nick Cox is precisely on point.

[*]: e.g., fitting subsets of the data and, in context, looking for different power laws at different time scales, because from a physical point of view we expect the data to exhibit different power laws.
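Fitting subsets separately, as described in the clarification, can be sketched as below. This assumes the break points between regimes are chosen beforehand (for example from physical reasoning); the `breaks` argument and the function names are hypothetical, and nothing here estimates where the breaks should be.

```python
import math

def fit_loglog_line(x, y):
    """OLS fit of ln y on ln x; returns (alpha, D) for y = D * x**alpha."""
    u = [math.log(t) for t in x]
    v = [math.log(t) for t in y]
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    sxx = sum((a - mu) ** 2 for a in u)
    sxy = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    alpha = sxy / sxx
    return alpha, math.exp(mv - alpha * mu)

def piecewise_exponents(x, y, breaks):
    """Fit a separate power law on each segment between sorted break points.

    A point whose x equals a break goes to the right-hand segment.
    Segments with fewer than two points are skipped.
    Returns a list of (lo, hi, alpha, D), one tuple per fitted segment.
    """
    edges = [-math.inf] + list(breaks) + [math.inf]
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = [(a, b) for a, b in zip(x, y) if lo <= a < hi]
        if len(seg) < 2:
            continue  # a line needs at least two points
        xs_seg, ys_seg = zip(*seg)
        alpha, D = fit_loglog_line(xs_seg, ys_seg)
        out.append((lo, hi, alpha, D))
    return out

# Hypothetical broken power law: exponent 1 below x = 10, exponent 2 above.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50]
ys = [t if t < 10 else t ** 2 / 10 for t in xs]
for lo, hi, alpha, D in piecewise_exponents(xs, ys, breaks=[10]):
    print(lo, hi, round(alpha, 6), round(D, 6))  # alpha near 1, then near 2
```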

Best Answer

As in my first comment on the question, I see this as being entirely about power laws for bivariate data. (The inclination to read it otherwise is puzzling.)

Using the posted data, I did local polynomial smoothing; the settings are no more than the not very smart defaults of the program used, but equally there seems little need to try other choices. (The $R^2$ here is just the square of the correlation between observed and smoothed values; 1 isn't even a target, since interpolating the data would achieve it.)

It seems clear that the slope in logarithmic space stabilises fairly quickly and systematically, so numerical differentiation of the smoothed curve could give you estimates of the slope as it changes.
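The answer does not name the program or its defaults, so the following is only a generic stand-in, not the answerer's method: a kernel-weighted local linear regression of $\ln y$ on $\ln x$ (a simple form of local polynomial smoothing), whose fitted slope at each point directly estimates the local exponent. The tricube kernel and the bandwidth of 0.3 in $\ln x$ units are arbitrary assumptions, and the function name is hypothetical.

```python
import math

def local_loglog_slopes(x, y, bandwidth=0.3):
    """Local exponent d ln y / d ln x from a tricube-weighted local linear fit.

    At each point u0 = ln x0, fit ln y = c0 + c1 * (ln x - u0) by weighted
    least squares over points with |ln x - u0| < bandwidth, and return c1.
    Returns math.nan where the window holds only one distinct x value.
    Assumes all x and y are strictly positive.
    """
    u = [math.log(t) for t in x]
    v = [math.log(t) for t in y]
    slopes = []
    for u0 in u:
        w, du, vv = [], [], []
        for ui, vi in zip(u, v):
            r = abs(ui - u0) / bandwidth
            if r < 1:
                w.append((1 - r ** 3) ** 3)   # tricube kernel weight
                du.append(ui - u0)
                vv.append(vi)
        sw = sum(w)                           # > 0: the point itself has weight 1
        mu = sum(wi * d for wi, d in zip(w, du)) / sw
        mv = sum(wi * b for wi, b in zip(w, vv)) / sw
        sxx = sum(wi * (d - mu) ** 2 for wi, d in zip(w, du))
        if sxx == 0:
            slopes.append(math.nan)
            continue
        sxy = sum(wi * (d - mu) * (b - mv) for wi, d, b in zip(w, du, vv))
        slopes.append(sxy / sxx)              # weighted least-squares slope
    return slopes

# Hypothetical broken power law: exponent 1 below x = 10, exponent 2 above.
xs = [math.exp(0.1 * k) for k in range(60)]
ys = [t if t < 10 else t ** 2 / 10 for t in xs]
s = local_loglog_slopes(xs, ys)
print(round(s[7], 6), round(s[40], 6))   # about 1 and about 2
```

Away from the break the local slopes recover each exponent; near $x = 10$ the window straddles both regimes and the estimate blends them, which is the usual trade-off controlled by the bandwidth.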

Related Solutions

Solved – Is this a reasonable approach to fitting distributions

One problematic feature is that there may be a continuum of optimal solutions. In most settings the quantiles are continuous functions of the parameters. When the distributions are continuous, almost surely there will be positive intervals between the data values. Suppose your objective function is optimized by a particular parameter value whose quantiles do not coincide exactly with any of the data: that is, they lie in the interiors of the intervals determined by the nearby data values. (This is an extremely likely event.) Then small changes in the parameter value will move the quantiles slightly, to remain within the same intervals, thereby leaving the chi-squared value unchanged because none of the counts changes. Thus the procedure doesn't even pick out a definite set of parameter values!
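The excerpt does not spell out the procedure being criticized, so this sketch assumes a hypothetical version of it: for a candidate rate of an exponential distribution, cut the real line at that distribution's quartiles, count the observations in each of the four equal-probability bins, and compute Pearson's chi-squared against the expected $n/4$ per bin. Nudging the rate slightly moves the quartiles but leaves every count, and hence the objective, unchanged, which is the flat-optimum problem described above.

```python
import bisect
import math

def exp_quantile(p, rate):
    """Quantile function of the exponential distribution with the given rate."""
    return -math.log(1.0 - p) / rate

def quantile_chisq(data, rate, k=4):
    """Pearson chi-squared of counts in k equal-probability bins.

    Bin edges are the candidate distribution's quantiles at 1/k, ..., (k-1)/k,
    so every bin has expected count n/k under that candidate.
    """
    edges = [exp_quantile(j / k, rate) for j in range(1, k)]
    counts = [0] * k
    for d in data:
        counts[bisect.bisect_right(edges, d)] += 1  # index of the bin holding d
    expected = len(data) / k
    return sum((c - expected) ** 2 / expected for c in counts)

data = [0.1, 0.3, 0.5, 0.9, 1.4, 2.2, 3.0, 4.1]
for rate in (0.999, 1.0, 1.001):
    print(rate, quantile_chisq(data, rate))  # each prints 3.0
```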

Another problematic feature is that this procedure apparently provides no way to obtain estimation errors for the parameters.

Another problem is that you do not know even the most basic properties of this estimator, such as its amount of bias.

Solved – Why does the scaling exponent of a power law fit change so radically when the data is scaled by a constant

Here is what it looks like in R.

x <- c(4, 4.5, 5, 5.5, 6, 6.5)
y1 <- c(0.000159334114311, 0.000184477307337, 0.002931979623674, 0.004321711975947, 
0.006269020390557, 0.012537205790269)
y2 <- c(0.000160708687146, 0.000186102543697, 0.002956862489638, 0.004356837209873, 
0.006325918592142, 0.01266703594829)

> (out1 <- nls(y1 ~ a*x^b, start=list(a=1,b=10)))
Nonlinear regression model
  model:  y1 ~ a * x^b 
   data:  parent.frame() 
        a         b 
2.880e-08 6.926e+00 
 residual sum-of-squares: 2.446e-06

Number of iterations to convergence: 31 
Achieved convergence tolerance: 1.326e-07 

> (out2 <- nls(y1/86400 ~ a*x^b, start=list(a=1,b=10)))
Nonlinear regression model
  model:  y1/86400 ~ a * x^b 
   data:  parent.frame() 
        a         b 
3.333e-13 6.926e+00 
 residual sum-of-squares: 3.277e-16

Number of iterations to convergence: 11 
Achieved convergence tolerance: 2.176e-07 

> (out3 <- nls(y2 ~ a*x^b, start=list(a=1,b=10)))
Nonlinear regression model
  model:  y2 ~ a * x^b 
   data:  parent.frame() 
        a         b 
2.849e-08 6.938e+00 
 residual sum-of-squares: 2.491e-06

Number of iterations to convergence: 30 
Achieved convergence tolerance: 2.456e-07 

> (out4 <- nls(y2/86400 ~ a*x^b, start=list(a=1,b=10)))
Nonlinear regression model
  model:  y2/86400 ~ a * x^b 
   data:  parent.frame() 
        a         b 
3.297e-13 6.938e+00 
 residual sum-of-squares: 3.337e-16

Number of iterations to convergence: 11 
Achieved convergence tolerance: 4.397e-07

Plots:

plot(x, y1, type = "l")                  # observed y1
lines(x, y2, type = "l", col = "green")  # observed y2
lines(x, fitted(out1), col = "red")      # nls fit to y1
lines(x, fitted(out3), col = "red")      # nls fit to y2


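The fits look reasonable. Note that dividing the response by 86400 leaves the exponent b unchanged (6.926 for y1, 6.938 for y2) and only divides a by that factor. I don't think nls produces confidence intervals by default. For a readable account of how it works, see Modern Applied Statistics with S, Section 8.2.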

None of this answers your question, but I hope it helps.
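As a cross-check outside R, here is a sketch that fits $y = a x^b$ by profile least squares rather than by nls's default Gauss-Newton iterations: for fixed $b$ the best $a$ has a closed form, so only a one-dimensional grid search over $b$ is needed. The grid (step 0.001 on $(0, 15]$) and the function name are assumptions; the y1 values are copied from the R session above. Fitting y1 and y1/86400 gives the same exponent, with $a$ and the RSS scaled by $c$ and $c^2$.

```python
# Data copied from the R session above (y1 only).
x = [4, 4.5, 5, 5.5, 6, 6.5]
y1 = [0.000159334114311, 0.000184477307337, 0.002931979623674,
      0.004321711975947, 0.006269020390557, 0.012537205790269]

def profile_fit(x, y, b_grid):
    """Least-squares fit of y = a * x**b by profiling out a.

    For fixed b the RSS-minimizing a is sum(y * x^b) / sum(x^(2b)),
    so only b needs a one-dimensional search over b_grid.
    Returns (a, b, rss) at the grid value with the smallest RSS.
    """
    best = None
    for b in b_grid:
        xb = [t ** b for t in x]
        a = sum(yi * zi for yi, zi in zip(y, xb)) / sum(zi * zi for zi in xb)
        rss = sum((yi - a * zi) ** 2 for yi, zi in zip(y, xb))
        if best is None or rss < best[2]:
            best = (a, b, rss)
    return best

c = 1 / 86400                                # the same rescaling as out2
grid = [i / 1000 for i in range(1, 15001)]   # b from 0.001 to 15, step 0.001
a1, b1, rss1 = profile_fit(x, y1, grid)
a2, b2, rss2 = profile_fit(x, [v * c for v in y1], grid)
print(b1, b2)                  # identical exponents
print(a2 / a1, rss2 / rss1)    # approximately c and c**2
```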
