When deciding on a distribution, the science is more important than the tests. Think about what led to the data, and what values are possible, likely, and meaningful. The formal tests can detect obvious differences, but often cannot rule out distributions that are similar (note, for example, that the chi-squared distribution is a special case of the gamma distribution). Look at this quick simulation (and try it with other values):
> mean(replicate(1000, ks.test(rt(5000, df = 20), pnorm)$p.value) < 0.05)
[1] 0.111
The ks.test function detects the difference between a t-distribution with 20 degrees of freedom and a standard normal only about 11% of the time, even with a sample size of 5,000.
If you really want to test the distributions, then I would suggest using the vis.test function in the TeachingDemos package. Instead of a rigid test of exact fit, it presents a plot of your original data mixed in with similar plots of data simulated from the candidate distribution and asks you (or another viewer) to pick out the plot of the original data. If you cannot visually distinguish your data from the simulated data, then the candidate distribution is probably a reasonable starting point (though this does not rule out other possible distributions; think about which ones make the most sense scientifically).
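The vis.test function itself is in R, but the underlying "lineup" idea is easy to sketch. Below is a minimal Python sketch (the `lineup` function and its arguments are hypothetical names, not part of TeachingDemos): it hides the real data among panels simulated from the candidate distribution; you would then plot each panel and see whether a viewer can pick out the real one at better-than-chance rates.

```python
import random

def lineup(data, sampler, n_panels=20, rng=None):
    """Return n_panels datasets with `data` hidden at a random position.

    `sampler(n)` draws a synthetic dataset of size n from the candidate
    distribution.  If a viewer can reliably pick out the real panel,
    that is evidence against the candidate distribution.
    """
    rng = rng or random.Random()
    panels = [sampler(len(data)) for _ in range(n_panels - 1)]
    pos = rng.randrange(n_panels)
    panels.insert(pos, list(data))  # hide the real data at position pos
    return panels, pos

# Example: is the data plausibly uniform(0, 1)?
data = [0.1, 0.5, 0.9, 0.3, 0.7]
panels, pos = lineup(data, lambda n: [random.random() for _ in range(n)])
```

You would plot each panel (e.g. as histograms in a grid), keep `pos` secret until the viewer has guessed, and repeat with fresh viewers if possible.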
Another approach is to generate your new data from a density estimate of your original data. The logspline package for R has functions to estimate the density and then generate random data from that estimate. Alternatively, generating data from a kernel density estimate amounts to selecting a point from your data at random, then generating a random value from the kernel centered at that point. This can be as simple as drawing a random sample from the data with replacement and adding small normal deviates to the sampled values.
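That last recipe is the smoothed bootstrap. A minimal Python sketch, assuming a Gaussian kernel (`smoothed_bootstrap` is a hypothetical name, and the default bandwidth uses Silverman's rule of thumb):

```python
import random
import statistics

def smoothed_bootstrap(data, n, bandwidth=None, rng=None):
    """Resample from a kernel density estimate of `data`: pick points
    with replacement, then jitter each with a normal kernel."""
    rng = rng or random.Random()
    if bandwidth is None:
        # Silverman's rule of thumb for a Gaussian kernel.
        bandwidth = 1.06 * statistics.stdev(data) * len(data) ** (-1 / 5)
    return [rng.choice(data) + rng.gauss(0, bandwidth) for _ in range(n)]
```

Note that the jitter inflates the variance slightly relative to the original data; shrinking the resampled values toward the sample mean can correct this if it matters for your application.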
Da Fonseca and Zaatour (2014) provide the following likelihood function which I use to estimate $\mu$, $\alpha$ and $\beta$ (p. 555, Equation (26)):
$L=T-T\lambda_\infty-\sum\limits_{i=1}^{N_T}\frac{\alpha}{\beta}\left(1-e^{-\beta\left(T-t_i\right)}\right)+\sum\limits_{i=1}^{N_T}\ln\left(\lambda_\infty+\alpha A\left(i\right)\right)$,
where $A\left(i\right)=\sum\limits_{t_j<t_i}e^{-\beta\left(t_i-t_j\right)}$.
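For concreteness, here is a small Python sketch of $L$ (the function name is hypothetical). It uses the standard $O(N_T)$ recursion $A(i)=e^{-\beta(t_i-t_{i-1})}\left(1+A(i-1)\right)$ with $A(1)=0$, which avoids the quadratic double sum:

```python
from math import exp, log

def hawkes_loglik(mu, alpha, beta, times, T):
    """Log-likelihood of an exponential Hawkes process as quoted above,
    with baseline mu (= lambda_infinity), jump size alpha and decay
    beta, for sorted event times in [0, T].

    A(i) is accumulated with the O(N) recursion
    A(i) = exp(-beta * (t_i - t_{i-1})) * (1 + A(i-1)),  A(1) = 0.
    """
    ll = T - mu * T
    A = 0.0
    for i, t in enumerate(times):
        if i > 0:
            A = exp(-beta * (t - times[i - 1])) * (1 + A)
        ll -= (alpha / beta) * (1 - exp(-beta * (T - t)))
        ll += log(mu + alpha * A)
    return ll
```

You could hand the negative of this to a generic optimizer to estimate $\mu$, $\alpha$ and $\beta$, though as noted below the joint problem is not convex.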
In Section 2.3.2, the authors acknowledge that estimation on the basis of $L$ may be time-consuming and present an alternative estimation procedure based on the generalized method of moments (GMM). I haven't tried it, but maybe that's what you're looking for.
The tick module only allows you to estimate your Hawkes process for a fixed value of the decay $\beta$ of the exponential kernel $\phi(t) = \alpha \beta \exp(-\beta t)$. This is because jointly estimating $\alpha$ and $\beta$ leads to non-convex, poorly scalable optimization problems. However, you can try several values of $\beta$ and keep the one that gives you the best score. Here is an example of this procedure on a finance dataset.
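Setting tick's actual API aside, the grid-search idea can be sketched in plain Python using the likelihood $L$ quoted above (all function names here are hypothetical, and a real fit would use tick or a proper optimizer rather than coarse grids for $\mu$ and $\alpha$):

```python
from math import exp, log

def loglik(mu, alpha, beta, times, T):
    # Exponential-kernel Hawkes log-likelihood, with the O(N)
    # recursion A(i) = exp(-beta * (t_i - t_{i-1})) * (1 + A(i-1)).
    ll, A = T - mu * T, 0.0
    for i, t in enumerate(times):
        if i:
            A = exp(-beta * (t - times[i - 1])) * (1 + A)
        ll += -(alpha / beta) * (1 - exp(-beta * (T - t))) + log(mu + alpha * A)
    return ll

def fit_fixed_beta(times, T, beta, mus, alphas):
    # Profile out (mu, alpha) on a coarse grid for one candidate beta.
    return max((loglik(m, a, beta, times, T), m, a)
               for m in mus for a in alphas)

def grid_search(times, T, betas, mus, alphas):
    # Keep the decay whose best profiled likelihood is highest --
    # the same idea as scoring several fixed decays with tick.
    return max(fit_fixed_beta(times, T, b, mus, alphas) + (b,)
               for b in betas)  # (loglik, mu, alpha, beta)
```

The inner fit can be replaced by any learner that works at fixed decay; the outer loop over $\beta$ and "keep the best score" step is the part tick expects you to do yourself.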