I have a frequency table of claims made by policyholders:
Claims 0, 1, 2, 3, 4
Frequency 2962, 382, 47, 25, 4
And I am going to carry out a chi-squared goodness of fit test to see if it conforms to a Poisson distribution (there are probably far better methods – but I'm teaching basic stats – so go with the flow please).
I converted the frequency table into a vector as follows:
n<-c(0,1,2,3,4)
x<-c(2962,382,47,25,4)
data <- rep(n,x)
I then fitted a Poisson(mu) as follows:
library(MASS)
fitted <-fitdistr(data,"Poisson")
mu<-fitted$estimate
I obtained the expected probabilities under this model using:
exptd<-c(dpois(0,mu),dpois(1,mu),dpois(2,mu),dpois(3,mu),1-ppois(3,mu))
Using round(sum(x)*exptd,3)
I noticed that the expected frequency of the last two groups were both <5 and so I combined the last three groups together:
x2<-c(x[1],x[2],sum(x[3:5]))
exptd2<-c(exptd[1],exptd[2],sum(exptd[3:5]))
I now carried out my chi-squared goodness of fit test:
chisq.test(x2,p=exptd2)
I have two issues:
- The chi-square test has the wrong degrees of freedom as I estimated a parameter on the data.
- This seems dreadfully long-winded – there must be a quicker way of doing this test.
Unfortunately despite searching I can't find examples of this online (which makes me think that this method isn't used despite what our textbooks say) so I need your help.
Many thanks in advance for the help.
Best Answer
User Defined Function can be used for such requirements
I tried to make a naive function for mentioned purpose. you can expand this.
data should be a vector class.
model is the model defined under 'densfunc' under 'fitdistr' under package MASS... should not be write as object, but under quotation marks.
dens is the density function from default stats package.
df is set default, but you can also add by df=n. where n is positive Integer.
Data
Now, trials
Ouput:- pvalue = 6.845853e-91 ; df = 3[1] "Warning: there are some expected frequencies that are less than 5"
Output:- pvalue = 5.72538e-13 ; df = 1
Output:- pvalue = 5.267489e-12 ; df = 2
Output:- pvalue = 2.281883e-09 ; df = 3[1] "Warning: there are some expected frequencies that are less than 5"
Output:- pvalue = 0.05292066 ; df = 1
Output:- pvalue = 0.1536295 ; df = 2