Solved – How to calculate the Tweedie prediction based on model coefficients

r, tweedie-distribution

What is the formula to calculate the prediction for a Tweedie distribution using model coefficients? I am trying to manually calculate the prediction.

Below is my attempt at reproducible code.

library(tweedie)
library(statmod)  # provides the tweedie() family used in glm() below

outputdata <- read.csv("example.csv", header = TRUE)

# Fit the glm
fit <- glm(y ~ log(Variable1), data = outputdata, family = tweedie(var.power = 1.65, link.power = 0))
summary(fit)

# Predicted means on the scale of y, written out next to the input data
outputdata$predvals <- predict(fit, type = "response", newdata = outputdata)
write.csv(outputdata, "example output.csv", na = "", row.names = FALSE)

The input data (example.csv) consists of one column as the independent variable (Variable1) and the second column the dependent variable (y).

The data is as follows, listed as consecutive (Variable1, y) pairs after the two column names:

Variable1 y 1 0 2 0.13 3 0 4 0.05 5 0.01 6 0.21 7 0.03 8 0.1 9 0.32
10 0.16 11 0.16 12 0.08 13 0.03 14 0.13 15 0.15 16 0.2 17 0.25 18 0.32
19 0.14 20 0.19 21 0.26 22 0.17 23 0.34 24 0.23 25 0.29 26 0.16 27 0.1
28 0.23 29 0.28 30 0.45 31 0.18 32 0.23 33 0.14 34 0.16 35 0.29
36 0.28 37 0.16 38 0.34 39 0.14 40 0.31 41 0.12 42 0.33 43 0.14 44 0.3
45 0.53 46 0.23 47 0.18 48 0.64 49 0.3 50 0.36 51 0.38 52 0.41 53 0.26
54 0.12 55 0.35 56 0.12 57 0.41 58 0.04 59 0.23 60 0.71 61 0.09
62 0.32 63 0.23 64 0.41 65 0.19 66 0.58 67 0.14 68 0.27 69 0.42
70 0.55 71 0.42 72 0.41 73 0.29 74 0.23 75 0.19 76 0.27 77 0.19
78 0.23 79 0.24 80 0.42 81 0.5 82 0.41 83 0.15 84 0.34 85 0.38 86 0.4
87 0.37 88 0.17 89 0.22 90 2.24 91 0.17 92 0.15 93 0.34 94 0.15 95 0.4
96 0.16 97 0.52 98 0.48 99 0.41 100 0.24

The model output I get is:

Call:
glm(formula = y ~ log(Variable1), family = tweedie(var.power = 1.65,
    link.power = 0), data = outputdata)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.50992  -0.38106  -0.04531   0.16910   2.25728

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    -3.17232    0.33346  -9.513 1.38e-15
log(Variable1)  0.49793    0.08646   5.759 9.75e-08

Thank you in advance!

Best Answer

When you pass the tweedie family to glm(), the return value is an ordinary glm object. So you can call predict() on it, or call predict.glm() explicitly if you want to signal to future readers of your code that the object is a glm.

library(statmod)
example(tweedie)  # runs the help-page example, which supplies the y and x used below
twdeReg <- glm(y ~ x, family = tweedie(var.power = 1, link.power = 1))
predict(twdeReg)
predict.glm(twdeReg)

In the predict family of functions, pass the argument newdata=newDataName to predict on a new dataset; the default behavior is to predict on the data used to fit the model. Also, read ?predict.glm to see the three values of the type argument: "link" gives the linear combination of predictors, "response" gives predictions on the scale of y, and "terms" gives the contribution of each model term on the linear-predictor scale, which I've never found super useful.
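As a rough illustration of these options, the sketch below fits a small Tweedie glm and compares the "link" and "response" predictions. The data set is the six-point example from the related answer further down this page, and `new_points` is a hypothetical data frame made up for illustration.

```r
library(statmod)  # provides the tweedie() family

# Small data set borrowed from the related example further down this page
y <- c(4.0, 5.9, 3.9, 13.2, 10.0, 9.0)
x <- 1:6
fit <- glm(y ~ x, family = tweedie(var.power = 1.1, link.power = 0))

eta <- predict(fit, type = "link")      # linear predictor b0 + b1 * x
mu  <- predict(fit, type = "response")  # fitted mean on the scale of y

# With a log link (link.power = 0) the two are related by mu = exp(eta)
all.equal(unname(mu), unname(exp(eta)))

# Prediction for new x values: newdata must contain the predictor columns
new_points <- data.frame(x = c(7, 8))   # hypothetical new observations
new_pred <- predict(fit, newdata = new_points, type = "response")
new_pred
```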

Added from comment on the reply:

To get this manually, you'll need the equation from the ?tweedie documentation that describes the link. The doc states $\mu_i^q = \mathbb{E}(y_i|\vec{x}_i)^q = \vec{x}_i^T\vec{\beta}$, so if you want the expected value you'll need to calculate:

$$\mathbb{E}(y_i|\vec{x}_i) = (\vec{x}_i^T\vec{\beta})^{1/q},$$

where $q$ is the link.power value. If $q=1$, as in the example above, simply multiply each predictor value by its estimated coefficient and add up all of these products ( $\vec{x}_i^T\vec{\hat{\beta}}$ ), where the 'hat' denotes the estimate. The model in the question uses link.power=0, which denotes the log link rather than a power, so there the expected value is $\exp(\vec{x}_i^T\vec{\hat{\beta}})$, that is, $\exp(-3.17232 + 0.49793\log(\text{Variable1}))$.
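For the model in the question, which was fitted with link.power=0 (the log link), the manual calculation could look like the sketch below. It uses only base R and the coefficient estimates printed in the question; `inverse_power_link` is a hypothetical helper written for this illustration, not a function from statmod or tweedie.

```r
# Coefficient estimates copied from the summary output in the question
b0 <- -3.17232   # (Intercept)
b1 <-  0.49793   # log(Variable1)

# Hypothetical helper: inverse of the power link with exponent q,
# treating q = 0 as the log link
inverse_power_link <- function(eta, q) {
  if (q == 0) exp(eta) else eta^(1 / q)
}

Variable1 <- c(1, 10, 50, 100)          # a few values from the question's range
eta <- b0 + b1 * log(Variable1)         # linear predictor
mu  <- inverse_power_link(eta, q = 0)   # link.power = 0, so mu = exp(eta)

# The same prediction written as a power curve: exp(b0) * Variable1^b1
mu_alt <- exp(b0) * Variable1^b1
all.equal(mu, mu_alt)
```

These values should agree with predict(fit, type = "response") up to the rounding of the printed coefficients.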

Related Solutions

Solved – GLM model selection using AICc with Tweedie distribution

You can use AICtweedie directly in MuMIn's functions, just specify it as a rank argument. Alternatively, you could write a wrapper around AICtweedie.

Solved – Given a GLM using Tweedie, how to find the coefficients

Are you familiar with generalized linear models in R? If so, you can fit Tweedie glms just like any other glms. The glm family definition necessary to make this happen is provided by the statmod R package from CRAN.

Tweedie glms assume that the variance function is a power function: $${\rm var}(y)=V(\mu)\phi=\mu^\alpha \phi$$ Special cases include normal glms ($\alpha=0$), Poisson glms ($\alpha=1$), gamma glms ($\alpha=2$) and inverse-Gaussian glms ($\alpha=3$).

Here is an example of R code:

> library(statmod)
> y <- c(4.0, 5.9, 3.9, 13.2, 10.0, 9.0)
> x <- 1:6
> fit <- glm(y~x, family = 
           tweedie(var.power=1.1, 
           link.power=0))
> summary(fit)

Call:
glm(formula = y ~ x, family = 
         tweedie(var.power = 1.1, 
         link.power = 0))

Deviance Residuals: 
      1        2        3        4        5        6  
-0.2966   0.1183  -1.0742   1.4985   0.1205  -0.6716  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   1.3625     0.4336   3.143   0.0348 *
x             0.1794     0.1008   1.779   0.1498  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Tweedie family taken to be 1.056557)

    Null deviance: 7.3459  on 5  degrees of freedom
Residual deviance: 3.9670  on 4  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 4

The tweedie family from statmod allows you to fit a glm with any power variance function and any power link. In the glm family call, var.power is the $\alpha$ parameter, so var.power=1.1 specifies $\alpha=1.1$. It is the exponent of the glm variance function, so var.power=0 specifies a normal family, var.power=1 a Poisson family, var.power=2 a gamma family, var.power=3 an inverse Gaussian family, and so on. Values strictly between 0 and 1 are not permitted, but virtually anything else is allowed.

link.power=0 specifies a log-link. The link is specified in terms of Box-Cox transformation powers, so link.power=1 is the identity link and link.power=0 means log.
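One way to check this mapping is to inspect the family object directly. The sketch below assumes that the object returned by statmod's tweedie() carries the usual `linkfun` and `variance` components of an R family object; the values in `mu` are arbitrary.

```r
library(statmod)

log_fam <- tweedie(var.power = 1.1, link.power = 0)  # log link
id_fam  <- tweedie(var.power = 1.1, link.power = 1)  # identity link

mu <- c(0.5, 1, 2, 10)   # arbitrary positive means for illustration

all.equal(log_fam$linkfun(mu), log(mu))   # link.power = 0 behaves as log
all.equal(id_fam$linkfun(mu), mu)         # link.power = 1 leaves mu unchanged
all.equal(log_fam$variance(mu), mu^1.1)   # variance function mu^var.power
```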

The above model assumes that $y_i\sim {\rm Tweedie}_\alpha(\mu_i,\phi)$ where $$\log \mu_i=\beta_0+\beta_1 x_i$$ and $${\rm var}(y_i)=\mu_i^{1.1} \phi$$
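As a worked illustration of these two equations, the following base R sketch plugs in the rounded estimates printed in the summary above. Because the inputs are rounded, the results will only approximately match fitted(fit).

```r
b0  <- 1.3625      # (Intercept) from the summary above
b1  <- 0.1794      # slope for x
phi <- 1.056557    # reported dispersion parameter

x  <- 1:6
mu <- exp(b0 + b1 * x)   # inverse of the log link: log(mu_i) = b0 + b1 * x_i
v  <- phi * mu^1.1       # var(y_i) = mu_i^1.1 * phi

data.frame(x = x, mean = mu, variance = v)
```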

The regression coefficients $\beta_j$ have been estimated by maximum likelihood. The dispersion parameter $\phi$ has been estimated from the sum of squared Pearson residuals divided by the residual degrees of freedom; this is called the Pearson estimator.
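The Pearson estimator can be reproduced by hand, as in this sketch, which refits the example model and compares a manual calculation with the dispersion that summary() reports. The names `pearson_chisq` and `phi_hat` are introduced here for illustration only.

```r
library(statmod)

y <- c(4.0, 5.9, 3.9, 13.2, 10.0, 9.0)
x <- 1:6
fit <- glm(y ~ x, family = tweedie(var.power = 1.1, link.power = 0))

mu <- fitted(fit)
pearson_chisq <- sum((y - mu)^2 / mu^1.1)     # sum of squared Pearson residuals
phi_hat <- pearson_chisq / df.residual(fit)   # divide by residual df (n - p)

# Should agree with the dispersion reported by summary()
all.equal(phi_hat, summary(fit)$dispersion)
```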

Regardless of what $\alpha$ or link you use, any of the downstream functions provided in R for glms will work on the glm fitted model object produced by glm().
