Solved – How to calculate the Tweedie prediction based on model coefficients

rtweedie-distribution

What is the formula to calculate the prediction for a Tweedie distribution using model coefficients? I am trying to manually calculate the prediction.

Below is my attempt at reproducible code.

library(tweedie)
library(statmod)

rm(list=ls(all=TRUE))
cat("\014")  
outputdata <- read.csv("example.csv", header = TRUE)
attach(outputdata)

# Fit the glm
fit <- glm( y ~ log(Variable1), data=outputdata, family=tweedie(var.power=1.65, link.power=0) )
summary(fit) 

outputdata$predvals <- predict(fit, type = "response", newdata = outputdata)
write.csv(outputdata, "example output.csv", na = "", row.names = F) 

The input data (example.csv) consists of one column as the independent variable (Variable1) and the second column the dependent variable (y).

The data is as follows:

Variable1 y 1 0 2 0.13 3 0 4 0.05 5 0.01 6 0.21 7 0.03 8 0.1 9 0.32
10 0.16 11 0.16 12 0.08 13 0.03 14 0.13 15 0.15 16 0.2 17 0.25 18 0.32
19 0.14 20 0.19 21 0.26 22 0.17 23 0.34 24 0.23 25 0.29 26 0.16 27 0.1
28 0.23 29 0.28 30 0.45 31 0.18 32 0.23 33 0.14 34 0.16 35 0.29
36 0.28 37 0.16 38 0.34 39 0.14 40 0.31 41 0.12 42 0.33 43 0.14 44 0.3
45 0.53 46 0.23 47 0.18 48 0.64 49 0.3 50 0.36 51 0.38 52 0.41 53 0.26
54 0.12 55 0.35 56 0.12 57 0.41 58 0.04 59 0.23 60 0.71 61 0.09
62 0.32 63 0.23 64 0.41 65 0.19 66 0.58 67 0.14 68 0.27 69 0.42
70 0.55 71 0.42 72 0.41 73 0.29 74 0.23 75 0.19 76 0.27 77 0.19
78 0.23 79 0.24 80 0.42 81 0.5 82 0.41 83 0.15 84 0.34 85 0.38 86 0.4
87 0.37 88 0.17 89 0.22 90 2.24 91 0.17 92 0.15 93 0.34 94 0.15 95 0.4
96 0.16 97 0.52 98 0.48 99 0.41 100 0.24

The model output I get is:

Call:
glm(formula = y ~ log(Variable1), family = tweedie(var.power = 1.65,
link.power = 0), data = outputdata)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.50992 -0.38106 -0.04531 0.16910 2.25728

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.17232 0.33346 -9.513 1.38e-15

log(Variable1) 0.49793 0.08646 5.759 9.75e-08

Thank you in advance!

Best Answer

When you pass glm() the tweedie family the return value is a glm object. So you can use the predict() method or the predict.glm() method if you prefer to specify to any future readers of your code that this is a glm.

example(tweedie)
twdeReg <- glm(y~x, family=tweedie(var.power=1, link.power=1))
predict(twdeReg)
predict.glm(twdeReg)

In the predict family of functions you pass the argument newdata=newDataName to specify prediction on a new dataset, default behavior is to predict on the current data. Also, read ?predict to see the 3 options of if you want prediction of the linear combination of predictors, on the y-space, or the other one which I've never found super useful.


Added from comment on the reply:

To get this manually you'll need to use the equation from ?tweedie documentation that describes the link. The doc states: $\mu_i^q = \mathbb{E}(y_i|\vec{x}_i)^q = \vec{x}_i^T\vec{\beta}$ so if you want the expected value you'll need to calculate:

$$\mathbb{E}(y_i|\vec{x}_i) = (\vec{x}_i^T\vec{\beta})^{1/q},$$

where $q$ is the link.power=1 value. so if q=1 as the question is written simply take the product of the estimates times the coefficients and add up all of these products ( $\vec{x}_i^T\vec{\hat{\beta}}$ ) where the 'hat' denotes the estimate.

Related Question