What is the formula to calculate the prediction for a Tweedie distribution using model coefficients? I am trying to manually calculate the prediction.
Below is my attempt at reproducible code.
library(tweedie)
library(statmod)
rm(list=ls(all=TRUE))
cat("\014")
outputdata <- read.csv("example.csv", header = TRUE)
attach(outputdata)
# Fit the glm
fit <- glm( y ~ log(Variable1), data=outputdata, family=tweedie(var.power=1.65, link.power=0) )
summary(fit)
outputdata$predvals <- predict(fit, type = "response", newdata = outputdata)
write.csv(outputdata, "example output.csv", na = "", row.names = F)
The input data (example.csv) consists of one column as the independent variable (Variable1) and the second column the dependent variable (y).
The data is as follows:
Variable1 y 1 0 2 0.13 3 0 4 0.05 5 0.01 6 0.21 7 0.03 8 0.1 9 0.32
10 0.16 11 0.16 12 0.08 13 0.03 14 0.13 15 0.15 16 0.2 17 0.25 18 0.32
19 0.14 20 0.19 21 0.26 22 0.17 23 0.34 24 0.23 25 0.29 26 0.16 27 0.1
28 0.23 29 0.28 30 0.45 31 0.18 32 0.23 33 0.14 34 0.16 35 0.29
36 0.28 37 0.16 38 0.34 39 0.14 40 0.31 41 0.12 42 0.33 43 0.14 44 0.3
45 0.53 46 0.23 47 0.18 48 0.64 49 0.3 50 0.36 51 0.38 52 0.41 53 0.26
54 0.12 55 0.35 56 0.12 57 0.41 58 0.04 59 0.23 60 0.71 61 0.09
62 0.32 63 0.23 64 0.41 65 0.19 66 0.58 67 0.14 68 0.27 69 0.42
70 0.55 71 0.42 72 0.41 73 0.29 74 0.23 75 0.19 76 0.27 77 0.19
78 0.23 79 0.24 80 0.42 81 0.5 82 0.41 83 0.15 84 0.34 85 0.38 86 0.4
87 0.37 88 0.17 89 0.22 90 2.24 91 0.17 92 0.15 93 0.34 94 0.15 95 0.4
96 0.16 97 0.52 98 0.48 99 0.41 100 0.24
The model output I get is:
Call:
glm(formula = y ~ log(Variable1), family = tweedie(var.power = 1.65,
link.power = 0), data = outputdata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.50992 -0.38106 -0.04531 0.16910 2.25728
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.17232 0.33346 -9.513 1.38e-15
log(Variable1) 0.49793 0.08646 5.759 9.75e-08
Thank you in advance!
Best Answer
When you pass
glm()
the tweedie family the return value is aglm
object. So you can use thepredict()
method or thepredict.glm()
method if you prefer to specify to any future readers of your code that this is aglm
.In the predict family of functions you pass the argument
newdata=newDataName
to specify prediction on a new dataset, default behavior is to predict on the current data. Also, read?predict
to see the 3 options of if you want prediction of the linear combination of predictors, on the y-space, or the other one which I've never found super useful.Added from comment on the reply:
To get this manually you'll need to use the equation from
?tweedie
documentation that describes the link. The doc states: $\mu_i^q = \mathbb{E}(y_i|\vec{x}_i)^q = \vec{x}_i^T\vec{\beta}$ so if you want the expected value you'll need to calculate:$$\mathbb{E}(y_i|\vec{x}_i) = (\vec{x}_i^T\vec{\beta})^{1/q},$$
where $q$ is the
link.power=1
value. so ifq=1
as the question is written simply take the product of the estimates times the coefficients and add up all of these products ( $\vec{x}_i^T\vec{\hat{\beta}}$ ) where the 'hat' denotes the estimate.