I have fitted a beta regression to my data with the `betareg()` function in R. It uses the default logit link for the mean and the default log link for the precision parameter phi, and phi depends on a categorical variable. My model is:
```r
betareg(formula = proportion ~ var1 + var2 + var3 + var4 + var5 +
    1 | categorical_variable, data = data_train)
```
I have a set of coefficient estimates: b0 is the intercept, b1 to b5 are the coefficients of the feature variables, and there is a constant for each level of the categorical variable (other than the reference level).
I would like to predict a proportion using all these coefficients.
I understand that the inverse of the logit link is

proportion = 1/(1+exp(-lp))

or, equivalently, proportion = exp(lp)/(1+exp(lp)), where lp = b0 + b1*var1 + b2*var2 + b3*var3 + b4*var4 + b5*var5.
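As a quick check, the two forms of the inverse logit above are algebraically identical, and base R's `plogis()` (the logistic CDF) computes the same function. A minimal sketch using arbitrary, made-up values of `lp`:

```r
# Hypothetical linear-predictor values, chosen only for illustration
lp <- c(-3, -0.5, 0, 2)

form1 <- 1 / (1 + exp(-lp))        # proportion = 1/(1+exp(-lp))
form2 <- exp(lp) / (1 + exp(lp))   # proportion = exp(lp)/(1+exp(lp))
form3 <- plogis(lp)                # base R inverse logit

print(cbind(lp, form1, form2, form3))
```

For very large `lp`, `exp(lp)` overflows to `Inf` and the second form returns `NaN`, so `plogis()` or the first form is the safer choice in code.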
However, I'm not sure how to add the categorical constant to arrive at the final predicted proportion. I tried

proportion = 1/(1+exp(-lp)) + exp(category_constant)

but this does not reproduce the result I get from the predict function. Could you help me understand how to derive the proportion from the coefficients?
Here is my model output:

```
Call:
betareg(formula = PS ~ elevation + DJF + Dry_diff + Wet_diff + SON +
    1 | basin_name, data = data_train)

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max
-2.1583 -0.6168 -0.1097  0.5028  4.4910

Coefficients (mean model with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.36040    0.18300 -12.898  < 2e-16 ***
elevation   -4.46786    0.20687 -21.598  < 2e-16 ***
DJF          1.04524    0.29863   3.500 0.000465 ***
Dry_diff     2.66096    0.26908   9.889  < 2e-16 ***
Wet_diff     0.40112    0.08421   4.763 1.90e-06 ***
SON          0.55956    0.10093   5.544 2.95e-08 ***

Phi coefficients (precision model with log link):
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)         1.53855    0.09902  15.538  < 2e-16 ***
basin_namebasin_11  0.33296    0.16287   2.044  0.04092 *
basin_namebasin_12  0.03708    0.16760   0.221  0.82491
basin_namebasin_13 -0.42456    0.13222  -3.211  0.00132 **
basin_namebasin_2   0.37881    0.16305   2.323  0.02016 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type of estimator: ML (maximum likelihood)
Log-likelihood: 691.2 on 11 Df
Pseudo R-squared: 0.7277
Number of iterations: 24 (BFGS) + 6 (Fisher scoring)
```
Best Answer
You are right: The expected proportion $\mathrm{E}(y) = \mu$ can be computed by applying the inverse link function to the linear predictor $\eta = \beta_0 + \beta_1 \cdot x_1 + \dots + \beta_k \cdot x_k$. For the default logit link you have $\mu = \exp(\eta) / (1 + \exp(\eta))$.
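A minimal sketch of this computation in base R, using the mean-model coefficients from the output in the question. The covariate values in `x_new` are hypothetical placeholders (their scale in the real data is unknown) and should be replaced with an actual observation:

```r
# Mean-model (logit link) coefficients copied from the summary output
beta <- c("(Intercept)" = -2.36040, elevation = -4.46786, DJF = 1.04524,
          Dry_diff = 2.66096, Wet_diff = 0.40112, SON = 0.55956)

# HYPOTHETICAL covariate values for one new observation (illustration only).
# The leading 1 multiplies the intercept; basin_name does not appear here.
x_new <- c(1, elevation = 0.2, DJF = 0.5, Dry_diff = 0.3,
           Wet_diff = 1.0, SON = 0.4)

eta <- sum(beta * x_new)  # linear predictor: b0 + b1*x1 + ... + b5*x5
mu  <- plogis(eta)        # inverse logit: exp(eta) / (1 + exp(eta))
print(c(eta = eta, mu = mu))
```

With the fitted object, `predict(model, newdata, type = "response")` should give the same $\mu$ for the same covariate values, whatever the basin, up to the rounding of the printed coefficients.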
The second submodel, for the precision $\phi$, does not affect the first submodel for the expectation $\mu$. In particular, the `basin_name` coefficients in your output belong to the precision submodel, so they do not enter the predicted proportion at all, and adding $\exp(\cdot)$ of them to $\mu$ is not meaningful. They are, however, relevant for the variance $\mathrm{Var}(y) = \mu \cdot (1 - \mu) / (1 + \phi)$. To compute the precision $\phi$, you likewise apply the corresponding inverse link function (default link: log) to the corresponding linear predictor with regressors $z_j$ and coefficients $\gamma_j$: $\phi = \exp(\gamma_0 + \gamma_1 \cdot z_1 + \dots + \gamma_l \cdot z_l)$.
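A minimal sketch of the precision side in base R, using the phi coefficients from the output in the question. The reference basin is absorbed into the intercept and is not named in the output, so it is labelled `reference` here; the value of `mu` is a hypothetical placeholder:

```r
# Precision-model (log link) coefficients copied from the summary output
gamma0 <- 1.53855
gamma_basin <- c(basin_11 = 0.33296, basin_12 = 0.03708,
                 basin_13 = -0.42456, basin_2 = 0.37881)

# Linear predictor per basin: the reference level gets only the intercept
zeta <- c(reference = gamma0, gamma0 + gamma_basin)
phi  <- exp(zeta)  # inverse log link

# HYPOTHETICAL expected proportion (any value in (0, 1) works here)
mu <- 0.2
var_y <- mu * (1 - mu) / (1 + phi)  # Var(y) = mu (1 - mu) / (1 + phi)

print(round(cbind(zeta, phi, var_y), 4))
```

Because $\phi$ appears only in the denominator of the variance, switching basins changes how tightly observations scatter around $\mu$ but leaves $\mu$ itself unchanged. With the fitted object, `predict(model, newdata, type = "precision")` and `type = "variance"` return these quantities directly.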
Thus in your example: