How to calculate a beta regression prediction from the coefficients


I have fitted a beta regression with the R betareg function, using the default logit link for the mean and the default log link for the precision (phi), which I model with the categorical variable. My model is:

betareg(formula = proportion ~ var1 + var2 + var3 + var4 + var5 + 
    1 | categorical_variable, data = data_train)

I have a set of coefficients: b0 (the intercept), b1 to b5 for the feature variables, and then a constant for each level of the categorical variable.
I would like to predict a proportion using all these coefficients.

I understand that the inverse of the logit link is:
proportion = 1/(1+exp(-lp)),
or equivalently, proportion = exp(lp)/(1+exp(lp)),

where lp = b0 + b1*var1 + b2*var2 + b3*var3 + b4*var4 + b5*var5.

However, I'm not sure how to add the categorical constant value to arrive at the final proportion prediction. I tried:

proportion = 1/(1+exp(-lp)) + exp(category_constant)

but this doesn't reproduce the result of the predict function. Could you please help me understand how to derive the predicted proportion from the coefficients?

Here is my model output:

Call:
betareg(formula = PS ~ elevation + DJF + Dry_diff + Wet_diff + SON + 
    1 | basin_name, data = data_train)

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max 
-2.1583 -0.6168 -0.1097  0.5028  4.4910 

Coefficients (mean model with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.36040    0.18300 -12.898  < 2e-16 ***
elevation   -4.46786    0.20687 -21.598  < 2e-16 ***
DJF          1.04524    0.29863   3.500 0.000465 ***
Dry_diff     2.66096    0.26908   9.889  < 2e-16 ***
Wet_diff     0.40112    0.08421   4.763 1.90e-06 ***
SON          0.55956    0.10093   5.544 2.95e-08 ***

Phi coefficients (precision model with log link):
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)         1.53855    0.09902  15.538  < 2e-16 ***
basin_namebasin_11  0.33296    0.16287   2.044  0.04092 *  
basin_namebasin_12  0.03708    0.16760   0.221  0.82491    
basin_namebasin_13 -0.42456    0.13222  -3.211  0.00132 ** 
basin_namebasin_2   0.37881    0.16305   2.323  0.02016 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Type of estimator: ML (maximum likelihood)
Log-likelihood: 691.2 on 11 Df
Pseudo R-squared: 0.7277
Number of iterations: 24 (BFGS) + 6 (Fisher scoring)

Best Answer

You are right: The expected proportion $\mathrm{E}(y) = \mu$ can be computed by applying the inverse link function to the linear predictor $\eta = \beta_0 + \beta_1 \cdot x_1 + \dots + \beta_k \cdot x_k$. For the default logit link you have $\mu = \exp(\eta) / (1 + \exp(\eta))$.
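A minimal Python sketch of this step, assuming the mean-model coefficients are supplied intercept first and in the same order as the regressors. `inverse_logit` and `mean_from_coefficients` are hypothetical helper names for illustration, not part of betareg:

```python
import math


def inverse_logit(eta: float) -> float:
    """Map a linear predictor eta to mu in (0, 1).

    Both branches compute the same quantity; the branch is chosen so that
    math.exp never receives a large positive argument (avoids overflow).
    """
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def mean_from_coefficients(beta: list, x: list) -> float:
    """Expected proportion mu for one observation.

    beta[0] is the intercept; beta[1:] pair up, in order, with the regressors in x.
    """
    if len(beta) != len(x) + 1:
        raise ValueError("need one coefficient per regressor plus an intercept")
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return inverse_logit(eta)
```

For example, `mean_from_coefficients([1.0, 2.0], [0.5])` gives $\eta = 2$ and therefore $\mu = \exp(2)/(1 + \exp(2))$. Only the mean-model coefficients enter this calculation.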

The second submodel for the precision $\phi$ does not affect this first submodel for the expectation $\mu$. But it is relevant for the variance $\mathrm{Var}(y) = \mu \cdot (1 - \mu) / (1 + \phi)$. To compute the precision $\phi$ you also apply the corresponding inverse link function (default link: log) to the corresponding linear predictor with regressors $z_j$ and coefficients $\gamma_j$: $\phi = \exp(\gamma_0 + \gamma_1 \cdot z_1 + \dots + \gamma_l \cdot z_l)$.
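A small Python sketch of the precision and variance formulas (hypothetical helper names). It also uses the fact that in this mean/precision parameterization $y$ follows a standard beta distribution with shape parameters $p = \mu\phi$ and $q = (1 - \mu)\phi$, which makes it easy to check that the variance formula agrees with the usual beta variance $pq / ((p+q)^2 (p+q+1))$:

```python
import math


def precision_from_coefficients(gamma: list, z: list) -> float:
    """phi = exp(gamma_0 + gamma_1*z_1 + ... + gamma_l*z_l) for the default log link."""
    return math.exp(gamma[0] + sum(g * zj for g, zj in zip(gamma[1:], z)))


def beta_variance(mu: float, phi: float) -> float:
    """Var(y) = mu * (1 - mu) / (1 + phi)."""
    return mu * (1.0 - mu) / (1.0 + phi)


def beta_shapes(mu: float, phi: float) -> tuple:
    """Standard Beta(p, q) shapes implied by mean mu and precision phi."""
    return mu * phi, (1.0 - mu) * phi


mu, phi = 0.3, 4.0
p, q = beta_shapes(mu, phi)
# Both expressions give the same variance (about 0.042 here).
print(beta_variance(mu, phi), p * q / ((p + q) ** 2 * (p + q + 1)))
```

Note that $\phi$ changes the spread around $\mu$ but never $\mu$ itself.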

Thus in your example:

  • $\eta = -2.36040 -4.46786 \cdot \mathtt{elevation} + 1.04524 \cdot \mathtt{DJF} + 2.66096 \cdot \mathtt{Dry\_diff} + 0.40112 \cdot \mathtt{Wet\_diff} + 0.55956 \cdot \mathtt{SON}$.
  • $\mu = \exp(\eta)/(1 + \exp(\eta))$.
  • $\phi = \exp(1.53855 + 0.33296 \cdot \mathtt{basin\_namebasin\_11} + 0.03708 \cdot \mathtt{basin\_namebasin\_12} - 0.42456 \cdot \mathtt{basin\_namebasin\_13} + 0.37881 \cdot \mathtt{basin\_namebasin\_2})$.
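  • All the $\mathtt{basin\_namebasin\_*}$ variables are 0/1 indicators for the corresponding category. For the reference basin (the level that has no coefficient in the output) they are all 0, so $\phi = \exp(1.53855)$.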
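Putting these pieces together, here is a minimal Python sketch of the calculation for a single observation, using the coefficients from the output above. Assumptions: `predict_one` is a hypothetical helper, covariate values must be on the same scale as the columns of `data_train`, and `basin=None` stands for the reference basin, whose name does not appear in the output. In R, `predict(m, newdata, type = "response")` and `predict(m, newdata, type = "precision")` on the fitted model (`m` is a hypothetical name) should give the same $\mu$ and $\phi$.

```python
import math
from typing import Dict, Optional

# Mean model (logit link): coefficients copied from the model output.
MEAN_COEF = {
    "(Intercept)": -2.36040,
    "elevation": -4.46786,
    "DJF": 1.04524,
    "Dry_diff": 2.66096,
    "Wet_diff": 0.40112,
    "SON": 0.55956,
}
# Precision model (log link): intercept plus basin effects relative to the reference basin.
PRECISION_INTERCEPT = 1.53855
BASIN_COEF = {
    "basin_11": 0.33296,
    "basin_12": 0.03708,
    "basin_13": -0.42456,
    "basin_2": 0.37881,
}


def predict_one(covariates: Dict[str, float], basin: Optional[str] = None) -> Dict[str, float]:
    """Return eta, mu, phi and Var(y) for one observation.

    basin=None means the reference basin (all indicators 0); an unknown
    basin name raises KeyError instead of silently using the reference.
    """
    eta = MEAN_COEF["(Intercept)"] + sum(
        MEAN_COEF[name] * covariates[name]
        for name in ("elevation", "DJF", "Dry_diff", "Wet_diff", "SON")
    )
    mu = 1.0 / (1.0 + math.exp(-eta))  # inverse logit
    # Exactly one indicator is 1 (or none, for the reference basin).
    gamma_basin = 0.0 if basin is None else BASIN_COEF[basin]
    phi = math.exp(PRECISION_INTERCEPT + gamma_basin)  # inverse log link
    var = mu * (1.0 - mu) / (1.0 + phi)
    return {"eta": eta, "mu": mu, "phi": phi, "var": var}
```

Changing `basin` changes only `phi` and `var`; `mu` stays the same, which is why adding a basin constant to the predicted proportion cannot reproduce the predicted mean.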