Solved – How to report results from a Beta Regression (R output)

beta-regression, r, reporting

I am looking for advice on how to report the results of a beta regression. My response variable (D_Ratio) consists of ratios bounded between 0 and 1, and I am looking at a simple relationship between it and a single continuous predictor, body length (BL). I used the betareg function from the betareg package in R.

For example, here is my R output:

Call:
betareg(formula = D_Ratio ~ BL, data = wild, link = c("cloglog"))

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max
-1.4137 -0.6463 -0.1782  0.3970  2.6160

Coefficients (mean model with cloglog link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.14147    0.51930  -4.124 3.73e-05 ***
BL           0.05252    0.01673   3.139  0.00169 **

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)
(phi)   1.9522     0.2969   6.576 4.82e-11 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type of estimator: ML (maximum likelihood)
Log-likelihood: 8.766 on 3 Df
Pseudo R-squared: 0.2058
Number of iterations: 13 (BFGS) + 1 (Fisher scoring)

First, I noticed that there are two tables to consider: the coefficients from the mean model and the coefficients from the precision model. Which coefficients do I report?
I am finding different answers in other threads. Right now I am thinking it should be the pseudo R-squared, the z value, and the p value from the mean model. Or does the "Estimate" column mean something specific, like a slope? I ask because I am under the impression that this relationship is not a straight line.

Unfortunately, I am a relatively new R user, so if there is a coding issue here, please let me know.

Best Answer

The beta regression model can have two submodels: (1) a regression model for the mean, similar to a linear regression model or a binary regression model; (2) a regression model for the precision parameter, similar to the inverse of the variance in a linear regression model or of the dispersion in a GLM.

So far you have used the regressor only in (1) and just a constant in (2). I would encourage you to check whether the model D_Ratio ~ BL | BL, with the regressor BL in both parts, leads to an improved fit.
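A minimal sketch of that check, assuming the betareg package is installed. Because the original `wild` data are not available here, the code simulates a hypothetical stand-in data frame (`wild_sim`, with made-up parameter values); substitute your own data. The constant-precision model is nested in the `| BL` model, so a likelihood ratio test with 1 degree of freedom and AIC are two simple ways to compare them.

```r
library("betareg")

## Hypothetical stand-in for 'wild': simulated data, NOT the real data
set.seed(1)
n <- 100
BL <- runif(n, 10, 60)
mu <- 1 - exp(-exp(-2.1 + 0.05 * BL))   # inverse cloglog link for the mean
phi <- exp(1.5 + 0.02 * BL)              # precision that increases with BL
wild_sim <- data.frame(BL = BL,
                       D_Ratio = rbeta(n, mu * phi, (1 - mu) * phi))

## (a) BL in the mean model only, constant precision (as in the question)
m1 <- betareg(D_Ratio ~ BL, data = wild_sim, link = "cloglog")
## (b) BL in both the mean and the precision submodel
m2 <- betareg(D_Ratio ~ BL | BL, data = wild_sim, link = "cloglog")

## Likelihood ratio test: m2 has one extra parameter (BL in the precision part)
lr <- as.numeric(2 * (logLik(m2) - logLik(m1)))
p_value <- pchisq(lr, df = 1, lower.tail = FALSE)
print(c(LR = lr, p = p_value))

## Information criteria (lower is better)
print(AIC(m1, m2))
```

A small p value and a clearly lower AIC for `m2` would suggest that BL also matters for the precision.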

If not, then you can probably best report the coefficients from the mean equation as you would for a binary regression model, and then add the precision parameter estimate (as you would in a linear regression), the pseudo-R-squared, and/or the log-likelihood and/or AIC/BIC.
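As in a binary regression with a cloglog link, the mean-model "Estimate" of 0.05252 is not a slope on the proportion scale: it is the change in $\log(-\log(1 - \mu))$ per unit of BL, so the implied curve for the expected D_Ratio is nonlinear. The base-R sketch below plugs the printed estimates into the inverse link; the BL values are hypothetical and chosen only for illustration.

```r
## Estimates copied from the betareg output in the question
b0  <- -2.14147   # (Intercept), mean model, cloglog link
b1  <-  0.05252   # BL, mean model
phi <-  1.9522    # (phi), precision model, identity link

## Hypothetical body lengths, for illustration only
BL <- c(10, 20, 30, 40, 50)

eta <- b0 + b1 * BL                     # linear predictor on the cloglog scale
mu  <- 1 - exp(-exp(eta))               # inverse cloglog: expected D_Ratio
dmu <- b1 * exp(eta) * exp(-exp(eta))   # slope d mu / d BL at each BL value
v   <- mu * (1 - mu) / (1 + phi)        # variance of D_Ratio at each BL value

print(round(data.frame(BL, eta, mu, dmu, v), 4))
```

The `dmu` column changes with BL, which is one reason to report the coefficient on the link scale and, optionally, translate it into predicted means at a few BL values.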

If the regressor plays a role in both parts of the model, then probably report both sets of coefficients.
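The quantities mentioned above can be pulled out of a fitted model for a table. A sketch assuming the betareg package, using its bundled GasolineYield data as a stand-in for your own data (the model `yield ~ temp | temp` is only an example); the element names `coefficients$mean`, `coefficients$precision`, and `pseudo.r.squared` are those of the betareg summary object.

```r
library("betareg")

## GasolineYield ships with betareg; it stands in for your own data here
data("GasolineYield", package = "betareg")
fit <- betareg(yield ~ temp | temp, data = GasolineYield, link = "cloglog")

s <- summary(fit)
print(s$coefficients$mean)       # Estimate, Std. Error, z value, Pr(>|z|)
print(s$coefficients$precision)  # report as well if the regressor enters here
print(s$pseudo.r.squared)        # pseudo R-squared
print(logLik(fit))
print(AIC(fit))
## BIC = AIC with penalty log(n) per parameter
print(AIC(fit, k = log(nrow(GasolineYield))))
```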

You can also use the function mtable(betareg_object,...) from the memisc package to generate such a table. Export to LaTeX is also available. Furthermore, you might consider a scatterplot of D_Ratio ~ BL with the fitted mean regression line plus possibly some quantiles (e.g., 5% and 95%). The vignette("betareg", package = "betareg") has some examples like that.
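Such a plot can be sketched in base R directly from the printed estimates: a beta distribution with mean $\mu$ and precision $\phi$ has shape parameters $\mu\phi$ and $(1-\mu)\phi$, so its quantiles come from `qbeta()`. With the fitted object, `predict(..., type = "quantile", at = ...)` should give the same curves. The BL grid below is hypothetical; with the real data you would use the observed range of BL and add the points.

```r
## Estimates copied from the betareg output in the question
b0  <- -2.14147   # (Intercept), mean model, cloglog link
b1  <-  0.05252   # BL
phi <-  1.9522    # constant precision (identity link)

## Hypothetical grid of body lengths; use range(wild$BL) with the real data
BL <- seq(10, 60, by = 1)
mu <- 1 - exp(-exp(b0 + b1 * BL))   # fitted mean (inverse cloglog)

## Quantiles of Beta(shape1 = mu * phi, shape2 = (1 - mu) * phi)
q05 <- qbeta(0.05, mu * phi, (1 - mu) * phi)
q95 <- qbeta(0.95, mu * phi, (1 - mu) * phi)

plot(BL, mu, type = "l", ylim = c(0, 1), xlab = "BL", ylab = "D_Ratio")
lines(BL, q05, lty = 2)
lines(BL, q95, lty = 2)
## points(D_Ratio ~ BL, data = wild)  # add the observations with the real data
```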

Related Solutions

Solved – Beta regression – interpret coefficients using loglog link

As discussed by @StatsStudent and in the comments: there is no simple and intuitive ceteris paribus interpretation for log-log links. The easiest link that still ensures predictions are in $(0, 1)$ is the logit link, see "interpretation of betareg coef". However, even in that case it takes some practice to quickly process the meaning of coefficients.
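To illustrate what the logit link offers: since $\log(\mu/(1-\mu)) = \eta$, a one-unit increase in a regressor with coefficient $\beta$ multiplies the odds $\mu/(1-\mu)$ of the mean by $\exp(\beta)$, whatever the values of the other regressors. A base-R sketch with hypothetical coefficients, not taken from any fitted model:

```r
## Hypothetical logit-link coefficients, for illustration only
b0 <- 1.5
b1 <- -8      # coefficient of a regressor x

x    <- c(0.10, 0.11)          # a 0.01-unit step in x
mu   <- plogis(b0 + b1 * x)    # inverse logit: expected y
odds <- mu / (1 - mu)

print(odds[2] / odds[1])       # odds ratio for the step
print(exp(b1 * 0.01))          # the same number, read off the coefficient
print(mu[2] - mu[1])           # change in the mean itself depends on x
```

The odds ratio is constant, but the change in the mean is not, so predictions remain useful even with the logit link.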

Hence, in general I recommend complementing other analyses by looking at predictions and discrete changes for regressor combinations of interest. I typically set up a new dummy data set that contains the combinations of regressor values I'm interested in, and then I look at predictions, e.g., of means, variances, medians, or other quantiles.

As a simple example, consider your artificial data:

library("betareg")
d <- data.frame(
  x1 = c(0.051, 0.049, 0.046, 0.042, 0.042, 0.041, 0.038, 0.037, 0.043, 0.031),
  x2 = c(0.11, 0.12, 0.09, 0.21, 0.18, 0.11, 0.13, 0.11, 0.08, 0.10),
  y  = c(0.97, 0.87, 0.77, 0.65, 0.77, 0.84, 0.76, 0.73, 0.82, 0.90)
)
m <- betareg(y ~ x1 + x2, data = d, link = "loglog")

Then we create a new dummy data set that fixes x1 at its mean (0.042) and lets x2 vary across its range:

nd <- data.frame(x1 = 0.042, x2 = 8:21/100)

To this data set we can then add the predicted means, which show what a 0.01-unit change in x2 does:

nd$mean <- predict(m, nd, type = "response")
nd
##       x1   x2      mean
## 1  0.042 0.08 0.8671101
## 2  0.042 0.09 0.8571699
## 3  0.042 0.10 0.8465540
## 4  0.042 0.11 0.8352276
## 5  0.042 0.12 0.8231556
## 6  0.042 0.13 0.8103037
## 7  0.042 0.14 0.7966381
## 8  0.042 0.15 0.7821265
## 9  0.042 0.16 0.7667387
## 10 0.042 0.17 0.7504468
## 11 0.042 0.18 0.7332267
## 12 0.042 0.19 0.7150583
## 13 0.042 0.20 0.6959266
## 14 0.042 0.21 0.6758232

Clearly, a 0.01-unit change in x2 leads to different predicted changes in the expectation of y, depending on where in the range of x2 it occurs:

summary(diff(nd$mean))
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.02010 -0.01722 -0.01451 -0.01471 -0.01207 -0.00994

The changes can also be brought out graphically. The code below shows the mean (solid) along with the corresponding 5%, 50%, and 95% quantiles (dashed) of the predicted beta distribution. The observations from d are also added:

plot(mean ~ x2, data = nd, type = "l", ylim = c(0, 1))  # full (0, 1) range so quantiles and points are not clipped
lines(nd$x2, predict(m, nd, type = "quantile", at = 0.5), lty = 2)
lines(nd$x2, predict(m, nd, type = "quantile", at = 0.05), lty = 2)
lines(nd$x2, predict(m, nd, type = "quantile", at = 0.95), lty = 2)
points(y ~ x2, data = d)

Note, however, that in the actual data d the variable x1 varies along with x2, while in the new dummy data nd the variable x1 is fixed. More generally, plotting something like partial residuals would be better than plotting the actual observations.

More formal tools for such "effects" displays are provided by the packages effects (see http://doi.org/10.18637/jss.v087.i09 and the earlier references therein) and lsmeans (see https://doi.org/10.18637/jss.v069.i01).

Beta Regression – How to Calculate a Beta Regression Prediction from Coefficients

You are right: the expected proportion $\mathrm{E}(y) = \mu$ can be computed by applying the inverse link function to the linear predictor $\eta = \beta_0 + \beta_1 \cdot x_1 + \dots + \beta_k \cdot x_k$. For the default logit link you have $\mu = \exp(\eta) / (1 + \exp(\eta))$.

The second submodel for the precision $\phi$ does not affect this first submodel for the expectation $\mu$. But it is relevant for the variance $\mathrm{Var}(y) = \mu \cdot (1 - \mu) / (1 + \phi)$. To compute the precision $\phi$ you also apply the corresponding inverse link function (default link: log) to the corresponding linear predictor with regressors $z_j$ and coefficients $\gamma_j$: $\phi = \exp(\gamma_0 + \gamma_1 \cdot z_1 + \dots + \gamma_l \cdot z_l)$.
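These formulas can be checked against betareg's own predictions. A sketch assuming the betareg package, using its bundled GasolineYield data and the example model `yield ~ batch + temp | temp` (default logit link for the mean, log link for the precision):

```r
library("betareg")

data("GasolineYield", package = "betareg")
m <- betareg(yield ~ batch + temp | temp, data = GasolineYield)

X <- model.matrix(m, model = "mean")        # regressors x_j (with intercept)
Z <- model.matrix(m, model = "precision")   # regressors z_j (with intercept)
beta  <- coef(m, model = "mean")
gamma <- coef(m, model = "precision")

eta <- as.vector(X %*% beta)                # linear predictor for the mean
mu  <- exp(eta) / (1 + exp(eta))            # inverse logit
phi <- exp(as.vector(Z %*% gamma))          # inverse log link
v   <- mu * (1 - mu) / (1 + phi)            # Var(y)

## Compare with the built-in predictions
print(all.equal(mu,  unname(predict(m, type = "response"))))
print(all.equal(phi, unname(predict(m, type = "precision"))))
print(all.equal(v,   unname(predict(m, type = "variance"))))
```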

Thus in your example:

$\eta = -2.36040 -4.46786 \cdot \mathtt{elevation} + 1.04524 \cdot \mathtt{DJF} + 2.66096 \cdot \mathtt{Dry\_diff} + 0.40112 \cdot \mathtt{Wet\_diff} + 0.55956 \cdot \mathtt{SON}$.
$\mu = \exp(\eta)/(1 + \exp(\eta))$.
$\phi = \exp(1.53855 + 0.33296 \cdot \mathtt{basin\_namebasin\_11} + 0.03708 \cdot \mathtt{basin\_namebasin\_12} - 0.42456 \cdot \mathtt{basin\_namebasin\_13} + 0.37881 \cdot \mathtt{basin\_namebasin\_2})$.
All the $\mathtt{basin\_namebasin\_*}$ variables are 0/1 indicators for the corresponding category.
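Plugging numbers in is then simple arithmetic. A base-R sketch using the coefficients above for one hypothetical observation located in basin 12; the covariate values are made up for illustration, and the indicator names are shortened (e.g., `basin_12` for $\mathtt{basin\_namebasin\_12}$).

```r
## Coefficients copied from the formulas above
beta  <- c("(Intercept)" = -2.36040, elevation = -4.46786, DJF = 1.04524,
           Dry_diff = 2.66096, Wet_diff = 0.40112, SON = 0.55956)
gamma <- c("(Intercept)" = 1.53855, basin_11 = 0.33296, basin_12 = 0.03708,
           basin_13 = -0.42456, basin_2 = 0.37881)

## Hypothetical observation (made-up values, for illustration only)
x <- c("(Intercept)" = 1, elevation = 0.5, DJF = 0.2,
       Dry_diff = 0.1, Wet_diff = 0.3, SON = 0.4)
z <- c("(Intercept)" = 1, basin_11 = 0, basin_12 = 1,
       basin_13 = 0, basin_2 = 0)   # 0/1 indicators; all zero = reference category

eta <- sum(beta * x)                # linear predictor for the mean
mu  <- exp(eta) / (1 + exp(eta))    # expected proportion
phi <- exp(sum(gamma * z))          # precision
v   <- mu * (1 - mu) / (1 + phi)    # variance
print(c(eta = eta, mu = mu, phi = phi, var = v))
```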
