How to specify a distribution for left-skewed data

Tags: bayesian, distributions, hierarchical-bayesian, mathematical-statistics

I am doing a Bayesian analysis. Exploratory analysis shows that the parameter might have a left-skewed distribution. What kind of distribution should I use as the prior for this parameter? Is there a transformation that would give the parameter an approximately normal shape? (Please note that the parameter takes negative values.)

The question is simple: I plotted my data and it looks like the plot below. So what kind of distribution should I assume the data is coming from?

[Figure: left skewed distribution]

Best Answer

You can use the brms package with a skew normal distribution to model either right- or left-skewed data. This distribution has three parameters, for location, scale, and skewness respectively. The skewness parameter (alpha) gives the direction of the skew: when alpha < 0 the distribution is left-skewed, when alpha > 0 it is right-skewed, and when alpha = 0 it reduces to a normal distribution.
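
To make the role of alpha concrete, here is a minimal base R sketch that needs no brms. It assumes the usual location/scale form of the skew normal and draws from it with the standard two-normal representation. The helpers `r_skew_sketch()` and `sample_skewness()` are hypothetical names made up for this illustration, not brms functions.

```r
# Hypothetical helper (not part of brms): draw n values from a skew normal
# with location xi, scale omega and skewness alpha, using
#   X = xi + omega * (delta * |Z0| + sqrt(1 - delta^2) * Z1),
# where Z0 and Z1 are independent standard normals and
# delta = alpha / sqrt(1 + alpha^2).
r_skew_sketch <- function(n, xi = 0, omega = 1, alpha = 0) {
  delta <- alpha / sqrt(1 + alpha^2)
  z0 <- abs(rnorm(n))
  z1 <- rnorm(n)
  xi + omega * (delta * z0 + sqrt(1 - delta^2) * z1)
}

# Sample skewness: third central moment divided by the
# second central moment raised to the power 3/2.
sample_skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

set.seed(1)
skew_left  <- sample_skewness(r_skew_sketch(1e5, alpha = -5))
skew_none  <- sample_skewness(r_skew_sketch(1e5, alpha = 0))
skew_right <- sample_skewness(r_skew_sketch(1e5, alpha = 5))

# Negative for alpha < 0, near zero for alpha = 0, positive for alpha > 0
print(c(left = skew_left, none = skew_none, right = skew_right))
```

The sign of the sample skewness follows the sign of alpha, which is the property the answer relies on.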

Here is a simple example of how to fit this kind of model with brms, along with a comparison against a model that uses a Gaussian likelihood.

library(patchwork)  # lets `+` place two ggplot objects side by side
library(tidyverse)
library(brms)

set.seed(666)

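# generate some left-skewed data (alpha < 0); brm() expects a data frame,
# so the draws are stored in a column that is also named `data`
data <- data.frame(data = rskew_normal(1e4, mu = 0, sigma = 1, alpha = -5))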

# fitting a brms model with a Gaussian likelihood
model_normal <- brm(data ~ 1, family = gaussian(), data = data)

# fitting a brms model with a skew normal likelihood
model_skew <- brm(data ~ 1, family = skew_normal(), data = data)

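# posterior predictive checking with 100 posterior draws per panel
# (recent brms versions call this argument `ndraws` instead of `nsamples`)
pp_check(model_normal, nsamples = 1e2) + pp_check(model_skew, nsamples = 1e2)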

The last command should produce a two-panel figure.

The left panel shows the raw data along with data simulated from the posterior distribution of the Gaussian model. As expected, the Gaussian model systematically misrepresents the skewness of the raw data. The right panel shows that data simulated from the skew normal model closely match the raw data.
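
The same mismatch can be seen numerically without fitting anything in Stan. The sketch below is only a rough plug-in analogue of a posterior predictive check, under stated assumptions: it simulates Gaussian replicates from the sample mean and standard deviation rather than from posterior draws, and `r_skew_sketch()` and `sample_skewness()` are hypothetical helpers written for this illustration.

```r
# Hypothetical helper (not part of brms): skew normal draws in the
# location/scale form, via two independent standard normals.
r_skew_sketch <- function(n, xi = 0, omega = 1, alpha = 0) {
  delta <- alpha / sqrt(1 + alpha^2)
  xi + omega * (delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n))
}

# Sample skewness used as the test statistic.
sample_skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

set.seed(2)
y <- r_skew_sketch(1e4, alpha = -5)  # stand-in for the observed data
obs_skew <- sample_skewness(y)

# 200 replicated data sets from a Gaussian with plug-in mean and sd.
# A real posterior predictive check would draw these from the posterior.
rep_skew <- replicate(
  200,
  sample_skewness(rnorm(length(y), mean = mean(y), sd = sd(y)))
)

# Share of Gaussian replicates that are at least as left-skewed as the data
share_as_skewed <- mean(rep_skew <= obs_skew)
print(c(observed = obs_skew, share = share_as_skewed))
```

If essentially no Gaussian replicate is as skewed as the data, the Gaussian likelihood cannot reproduce that feature. With the fitted models, a similar comparison should be possible through the `type = "stat"` option of `pp_check()`.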

The model summary gives the posterior mean (Estimate), the posterior standard deviation (Est.Error), and the 95% quantile interval for each parameter.

summary(model_skew)

 Family: skew_normal 
  Links: mu = identity; sigma = identity; alpha = identity 
Formula: data ~ 1 
   Data: data (Number of observations: 10000) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept    -0.01      0.01    -0.03     0.01       2676 1.00

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     1.01      0.01     1.00     1.03       2389 1.00
alpha    -5.12      0.20    -5.53    -4.74       2256 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
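
When reading these estimates, it may help to write the density out. The block below gives the standard skew normal density in terms of a location and a scale, written here as \xi and \omega (symbols introduced for this note, they do not appear in the output). As far as I understand the brms documentation, brms parameterizes the family by its mean \mu and standard deviation \sigma and converts internally, so the second line is that conversion as I read it, not something shown in the output above. \phi and \Phi are the standard normal density and distribution function.

```latex
f(y \mid \xi, \omega, \alpha)
  = \frac{2}{\omega}\,
    \phi\!\left(\frac{y - \xi}{\omega}\right)
    \Phi\!\left(\alpha\,\frac{y - \xi}{\omega}\right)
\\[1ex]
\delta = \frac{\alpha}{\sqrt{1 + \alpha^{2}}},
\qquad
\omega = \frac{\sigma}{\sqrt{1 - 2\delta^{2}/\pi}},
\qquad
\xi = \mu - \omega\,\delta\,\sqrt{2/\pi}
```

Read this way, the Intercept (about -0.01) estimates the mean of the data and sigma (about 1.01) its standard deviation, which agrees with the mu = 0 and sigma = 1 used to simulate the data.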

Hope this helps.
