Solved – Calculation of the gravity model in R and Stata software: why are coefficients the same but standard errors different

Tags: generalized linear model, r, stata

We estimated a gravity model in both R and Stata.

For the calculations we used the standard glm function in R (with family = quasipoisson) and the ppml command in Stata.

The call in R:

glmm <- glm(exports ~ ln_GDPimporter + ln_GDPexporter +
    ln_GDPimppc + ln_GDPexppc + ln_Distance + ln_Tariff + ln_ExchangeRate +
    Contig + Comlang + Colony_CIS + EAEU_CIS + EU_European_Union,
    family = quasipoisson(link = "log"), data = data_pua)
summary(glmm)

The results in R were:

[Image: the results of the calculations in R]

On the same data, we ran the estimation in Stata using the ppml command:

ppml exports ln_gdpimporter ln_gdpexporter ln_gdpimppc ln_gdpexppc ln_distance ln_tariff ln_exchangerate contig comlang colony_cis eaeu_cis eu_european_union

The results in Stata were as follows:

[Image: the results of the calculations in Stata]

As you can see, the model coefficients (the second column of each results table) agree to at least the fourth decimal place.

However, the other results (from the third column of each table onwards) do not agree.

  • Could you explain the differences in the results?

  • In particular, why are the coefficients the same while the standard errors are not?

Best Answer

I note that the Stata coefficient table reports "Robust Std. Err.", while the glm fit in R (the object you named glmm) is probably not using robust standard errors. That would account for the differences in the standard errors.
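To make the robust versus model-based distinction concrete, here is a minimal sketch in base R. It uses simulated data (the variables x and y are hypothetical, not the gravity data from the question) and computes a plain sandwich (HC0) covariance by hand for a quasipoisson glm fit. Which finite-sample correction Stata applies on top of this is an assumption I have not checked, so expect close rather than exact agreement with ppml.

```r
# Simulated, overdispersed count data; names are illustrative only.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1)

fit <- glm(y ~ x, family = quasipoisson(link = "log"))

# Model-based SEs, as printed by summary(fit):
# Poisson SEs scaled by the square root of the estimated dispersion.
se_model <- sqrt(diag(vcov(fit)))

# Sandwich (HC0) SEs computed by hand for the log-link Poisson score.
X      <- model.matrix(fit)
mu     <- fitted(fit)                    # fitted means on the response scale
bread  <- solve(crossprod(X, X * mu))    # (X' diag(mu) X)^{-1}
scores <- X * (y - mu)                   # per-observation score contributions
meat   <- crossprod(scores)              # X' diag((y - mu)^2) X
vcov_robust <- bread %*% meat %*% bread
se_robust   <- sqrt(diag(vcov_robust))

# Same coefficients, two different sets of standard errors.
print(cbind(estimate = coef(fit), se_model = se_model, se_robust = se_robust))
```

The sandwich package offers the same kind of estimator through vcovHC(), which is probably more convenient than the hand-rolled version for a real model.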

Also, ppml seems to drop "non-significant" regressors, and R's quasipoisson family allows for overdispersion in a way that differs from, say, negative binomial regression, and perhaps from what ppml does as well.
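As a rough illustration of how the quasipoisson family treats overdispersion, the sketch below (simulated data, hypothetical variable names) fits the same model with family = poisson and family = quasipoisson. The point estimates coincide, and the quasipoisson standard errors are the Poisson ones multiplied by the square root of the estimated dispersion. This is also why coefficients can match across software while standard errors do not.

```r
# Simulated, overdispersed count data; names are illustrative only.
set.seed(2)
n <- 400
x <- rnorm(n)
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 2)

fit_pois  <- glm(y ~ x, family = poisson(link = "log"))
fit_quasi <- glm(y ~ x, family = quasipoisson(link = "log"))

# The two families share the same estimating equations for the coefficients.
same_coef <- isTRUE(all.equal(coef(fit_pois), coef(fit_quasi)))

# Quasipoisson only rescales the covariance by the estimated dispersion phi.
phi      <- summary(fit_quasi)$dispersion
se_pois  <- sqrt(diag(vcov(fit_pois)))
se_quasi <- sqrt(diag(vcov(fit_quasi)))
same_scaled_se <- isTRUE(all.equal(se_quasi, se_pois * sqrt(phi)))

print(c(same_coef = same_coef, same_scaled_se = same_scaled_se))
print(phi)
```

A single scaling factor is a much simpler correction than a sandwich estimator, so there is no reason to expect the quasipoisson standard errors to reproduce Stata's robust ones.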

I noticed that you asked in a couple of places which R package would yield results equivalent to ppml for (economics) gravity models, and got no answers. I'm sorry to see that and wish I could give a better-informed recommendation. It appears that what you need is a Poisson regression with robust standard errors that handles zero values. I'm not sure which R packages support that. (I'm also not sure whether ppml handles overdispersion.)

Bayesian regression packages such as rstanarm might handle heteroscedasticity more robustly, but I am not sure. I'd tend to use something like a student_t family for that, but since you have to use a Poisson model I'm not sure of the answer there. You might try the negative binomial family (neg_binomial_2 in rstanarm's stan_glm), which also handles overdispersion and may be more robust than quasipoisson.

See also: When to use robust standard errors in Poisson regression?
