For a quick way to get the standardized beta coefficients directly from an lm (or glm) model in R, try lm.beta(model) from the QuantPsyc package. In the example provided, this would be:
library("MASS")
nb = glm.nb(responseCountVar ~ predictor1 + predictor2 +
predictor3 + predictor4 + predictor5 + predictor6 +
predictor7 + predictor8 + predictor9 + predictor10 +
predictor11 + predictor12 + predictor13 + predictor14 +
predictor15 + predictor16 + predictor17 + predictor18 +
predictor19 + predictor20 + predictor21,
data=myData, control=glm.control(maxit=125))
summary(nb)
library(QuantPsyc)
lm.beta(nb)
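If you'd rather not depend on QuantPsyc, a rough manual equivalent (a sketch only, using the variable names from the example above) is to rescale each slope by sd(x)/sd(y), which is what lm.beta() does for an ordinary lm fit; for a count model like glm.nb the interpretation is looser, so treat this as a sanity check rather than an exact reproduction:
b  <- coef(nb)[-1]                               # drop the intercept
sx <- sapply(myData[names(b)], sd, na.rm = TRUE) # sd of each predictor
sy <- sd(myData$responseCountVar, na.rm = TRUE)  # sd of the response
b * sx / sy                                      # standardized slopes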
For the standard linear regression model, the absolute value of the coefficient estimates and the p-values are not related in the way you describe. It is entirely possible to have coefficients that are large in absolute value but insignificant, and coefficients that are small in absolute value but highly significant. What you're missing in your interpretation is the effect of the coefficient estimates' standard errors.
The coefficients R reports (let's call them $b_1,b_2,b_3,\ldots,b_k$) are the best linear unbiased estimators of the true parameters $\beta_1,\beta_2,\beta_3,\ldots,\beta_k$, in the sense that they minimize the sum of squared errors. Formally:
$$
\{b_1,b_2,\ldots,b_k\} = \underset{\alpha}{\operatorname{argmin}}\left\{ \sum_{i=1}^{n}\left(y_i-\alpha_1x_{i,1}-\cdots-\alpha_kx_{i,k}\right)^2\right\}
$$
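If it helps to see this concretely, here is a small sketch with simulated data (all names and numbers made up) showing that lm() returns essentially the same coefficients a direct numerical minimizer of the sum of squared errors would find:
# Sketch: lm() coefficients vs. numerical minimization of the SSE
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)
sse <- function(a) sum((y - a[1] - a[2] * x1 - a[3] * x2)^2)  # a[1] is an intercept
opt <- optim(c(0, 0, 0), sse)                       # numerical minimizer of the SSE
rbind(optim = opt$par, lm = coef(lm(y ~ x1 + x2)))  # the two rows are nearly identical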
The p-value R reports for the $i^\text{th}$ coefficient is the result of the following hypothesis test:
$H_0: \beta_i = 0$
$H_A: \beta_i \neq 0$
Assuming the regression is properly specified, it can be shown (using the central limit theorem) that each $b_i$ is a normally distributed random variable with mean $\beta_i$ and some standard deviation (also called a standard error) $\sigma_i$. This is because the $b$'s are estimated from a random sample, so they too are random variables (roughly speaking).
What determines the $i^\text{th}$ p-value is where 0 "lands" in the normal distribution $N(\beta_i,\sigma_i^2)$ (technically the test uses a t-distribution, but the difference is not so important for addressing your question). If zero is in the tails of $N(\beta_i,\sigma_i^2)$, the p-value is low; if it's nearer the middle, the p-value is high.
So given two estimates $b_i$ and $b_j$, where $b_i$ is "super far away" from zero and $b_j$ is "super close" to zero, the p-value of $b_i$ would be lower than that of $b_j$, assuming $\sigma_i = \sigma_j$. The part you are missing in your interpretation is that $\sigma_i$ and $\sigma_j$ can be very different. Essentially, if $b_i$ is "huge" but $\sigma_i$ is also "huge", you see that you can get a high p-value. Conversely, for a "small" $b_i$ and a "super small" $\sigma_i$, you see you can get a small p-value.
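Here is a small simulated illustration of that point (all numbers made up): the first predictor barely varies, so its coefficient estimate is large but so is its standard error; the second varies a lot, so its small coefficient comes with a tiny standard error.
set.seed(42)
n  <- 100
x1 <- rnorm(n, sd = 0.01)   # little variation -> large standard error
x2 <- rnorm(n, sd = 10)     # lots of variation -> small standard error
y  <- 5 * x1 + 0.1 * x2 + rnorm(n)
summary(lm(y ~ x1 + x2))$coefficients
# x1 typically shows the larger estimate but the larger p-value;
# x2 shows the smaller estimate but a much smaller p-value.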
I hope that helps!
Best Answer
It sounds like the paper uses a multiple regression model in the form
$$Y = \beta_0 + \sum_i \beta_i \xi_i + \varepsilon$$
where the $\xi_i$ are standardized versions of the independent variables; viz.,
$$\xi_i = \frac{x_i - m_i}{s_i}$$
with $m_i$ the mean (such as 12.56 in the example) and $s_i$ the standard deviation (such as 9.02 in the example) of the values of the $i^\text{th}$ variable $x_i$ ('buslines' in the example). $\beta_0$ is the intercept (if present). Plugging this expression into the fitted model, with its "betas" written as $\hat{\beta}_i$ (0.275 in the example), and doing some algebra gives the estimates
$$\hat{Y} = \hat{\beta}_0 + \sum_i \hat{\beta}_i \frac{x_i - m_i}{s_i}=\left(\hat{\beta}_0-\sum_i\frac{\hat{\beta}_i m_i}{s_i}\right)+\sum_i\left(\frac{\hat{\beta}_i}{s_i}\right)x_i.$$
This shows that the coefficients of the $x_i$ in the model (apart from the constant term) are obtained by dividing the betas by the standard deviations of the independent variables and that the intercept is adjusted by subtracting a suitable linear combination of the betas.
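For instance, with the 'buslines' numbers quoted above ($\hat{\beta}_i = 0.275$, $m_i = 12.56$, $s_i = 9.02$), the coefficient of the raw variable would be $0.275/9.02 \approx 0.0305$, and that variable's contribution to the adjusted intercept would be $-0.275 \times 12.56/9.02 \approx -0.383$.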
This gives you two ways to predict a new value from a vector $(x_1, \ldots, x_p)$ of independent values:
1. Using the means $m_i$ and standard deviations $s_i$ as reported in the paper (not recomputed from any new data!), calculate $(\xi_1,\ldots, \xi_p) = ((x_1-m_1)/s_1, \ldots, (x_p-m_p)/s_p)$ and plug those into the regression formula as given by the betas; or, equivalently,
2. Plug $(x_1, \ldots, x_p)$ into the algebraically equivalent formula derived above, as the sketch below checks.
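Here is a quick R sketch confirming that the two routes agree, using the 'buslines' numbers quoted above and a made-up intercept and new value:
b0   <- 1.2          # hypothetical intercept; not from the paper
beta <- 0.275        # standardized coefficient from the example
m    <- 12.56        # reported mean of buslines
s    <- 9.02         # reported standard deviation of buslines
x    <- 20           # a new raw value of buslines (made up)
route1 <- b0 + beta * (x - m) / s                # standardize, then predict
route2 <- (b0 - beta * m / s) + (beta / s) * x   # raw-scale formula
c(route1, route2)                                # identical up to rounding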
If the paper is using a Generalized Linear Model, you may need to follow this calculation by applying the inverse "link" function to $\hat{Y}$. For example, with logistic regression it would be necessary to apply the logistic function $1/(1 + \exp(-\hat{Y}))$ to obtain the predicted probability ($\hat{Y}$ is the predicted log odds).
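As a sketch of that last step (continuing the made-up numbers above), if the paper's model were a logistic regression you would finish with R's built-in logistic function:
y_hat <- 1.43             # e.g. the predicted log odds from either route above
plogis(y_hat)             # predicted probability, i.e. 1 / (1 + exp(-y_hat)), about 0.81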