It's never easy telling your professor that they are wrong.
Standardized coefficients can be greater than 1.00, as that article explains and as is easy to demonstrate. Whether they should be excluded depends on why they happened - but probably not.
They are a sign that you have some pretty serious collinearity. One case where they often occur is when you have non-linear effects, such as when $x$ and $x^2$ are included as predictors in a model.
Here's a quick demonstration:
data(cars)
cars$speed2 <- cars$speed^2
cars$speed3 <- cars$speed^3

fit1 <- lm(dist ~ speed, data=cars)
fit2 <- lm(dist ~ speed + speed2, data=cars)
fit3 <- lm(dist ~ speed + speed2 + speed3, data=cars)

summary(fit1)
summary(fit2)
summary(fit3)

library(QuantPsyc)   # lm.beta() as used here is in the QuantPsyc package
lm.beta(fit1)
lm.beta(fit2)
lm.beta(fit3)
Final bit of output:
> lm.beta(fit3)
    speed    speed2    speed3 
 1.395526 -2.212406  1.681041 
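Those betas larger than 1 go hand in hand with the collinearity mentioned above; a quick check (a sketch, run after the code above) shows how tightly the polynomial terms are correlated:

round(cor(cars[, c("speed", "speed2", "speed3")]), 3)   # pairwise correlations close to 1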
Or, if you prefer, you can standardize the variables first:
zcars <- as.data.frame(scale(cars))   # scale() standardizes every column, including dist
fit3 <- lm(dist ~ speed + speed2 + speed3, data=zcars)
summary(fit3)
Call:
lm(formula = dist ~ speed + speed2 + speed3, data = zcars)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.03496 -0.37258 -0.08659  0.27456  1.73426 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.440e-16  8.344e-02   0.000    1.000
speed        1.396e+00  1.396e+00   1.000    0.323
speed2      -2.212e+00  3.163e+00  -0.699    0.488
speed3       1.681e+00  1.853e+00   0.907    0.369

Residual standard error: 0.59 on 46 degrees of freedom
Multiple R-squared:  0.6732, Adjusted R-squared:  0.6519 
F-statistic: 31.58 on 3 and 46 DF,  p-value: 3.074e-11
You don't need to do it with lm(); you can use matrix algebra if you prefer, via $\beta^* = R_{xx}^{-1} R_{xy}$:
library(MASS)                              # for ginv()
Rxx <- cor(cars)[c(1, 3, 4), c(1, 3, 4)]   # correlations among speed, speed2, speed3
Rxy <- cor(cars)[2, c(1, 3, 4)]            # correlations of dist with each predictor
B <- ginv(Rxx) %*% Rxy                     # standardized coefficients
B
          [,1]
[1,]  1.395526
[2,] -2.212406
[3,]  1.681041
See the documentation:
help(lm.circular)
"If type=="c-l" or lm.circular.cl is called directly, this function
implements the homoscedastic version of the maximum likelihood
regression model proposed by Fisher and Lee (1992). The model assumes
that a circular response variable theta has a von Mises distribution
with concentration parameter kappa, and mean direction related to a
vector of linear predictor variables according to the relationship: mu
+ 2*atan(beta'*x), where mu and beta are unknown parameters, beta being a vector of regression coefficients. The function uses Green's
(1984) iteratively reweighted least squares algorithm to perform the
maximum likelihood estimation of kappa, mu, and beta. Standard errors
of the estimates of kappa, mu, and beta are estimated via large-sample
asymptotic variances using the information matrix. An estimated
circular standard error of the estimate of mu is then obtained
according to Fisher and Lewis (1983, Example 1)."
Thus, as a check, you can compare with a different model:
> nls(y~a+2*atan(b*x),start=c(a=0.06337,b=0.022344),data=list(x=x,y=y))
Nonlinear regression model
model: y ~ a + 2 * atan(b * x)
data: list(x = x, y = y)
a b
0.07112 0.02231
residual sum-of-squares: 12.36
Number of iterations to convergence: 12
Achieved convergence tolerance: 5.838e-06
This nls fit does not use the same underlying distribution for the residual terms, but it does produce similar coefficients.
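For reference, fitting the circular-linear model itself might look like this (a sketch only; x and y stand in for the question's data, and init supplies starting values for beta, which type = "c-l" requires):

library(circular)
y.circ <- circular(y)                      # response as a circular object
fit <- lm.circular(y = y.circ, x = cbind(x), init = 0.02, type = "c-l")
fit                                        # estimates of mu, beta, and kappa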
Clearly you simplified your posted problem to make it easier to understand.
Could you add your real case? (It would spice up the question.)
Best Answer
Let's say we want to predict the median value of a house, expressed in thousands of dollars (medv), based on its age (age), number of rooms (rm), and the crime rate (crim) in its neighborhood. The dataset for this is called Boston and it is in the MASS package.

Original regression model
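A minimal sketch of that fit (the predictor order crim, rm, age and the name fit are assumptions; the coefficients quoted below come from this model):

library(MASS)                          # the Boston dataset lives here
fit <- lm(medv ~ crim + rm + age, data = Boston)
round(coef(fit), 3)                    # roughly: crim -0.211, rm 8.03, age -0.05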
So we see that the coefficients for those three predictors are -0.211 (crim), 8.03 (rm) and -0.05 (age). If we want to calculate the predicted price for any house, we just use the formula $\widehat{\text{medv}} = \hat\beta_0 - 0.211\cdot\text{crim} + 8.03\cdot\text{rm} - 0.05\cdot\text{age}$, where $\hat\beta_0$ is the fitted intercept.
This means, for example, that adding one room raises the predicted medv by 8.03 thousand dollars.
Now, let's say we measure the price in dollars instead of thousands of dollars. The regression coefficients become totally different.
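A sketch of that rescaling (Boston.2 is the name used for this data set further below):

Boston.2 <- Boston
Boston.2$medv <- Boston$medv * 1000        # price in dollars instead of thousands
fit.2 <- lm(medv ~ crim + rm + age, data = Boston.2)
coef(fit.2)                                # every coefficient is 1000 times the original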
They are all just scaled versions of the ones in the original model. But what if we instead measure, for example, the crime rate per 1,000? Then the crim coefficient changes dramatically, while rm and age keep their old values.
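A sketch of that second rescaling (Boston.3 as named below; crim is divided by 1,000 so that its coefficient is multiplied by 1,000, matching the -211.02 quoted next):

Boston.3 <- Boston
Boston.3$crim <- Boston$crim / 1000        # rescale the crime variable
fit.3 <- lm(medv ~ crim + rm + age, data = Boston.3)
coef(fit.3)                                # crim jumps to about -211.02; rm and age unchanged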
So compare this model with the original one. If we judge the variables by those coefficients, it will seem that the crime rate is much, much more important than the number of rooms (-211.02 vs. only 8.03), which is not realistic.
However, the original model tells quite a different story: there the number of rooms seems far more important. All that changed is how we measured the variables.
To assess the relative importance of the variables, we first need to standardize them, which makes them all comparable. It then no longer matters what unit is used (crim vs. crim/1000); the coefficients end up the same, and we can compare their values to see which variables matter most for prediction.
We now apply the same model three times using the differently scaled data sets (Boston, Boston.2, Boston.3) - note the identical coefficient results (a sketch follows the list below):
Original regression model
Regression model with prices in dollars instead of thousands
Regression model with crime rate per 1000 people
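A sketch of that standardize-then-fit step (scale() standardizes each column; the data set names follow the sketches above):

std.coef <- function(d) {
  d.z <- as.data.frame(scale(d[, c("medv", "crim", "rm", "age")]))
  coef(lm(medv ~ crim + rm + age, data = d.z))
}
std.coef(Boston)     # original regression model
std.coef(Boston.2)   # prices in dollars instead of thousands
std.coef(Boston.3)   # crime rate per 1000 people
# all three calls print identical standardized coefficients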
From this it seems that the number of rooms is indeed more important than the crime rate, which again is common sense :).