Given the dataset cars.txt, we want to formulate a good regression model for the Midrange Price using the variables Horsepower, Length, Luggage, Uturn, Wheelbase, and Width, in two ways:
- using all possible subsets selection, and
- using an automatic selection technique.
For the first part, we run the following in R:
cars <- read.table(file=file.choose(), header=TRUE)
names(cars)
#all possible subsets regression; leaps() comes from the leaps package
library(leaps)
leap <- leaps(x=cbind(cars$Horsepower, cars$Length, cars$Luggage, cars$Uturn, cars$Wheelbase, cars$Width),
y=cars$MidrangePrice, method=c("r2"), nbest=3)
combine <- cbind(leap$which,leap$size, leap$r2)
n <- length(leap$size)
dimnames(combine) <- list(1:n,c("horsep","length","Luggage","Uturn","Wheelbase","Width","size","r2"))
round(combine, digits=3)
leap.cp <- leaps(x=cbind(cars$Horsepower, cars$Length, cars$Luggage, cars$Uturn, cars$Wheelbase, cars$Width),
y=cars$MidrangePrice, nbest=3)
combine.cp <- cbind(leap.cp$which,leap.cp$size, leap.cp$Cp)
dimnames(combine.cp) <- list(1:n,c("horsep","length","Luggage","Uturn","Wheelbase","Width","size","cp"))
round(combine.cp, digits=3)
plot(leap.cp$size, leap.cp$Cp, ylim=c(1,7))
abline(a=0, b=1)
Am I correct in my interpretation that the most adequate model is one with 4 parameters (the three variables Horsepower, Wheelbase and Width) because it has the lowest Mallows' Cp value?
For the second part, we can choose between forward, backward or stepwise selection:
#stepwise selection methods
#forward
slm.forward <- step(lm(cars$MidrangePrice ~ 1, data=cars), scope=~cars$Horsepower + cars$Length + cars$Luggage + cars$Uturn + cars$Wheelbase + cars$Width, direction="forward")
#backward
reg.lm1 <- lm(cars$MidrangePrice ~ cars$Horsepower + cars$Length + cars$Luggage + cars$Uturn + cars$Wheelbase + cars$Width)
slm.backward <- step(reg.lm1, direction="backward")
#stepwise
reg.lm1 <- lm(cars$MidrangePrice ~ cars$Horsepower + cars$Length + cars$Luggage + cars$Uturn + cars$Wheelbase + cars$Width)
slm.stepwise <- step(reg.lm1,direction="both")
How do I interpret the results I get from this R code?
Best Answer
For the second part, you must interpret the output as the steps towards your final model.
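For example, in the forward case you begin with
Start: AIC=377.95
cars$MidrangePrice ~ 1
Your current model contains only the constant, cars$MidrangePrice ~ 1. Each row in the table shows what you would get if you added that variable (for example, Horsepower) to the current model: Sum of Sq, RSS (Residual Sum of Squares) and AIC (Akaike Information Criterion). step() then adds the variable that gives the lowest AIC and prints a new table for the enlarged model. It stops when the <none> row, which stands for leaving the model unchanged, has the lowest AIC.
In the other cases you read the results the same way. Rows marked with + show the effect of adding a variable and rows marked with - show the effect of removing one.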
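If the printed tables are hard to follow, the path that step() took can also be read from the object it returns. Below is a minimal sketch of that; it assumes R's built-in mtcars data as a stand-in for cars.txt (which I don't have), so the variable names mpg, hp, wt, disp, qsec and drat are not from your data and the numbers will differ from yours.

```r
# Stand-in data: the built-in mtcars replaces cars.txt (an assumption,
# chosen only so the sketch runs without the original file).
null.model <- lm(mpg ~ 1, data = mtcars)

# Forward selection starting from the intercept-only model.
# trace = 0 suppresses the per-step tables that step() normally prints.
fwd <- step(null.model,
            scope = ~ hp + wt + disp + qsec + drat,
            direction = "forward",
            trace = 0)

# One row per step: the term that was added, the residual sum of squares
# after the step ("Resid. Dev") and the AIC of the resulting model.
print(fwd$anova)

# The formula of the model that step() finally settled on.
print(formula(fwd))

# step() compares models with extractAIC(), so this value matches the
# last AIC in fwd$anova (it is generally not the same number as AIC(fwd)).
print(extractAIC(fwd))
```

The same fwd$anova component exists for direction="backward" and direction="both", so slm.backward$anova and slm.stepwise$anova can be inspected in the same way.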
Hope this helps :)