I've been using a neural network to make predictions. My training data is in one .csv file, which I read in and then scale. My test data is in another file, which I also read in and scale. However, the test data does not contain an output column, because I will be submitting my predictions for it to Kaggle to be checked. (It is part of this Kaggle competition: https://www.kaggle.com/c/carseatsales).
I am not sure how to convert my predictions back to the original scale when the test data does not have this output column.
Here is how I scaled the data:
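# Read the training data and convert the categorical columns to numeric codes.
# factor() is applied first so this also works when read.csv returns strings (R >= 4.0).
train10 <- read.csv("Carseats_training.csv")
train10$ShelveLoc <- as.numeric(factor(train10$ShelveLoc))
train10$Urban <- as.numeric(factor(train10$Urban))
train10$US <- as.numeric(factor(train10$US))
# Column-wise max and min, used for min-max scaling to [0, 1].
maxs <- apply(train10, 2, max)
mins <- apply(train10, 2, min)
# Shuffle all rows (the sample size equals the number of rows).
index <- sample(1:nrow(train10), round(1*nrow(train10)))
scaled <- as.data.frame(scale(train10, center = mins, scale = maxs - mins))
train100 <- scaled[index,]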
# Read the test data and apply the same conversions.
test10 <- read.csv("Carseats_testing.xls")
test10$ShelveLoc <- as.numeric(factor(test10$ShelveLoc))
test10$Urban <- as.numeric(factor(test10$Urban))
test10$US <- as.numeric(factor(test10$US))
# Here the test set is scaled with its own column-wise max and min.
maxss <- apply(test10, 2, max)
minss <- apply(test10, 2, min)
index1 <- sample(1:nrow(test10), round(1*nrow(test10)))
scaleds <- as.data.frame(scale(test10, center = minss, scale = maxss - minss))
test100 <- scaleds[index1,]
This is my neural network:
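library(neuralnet)
# Two hidden layers (5 and 3 units); linear output because Sales is continuous.
nn <- neuralnet(Sales ~ CompPrice + Income + Advertising + Population + Price +
                  ShelveLoc + Age + Education + Urban + US,
                data = train100,
                hidden = c(5, 3),
                linear.output = TRUE)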
I am trying to make a prediction on sales.
pr.nn <- compute(nn, test100[,2:11])
But now I am not sure how to scale the result back to actual sales values.
I would really appreciate any help. I have been stuck on this part for a while now.
Best Answer
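One common way to handle this, sketched here under assumptions rather than as a definitive fix: a network trained on min-max scaled Sales produces predictions in that same scaled range, so they can be mapped back with the minimum and maximum of Sales taken from the training data. The test set does not need an output column for that. It is also usual to scale the test predictors with the training min and max instead of the test set's own values, so both sets go through the same transformation. The toy data frames, the two-predictor setup, and the `lm` model standing in for the neural network are all hypothetical, chosen only so the example runs on its own.

```r
# Hypothetical toy data standing in for the Carseats training and test sets.
set.seed(1)
train <- data.frame(Sales = runif(20, 0, 16),
                    Price = runif(20, 50, 150),
                    Age   = runif(20, 25, 80))
test  <- data.frame(Price = runif(5, 50, 150),
                    Age   = runif(5, 25, 80))

# Min and max are computed from the TRAINING data only.
mins <- apply(train, 2, min)
maxs <- apply(train, 2, max)
train_scaled <- as.data.frame(scale(train, center = mins, scale = maxs - mins))

# Scale the test predictors with the training min/max of the same columns.
predictors  <- names(test)
test_scaled <- as.data.frame(scale(test,
                                   center = mins[predictors],
                                   scale  = (maxs - mins)[predictors]))

# Stand-in model (assumption): lm is used instead of neuralnet so the sketch
# has no package dependency. With neuralnet, the scaled predictions would be
# taken from pr.nn$net.result instead.
fit         <- lm(Sales ~ Price + Age, data = train_scaled)
pred_scaled <- predict(fit, newdata = test_scaled)

# Invert the min-max scaling: x = x_scaled * (max - min) + min.
sales_min   <- unname(mins["Sales"])
sales_range <- unname(maxs["Sales"] - mins["Sales"])
pred_sales  <- pred_scaled * sales_range + sales_min
```

With the variable names from the question, the same inverse step would read roughly `pr.nn$net.result * (maxs["Sales"] - mins["Sales"]) + mins["Sales"]`, where `maxs` and `mins` are the vectors computed from `train10`.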