I am applying a neural network and an SVM to predict buy/hold/sell signals. I trained both in R, using the nnet function for the neural network and the svm function for the SVM. I provided 20,000 data points for training and 2,000 data points for testing. The training data set contains 10-15 technical indicators plus the buy/sell/hold signal. The problem I am having is that the models do not predict the buy/sell/hold signals with good accuracy on the testing data. I used a sigmoid activation in nnet and a radial kernel in the SVM. Any suggestions on how to improve the accuracy of the predictions?
Solved – Train NN or SVM to classify stock signals
machine learning, r, svm
Related Solutions
Max Kuhn's caret Manual - Model Building is a great starting point.
I would think of the validation stage as occurring within the caret train() call, since it is choosing your hyperparameters of decay and size via bootstrapping or some other approach that you can specify via the trControl parameter. I call the data set I use for characterizing the error of the final chosen model my test set. Since caret handles selection of hyperparameters for you, you just need a training set and a test set.
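As a rough sketch (the resampling method and number of resamples here are illustrative choices, not from the original answer), the scheme train() uses to pick the hyperparameters could be specified like this and then passed in via trControl:
library(caret)
# Illustrative resampling setup: 25 bootstrap resamples for choosing decay and size.
# method = "cv" with number = 10 would give 10-fold cross-validation instead.
fit.control <- trainControl(method = "boot", number = 25)
This object would then be supplied as trControl = fit.control in the train() call shown further down.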
You can use the createDataPartition() function in caret to split your data set into training and test sets. I tested this using the Prestige data set from the car package, which has information about income as related to level of education and occupational prestige:
library(car)    # provides the Prestige data set
library(caret)
# 70/30 split: createDataPartition() returns the row indexes of the training set
trainIndex <- createDataPartition(Prestige$income, p = .7, list = FALSE)
prestige.train <- Prestige[trainIndex, ]
prestige.test <- Prestige[-trainIndex, ]
The createDataPartition() function seems a little misnamed because it doesn't create the partition for you, but rather provides a vector of indexes that you can then use to construct the training and test sets. It's pretty easy to do this yourself in R using sample(), but one thing createDataPartition() apparently does do is sample from within factor levels, so if your outcome is categorical the class distribution is maintained across the data partitions. That isn't relevant in this example, since the outcome here (income) is continuous.
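For a categorical outcome such as buy/hold/sell, the stratified split would look like the following sketch, which uses the built-in iris data with Species standing in for the class label:
library(caret)
# With a factor outcome, createDataPartition() samples within each class, so the
# class proportions are (approximately) preserved in both splits.
idx <- createDataPartition(iris$Species, p = .7, list = FALSE)
iris.train <- iris[idx, ]
iris.test <- iris[-idx, ]
# Check that the class distribution is similar in the two sets
prop.table(table(iris.train$Species))
prop.table(table(iris.test$Species))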
Now you can train your model on the training set:
# Tuning grid: weight decay and number of hidden units for nnet
my.grid <- expand.grid(.decay = c(0.5, 0.1), .size = c(5, 6, 7))
prestige.fit <- train(income ~ prestige + education, data = prestige.train,
                      method = "nnet", maxit = 1000, tuneGrid = my.grid,
                      trace = FALSE, linout = 1)
Aside: I had to add the linout parameter to get nnet to work with a regression (vs. classification) problem. Otherwise I got all 1s as predicted values from the model.
You can then call predict on the fit object using the test data set and calculate RMSE from the results:
prestige.predict <- predict(prestige.fit, newdata = prestige.test)
# Root mean squared error on the held-out test set
prestige.rmse <- sqrt(mean((prestige.predict - prestige.test$income)^2))
My advice would be to not do this. The theoretical advantages of the SVM that avoid over-fitting apply only to the determination of the Lagrange multipliers (the parameters of the model). As soon as you start performing feature selection, those advantages are essentially lost: there is little theory that covers model selection or feature selection, and you are highly likely to over-fit the feature selection criterion, especially if you search really hard using a GA. If feature selection is important, I would use something like LASSO, LARS or the elastic net, where feature selection arises via regularisation. There the selection is more constrained, so there are fewer effective degrees of freedom and less over-fitting.
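As a minimal sketch of regularisation-based feature selection with glmnet (the randomly generated x and y below are placeholders standing in for the indicator matrix and the buy/hold/sell labels):
library(glmnet)
# Placeholder data: 15 "indicator" columns and a three-class signal
set.seed(1)
x <- matrix(rnorm(500 * 15), ncol = 15)
y <- factor(sample(c("buy", "hold", "sell"), 500, replace = TRUE))
# alpha = 1 gives the LASSO penalty; alpha between 0 and 1 gives the elastic net.
# cv.glmnet() chooses the penalty strength lambda by cross-validation.
cv.fit <- cv.glmnet(x, y, family = "multinomial", alpha = 1)
# Coefficients shrunk exactly to zero at lambda.min are, in effect, de-selected features.
coef(cv.fit, s = "lambda.min")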
Note that a key advantage of the SVM is that it is an approximate implementation of a generalisation bound which is independent of the dimensionality of the feature space. This suggests that feature selection shouldn't necessarily be expected to improve performance, and if there is a deficiency in the selection process (e.g. over-fitting the selection criterion) it may well make things worse!
Best Answer
I recommend reading "A Practical Guide to Support Vector Classification" by Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin.
In the work I've done, much lower values of C work better, in the range of $10^{-4}$ to $10^{-2}$. A C of 1-100 is pretty high, at least for the data I've worked with, which means you are not allowing much 'slack', so it's not surprising that your model over-fits the data. I would recommend trying much smaller values of C, moving in order-of-magnitude increments. Another alternative is to try nu-SVM rather than C-SVM. The parameter nu ranges from 0 to 1 (.1 to .8 in practice) and is much more intuitive: .1 means a small proportion of your data points are support vectors (and therefore you have a narrow margin and little slack), while .8 means a very large proportion are support vectors (and therefore a wide margin and a good deal of slack).
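A rough sketch of both suggestions using e1071 (the placeholder data frame, column names and grid values below are illustrative, not from the original post):
library(e1071)
# Placeholder data standing in for the technical indicators and the signal labels
set.seed(1)
train.df <- data.frame(matrix(rnorm(500 * 10), ncol = 10))
train.df$signal <- factor(sample(c("buy", "hold", "sell"), 500, replace = TRUE))
# Search C (cost) in order-of-magnitude steps, starting well below 1
c.tune <- tune(svm, signal ~ ., data = train.df, kernel = "radial",
               ranges = list(cost = 10^(-4:0), gamma = 10^(-3:0)))
summary(c.tune)
# nu-SVM alternative: nu (roughly the fraction of support vectors) replaces C
nu.fit <- svm(signal ~ ., data = train.df, type = "nu-classification",
              kernel = "radial", nu = 0.2)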