I am new to R and I am trying to do some predictive modelling on a data set that has 16 feature variables and a numeric target. I am not sure whether the steps I am following will help me fit the model in the best possible way.
- Handling the missing values: The data had a lot of missing values, so I replaced them with the mean of the respective column. Is this the right way to handle missing values?
- Used stepwise regression to select the most predictive set of variables for the model. Is there a better way to choose the variables than stepwise regression?
- After deciding on the variables, I used the glm() function to fit the model.
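For reference, the three steps above might look roughly like this in base R. This is only a minimal sketch under stated assumptions: the data frame `df`, its columns `x1` to `x3`, and the target `y` are simulated placeholders rather than the actual data, and a Gaussian family is assumed because the target is numeric.

```r
# Hypothetical data: three numeric features and a numeric target `y`
# (simulated for illustration; not the data from the question).
set.seed(1)
n <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y <- 2 * df$x1 - df$x2 + rnorm(n)
df$x1[sample(n, 20)] <- NA  # inject some missing values

# Step 1: replace missing values in each feature with that column's mean.
features <- setdiff(names(df), "y")
for (col in features) {
  col_mean <- mean(df[[col]], na.rm = TRUE)
  df[[col]][is.na(df[[col]])] <- col_mean
}

# Step 2: stepwise selection by AIC, starting from the full model.
full <- glm(y ~ ., data = df, family = gaussian())
selected <- step(full, direction = "both", trace = 0)

# Step 3: the object returned by step() is itself a fitted glm,
# so it can be inspected and used for prediction directly.
print(formula(selected))
preds <- predict(selected, newdata = df)
```

Here `step()` returns the refitted model, so a separate call to glm() after selection is only needed if the final formula is written out by hand.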
Can someone please help me understand the process of predictive modelling in R?
I was following the document below to get a sense of predictive modelling.
http://blog.fractalanalytics.com/wp-content/uploads/2013/04/Predictive_Analytics_Methdology_Using_R_v1.0.pdf
Best Answer
This is a very broad and very basic question, so I will recommend a very broad and basic book - but one that addresses your question thoroughly:
Predictive Analytics For Dummies by Anasse Bari, Mohamed Chaouchi, Tommy Jung
See especially Chapter 14: Predictive Modeling with R
Hope that helps to get you started...
Addendum
Concerning your comment, the following two resources are also well worth a look: