My data is 1,785,000 records with 271 features. I'm trying to reduce number of features used to build the model.
Q1. while exploring the data I found that some features are almost all missing data, like only 25 records has value for this feature and the others records has missing values, so I thought that is not informative enough and it's better to eleminate those features, am I right? and if I am right, for what level I can do that, I mean if 90%, 80%, etc.. of each feature are missing values, when I can decide to get rid of these features? (taking in consideration that it is the dependent variable is N/Y and only %1.157 of the whole data is belonging to Y).
Q2. for each indivisual in the dataset, there are 64 trait_type listed, where each one can take one of the values [1 or 3 or 5]. my question is: if some trait-type take only value [5] or missing dat for all the record, does it have any value or again we can eliminate that feature?
Thank you
Best Answer