I have a dataset with 131 features. My goal is to estimate a value from these features using deep learning (a regression problem). However, 5 of my features only take the values 0 or 1. How can I use these categorical variables in my regression model?
Should I remove them, or what?
I really appreciate any help.
Best Answer
In general, binary data types are used to represent membership in particular categories. Suppose you have some data in a row-column format, so that the columns are features and rows are observations. Perhaps you're interested in the relationship between height and gender. So each row of your data records one person's height alongside a gender label such as "male" or "female".
Clearly, there's no great numerical representation of "male" and "female." But another way to look at the problem is as a question of membership: "Is this person male?" or, equivalently, "Is this person female?" In this way, we can take the categories "male" and "female" and translate them into binary, numerical quantities (conventionally, $1$ and $0$). Which category we treat as 1 and which as 0 is irrelevant from a mathematical standpoint. Importantly, it's standard practice to expand a variable with $k$ categories into $k-1$ binary columns. If you kept all $k$ columns, they would sum to $1$ in every row, making them linearly dependent with an intercept column of all $1$s and making it impossible to uniquely estimate their coefficients. I don't know the particular details of your so-called "deep learning" regression, but it probably has something like an intercept (in a neural network, the bias terms play this role). Your five 0/1 features are already encoded this way, so they can be used as inputs just like your other features.
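To make the $k-1$ rule concrete, here is a minimal pure-Python sketch. The helper names `dummy_encode` and `linear_unit`, and the three-category "region" variable, are hypothetical and only for illustration; libraries such as pandas and scikit-learn offer their own encoders. The last part shows why keeping all $k$ columns is a problem: together with a bias (intercept), two different parameter sets produce identical predictions, so the parameters cannot be pinned down uniquely.

```python
def dummy_encode(values, reference):
    """Expand a categorical column into k-1 binary (0/1) columns.

    The `reference` category gets no column of its own: it is the row
    where every dummy column is 0.
    """
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def linear_unit(x, w, b):
    """A single linear unit: weighted sum of inputs plus a bias (intercept)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


genders = ["male", "female", "female", "male"]

# k = 2 categories -> k - 1 = 1 column ("is this person male?").
levels, rows = dummy_encode(genders, reference="female")
print(levels)  # ['male']
print(rows)    # [[1], [0], [0], [1]]

# A hypothetical 3-category variable -> 2 columns.
region_levels, region_rows = dummy_encode(
    ["north", "south", "west", "north"], reference="north"
)
print(region_levels)  # ['south', 'west']
print(region_rows)    # [[0, 0], [1, 0], [0, 1], [0, 0]]

# If we instead kept all k columns, every row would sum to 1, i.e. the columns
# add up to the intercept's column of 1s. Then different parameters give
# identical predictions, so they cannot be estimated uniquely:
full_levels = sorted(set(genders))  # ['female', 'male']
full_rows = [[1 if v == level else 0 for level in full_levels] for v in genders]
for r in full_rows:
    # Moving 1.0 out of both weights and into the bias changes nothing.
    assert linear_unit(r, [1.0, 3.0], 10.0) == linear_unit(r, [0.0, 2.0], 11.0)
```

A 0/1 feature is simply the $k = 2$ case, which is why such features can go into the model unchanged.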
You've asked @Sheep
But this is a fundamentally unanswerable question without actually doing the analysis. Run the regression and find out!
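In practice, "run the regression and find out" usually means fitting the model with and without the features in question and comparing the error on held-out data. The sketch below makes several assumptions. It uses ordinary least squares in pure Python as a stand-in for the deep learning model. It also uses synthetic data in which a hypothetical 0/1 feature `flag` really does affect the target. With your real data, the comparison could go either way, and that is exactly what the experiment is meant to reveal.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def mse(rows, y, w):
    """Mean squared error of the linear predictions rows @ w against y."""
    preds = [sum(wi * xi for wi, xi in zip(w, r)) for r in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


# Synthetic data (an assumption for illustration): y depends on a numeric
# feature x AND on a hypothetical 0/1 feature `flag`.
rng = random.Random(0)
data = []
for _ in range(400):
    x = rng.uniform(0.0, 10.0)
    flag = rng.randint(0, 1)
    y = 2.0 * x + 5.0 * flag + rng.gauss(0.0, 1.0)
    data.append((x, flag, y))
train, test = data[:300], data[300:]


def held_out_mse(use_flag):
    """Fit on `train`, report MSE on `test`, with or without the 0/1 column."""
    def design(subset):
        # Leading 1.0 is the intercept column.
        return [[1.0, x, flag] if use_flag else [1.0, x] for x, flag, _ in subset]

    w = fit_ols(design(train), [y for _, _, y in train])
    return mse(design(test), [y for _, _, y in test], w)


with_flag = held_out_mse(True)
without_flag = held_out_mse(False)
print(f"held-out MSE with the 0/1 feature:    {with_flag:.3f}")
print(f"held-out MSE without the 0/1 feature: {without_flag:.3f}")
```

With a neural network, the idea is the same: train both variants on the same training split and compare their error on the same validation split.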
Maybe. If the two quantities are completely unrelated (for example, the political party in power in a given territory and the number of hurricanes over the Atlantic Ocean), then it probably wouldn't make sense. Instead of treating your research as a purely rote, quantitative exercise, I would encourage you to think critically about what your data represent and what the underlying causal mechanism is. Is there a biological process at work? Are people acting in their own self-interest? What do we know about the climate that causes hurricanes?