Linear regression on non-numeric variables in R

categorical-data, r, regression

I am trying to build a linear regression model for my data, which has the following variables:

 [1] "Productcode"        "Category"           "Month"              "Mode.of.operations" "sales"              "profit.margin"
 [7] "Name"               "Packaging.content"  "Specifications"     "Unit"               "Origin"

Some of my variables are non-numeric. For example, Origin has values that are names of cities and countries, and Mode.of.operations has the values joint venture, reseller and distributor. I don't know how to represent these non-numeric variables in my linear regression model. One way I can think of is to assign numeric values to them (e.g. Joint venture = 1, Reseller = 2, Distributor = 3), but that won't be right, because it implies that Distributor is better than, or three times as large as, Joint venture.
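A quick way to see the problem is to compare the design matrices R builds in each case. The sketch below uses a small made-up data frame (`ops`, `mode` and `mode_code` are hypothetical names, not from the question). The integer-coded version yields a single column, so the model can only fit one slope across the three modes, while the factor version yields a separate indicator (dummy) column for each non-default mode.

```r
# Hypothetical toy data: mode of operation stored as text
ops <- data.frame(
  mode = c("joint venture", "reseller", "distributor",
           "reseller", "distributor", "joint venture")
)

# Integer coding: joint venture = 1, reseller = 2, distributor = 3
ops$mode_code <- match(ops$mode, c("joint venture", "reseller", "distributor"))

# A numeric predictor becomes one column, so lm() would fit a single slope
# and force the three modes onto an equally spaced line
X_numeric <- model.matrix(~ mode_code, data = ops)
print(colnames(X_numeric))

# A factor becomes (number of levels - 1) indicator columns plus the intercept
X_factor <- model.matrix(~ factor(mode), data = ops)
print(colnames(X_factor))
```

In the factor version, `distributor` has no column of its own: it is the default level here only because it sorts first alphabetically, and its effect is absorbed into the intercept.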

Can anyone guide me on how to solve this problem in R?

Best Answer

[I am assuming this is an issue with one of your independent variables. If it is your dependent variable that is non-numeric, linear regression is not the way to go.]

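You are right that assigning numeric values is the wrong way to go. In linear regression with non-numeric (categorical) independent variables, you want a separate coefficient for each category except a default (reference) one, whose effect is absorbed into the intercept; each coefficient then measures the difference between that category and the default. For this, the variable needs to be a factor. You can either let R do this for you, by just adding the variable as-is to the model (a character column is converted to a factor automatically, with its levels in sorted order, so the first level becomes the default), or convert it to a factor yourself. Converting it yourself lets you choose which mode of operation is the default.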
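A minimal sketch of both options, assuming a data frame shaped like the question's. The `sales_data` object, the `mode_effect` vector and all simulated values are hypothetical; only the column names `Mode.of.operations`, `sales` and `profit.margin` come from the question.

```r
# Hypothetical simulated data; only the column names come from the question
set.seed(1)
sales_data <- data.frame(
  Mode.of.operations = rep(c("joint venture", "reseller", "distributor"), each = 10),
  sales = runif(30, min = 100, max = 1000)
)

# Assumed "true" effects relative to joint venture, made up for illustration
mode_effect <- c("joint venture" = 0, "reseller" = 2, "distributor" = -1)
sales_data$profit.margin <- 5 + 0.01 * sales_data$sales +
  unname(mode_effect[sales_data$Mode.of.operations]) +
  rnorm(30, sd = 0.5)

# Option 1: add the character column as-is; lm() treats it as a factor and
# the alphabetically first level ("distributor") becomes the default
fit_auto <- lm(profit.margin ~ sales + Mode.of.operations, data = sales_data)
print(coef(fit_auto))

# Option 2: convert it yourself and choose the default level with relevel()
sales_data$Mode.of.operations <- relevel(factor(sales_data$Mode.of.operations),
                                         ref = "joint venture")
fit_ref <- lm(profit.margin ~ sales + Mode.of.operations, data = sales_data)
print(coef(fit_ref))
```

Both fits describe the same model and give identical fitted values; only the default level, and therefore the meaning of each mode coefficient, changes. With `fit_ref`, the `Mode.of.operationsreseller` coefficient is the estimated difference in profit margin between resellers and joint ventures at the same sales.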
