Solved – How to visualize (plot) regression output against a categorical input variable

Tags: categorical data, data visualization, matplotlib, multiple regression, regression

I am doing linear regression with multiple variables. My data has n = 143 features and m = 13000 training examples. Some of my features are continuous or ordinal variables (area, year, number of rooms), but I also have categorical variables (district, color, type). So far I have plotted some of my features against the predicted price. For example, here is the plot of area against predicted price:
[Image: plot of area against predicted price]

Since area is a continuous variable, I had no trouble visualizing it. But now I want to visualize how the predicted price depends on my categorical variables (such as district).
For the categorical variables I used one-hot (dummy) encoding.

For example, data like this:
[Image: example data before encoding]

was turned into this format:
[Image: the same data after one-hot encoding]
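
For readers without the images, here is a minimal sketch of that kind of transformation. It assumes pandas; the column names and values are made up for illustration and are not the data from the question.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "area": [50, 72, 64, 90],
    "district": ["DistrictA", "DistrictB", "DistrictA", "DistrictC"],
})

# One indicator column per district. dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=["district"], dtype=int)
print(encoded)
```

Each row ends up with exactly one 1 among the `district_*` columns, which is the "series of zeros and ones" described here.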

If I had used ordinal encoding for the districts, like this:

DistrictA - 1
DistrictB - 2
DistrictC - 3
DistrictD - 4
DistrictE - 5

I could easily plot these values against predicted price by putting 1-5 on the X axis and price on the Y axis.

But I used dummy coding, and now I do not know how to show (visualize) the dependency between price and the categorical variable 'District', which is represented as a series of zeros and ones.

How can I make a plot showing a regression line of districts against predicted price when using dummy coding?

Best Answer

You could use boxplots to show the relationship between a categorical variable and a continuous output. In your case, you could make boxplots of the predicted price for the 5 districts.

Here is an example figure of what I mean:

[Image: example box plots]
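
Since the question is tagged matplotlib, a box plot like this could be sketched as follows. This is only an illustration: the data frame, the column names `district` and `predicted_price`, and the synthetic prices are assumptions standing in for your own predictions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

districts = ["DistrictA", "DistrictB", "DistrictC", "DistrictD", "DistrictE"]

# Synthetic stand-in for model output: 40 predicted prices per district.
rng = np.random.default_rng(0)
df = pd.DataFrame({"district": np.repeat(districts, 40)})
offsets = {name: 10 * i for i, name in enumerate(districts)}
df["predicted_price"] = rng.normal(100, 15, size=len(df)) + df["district"].map(offsets)

# One array of predicted prices per district, in a fixed order.
groups = [df.loc[df["district"] == d, "predicted_price"].to_numpy() for d in districts]

fig, ax = plt.subplots()
ax.boxplot(groups)  # boxes are drawn at x positions 1..5
ax.set_xticks(range(1, len(districts) + 1))
ax.set_xticklabels(districts)
ax.set_xlabel("District")
ax.set_ylabel("Predicted price")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the figure. The positions 1..5 on the X axis are only slots for the boxes; they do not imply any order among the districts.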

Of course, how you create the box plots depends on the software you are using (I don't know which one that is) and on whether it can work directly with districts coded as 0's and 1's.

However, if the 0-1 coding is really the problem, it should not be hard to create another variable that is coded 1-5 based on the 0-1 columns.
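
As a rough sketch of that conversion, assuming pandas and a hypothetical set of dummy columns named after the districts (three are shown to keep it short):

```python
import pandas as pd

# Hypothetical one-hot columns; each row has exactly one 1.
dummies = pd.DataFrame({
    "DistrictA": [1, 0, 0, 0],
    "DistrictB": [0, 1, 0, 1],
    "DistrictC": [0, 0, 1, 0],
})
district_cols = list(dummies.columns)

# idxmax(axis=1) returns, for each row, the name of the column holding the 1.
label = dummies[district_cols].idxmax(axis=1)

# Optional integer code 1..k following the column order.
code = label.map({name: i + 1 for i, name in enumerate(district_cols)})

print(label.tolist())
print(code.tolist())
```

This assumes every level has its own column. If one level was dropped as the reference category, rows of all zeros belong to that level, and `idxmax` would wrongly assign them to the first column, so they need to be handled separately. The resulting label column can then be used to group the predicted prices for box plots.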