Machine learning method to determine continuous values from discrete and continuous parameters

discrete data, machine learning

I watched online courses about multivariable linear regression, which addresses the problem of predicting numeric values from numeric inputs: for example, predicting house prices from age, size, number of floors, etc.

I also learned about logistic regression, which handles classification problems based on numeric inputs: for example, predicting whether an email is spam or not.

But I'm having trouble finding information about predicting continuous values (like a price) from a mix of discrete and continuous parameters.

For example, suppose I have several laptop models described by these attributes:

[discrete] color: gray, blue, black...
[discrete] backlit_keyboard: yes, no
[discrete] material: plastic, aluminium
[continuous] weight: (numeric)
[discrete] brand: apple, sony, lenovo...

Now, assume that the price is a function of all these variables and that I have a large training set of actual prices for laptops described this way. I want to learn this function so that I can estimate the price of a laptop that has no matching example in the training set.

Which machine learning methods could, or should, be used for this purpose?

Best Answer

One standard approach is to use one-hot encoding and then run any regression algorithm you like (e.g. a variant of linear regression, or perhaps kernel ridge regression). That is, each discrete attribute becomes one 0/1 feature per possible value: the first dimension can be color_is_gray (1 if the color is gray, 0 otherwise), the second color_is_blue, and so on. Encode the other discrete attributes the same way, keep continuous attributes such as weight as plain numbers, and concatenate everything into a single feature vector.
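A minimal sketch of the one-hot approach, assuming NumPy is available. The rows and prices are invented for illustration, and the helper names (`fit_vocabulary`, `encode`, `encode_all`, `fit_ridge`, `predict`) are hypothetical. Ridge regression stands in here for "a variant of linear regression": because the one-hot columns of each attribute always sum to 1, they are collinear with the intercept, and a small L2 penalty keeps the normal equations solvable.

```python
import numpy as np

# Hypothetical training rows using the question's attributes; prices are made up.
laptops = [
    {"color": "gray", "backlit_keyboard": "yes", "material": "aluminium", "weight": 1.3, "brand": "apple"},
    {"color": "blue", "backlit_keyboard": "no", "material": "plastic", "weight": 2.1, "brand": "lenovo"},
    {"color": "black", "backlit_keyboard": "yes", "material": "plastic", "weight": 1.8, "brand": "sony"},
    {"color": "gray", "backlit_keyboard": "no", "material": "aluminium", "weight": 1.6, "brand": "lenovo"},
    {"color": "black", "backlit_keyboard": "no", "material": "plastic", "weight": 2.4, "brand": "lenovo"},
    {"color": "blue", "backlit_keyboard": "yes", "material": "aluminium", "weight": 1.4, "brand": "sony"},
]
prices = np.array([1800.0, 650.0, 900.0, 1000.0, 600.0, 1100.0])

DISCRETE = ["color", "backlit_keyboard", "material", "brand"]
CONTINUOUS = ["weight"]


def fit_vocabulary(rows):
    """Record the sorted set of values seen in training for each discrete attribute."""
    return {attr: sorted({row[attr] for row in rows}) for attr in DISCRETE}


def encode(row, vocab):
    """One-hot encode the discrete attributes, then append the continuous ones."""
    features = []
    for attr in DISCRETE:
        # One 0/1 column per known value, e.g. color_is_black, color_is_blue, color_is_gray.
        # A value never seen in training gets all zeros for that attribute.
        features.extend(1.0 if row[attr] == value else 0.0 for value in vocab[attr])
    features.extend(float(row[attr]) for attr in CONTINUOUS)
    return features


def encode_all(rows, vocab):
    """Stack encoded rows into an (n_rows, n_features) matrix."""
    return np.array([encode(row, vocab) for row in rows])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept term."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(w, X):
    """Apply the fitted weights, prepending the intercept column."""
    return np.hstack([np.ones((X.shape[0], 1)), X]) @ w


vocab = fit_vocabulary(laptops)
X = encode_all(laptops, vocab)
w = fit_ridge(X, prices, alpha=1.0)

# Estimate the price of a configuration that does not appear in the training set.
new_laptop = {"color": "black", "backlit_keyboard": "yes", "material": "aluminium", "weight": 1.5, "brand": "apple"}
estimate = predict(w, encode_all([new_laptop], vocab))[0]
print(f"estimated price: {estimate:.2f}")
```

In practice you would usually standardize continuous columns such as weight before applying the penalty, and you can feed the same feature vectors to a different regressor (for example kernel ridge regression) without changing the encoding step.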

If you have some notion of distance between the values of a discrete attribute (say, how similar two colors are), you can instead perform multidimensional scaling on those distances and use the coordinates it produces as features.
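A sketch of the multidimensional scaling variant, using classical (Torgerson) MDS as one concrete choice of MDS and again assuming NumPy. The color dissimilarities and the `classical_mds` helper are hypothetical; with real data the distances would come from whatever notion of similarity you have. Each color's coordinate(s) would then replace its one-hot columns in the feature vector.

```python
import numpy as np


def classical_mds(D, k):
    """Classical (Torgerson) MDS: embed n items in k dimensions from an n x n distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n               # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                       # double-centered squared distances (Gram matrix)
    eigvals, eigvecs = np.linalg.eigh(B)              # eigh returns eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]               # indices of the k largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None)) # clip tiny negatives caused by round-off
    return eigvecs[:, top] * scale                    # (n, k) coordinates


# Hypothetical dissimilarities between colors, chosen by hand for illustration.
colors = ["black", "gray", "blue"]
D = np.array([
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
])

coords = classical_mds(D, k=1)
# Map each color to its MDS coordinate(s); these replace color_is_* in the feature vector.
color_features = {c: coords[i] for i, c in enumerate(colors)}
print(color_features)
```

These hand-picked distances happen to be exactly Euclidean in one dimension, so `k=1` reproduces them; for general dissimilarities classical MDS only approximates them, and `k` is typically chosen by how many eigenvalues are clearly positive.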