Solved – Preparing for machine learning exam

machine learning, self-study

I am preparing for a machine learning exam and found a question online (see below) that asks which kinds of machine learning algorithms are suitable for the given data. Despite reading a lot, I still cannot answer it properly, and I don't know what strategy to take. Also, do you know of any sources with similar examples and solutions that could help me work out the right answer? I would highly appreciate your help; thanks in advance.

A financial institution has just hired you to build a system which will decide what car insurance package to offer to different clients. The information recorded about the clients is: gender (Boolean), age group (under 20, 20-35, 35-55, over 55), occupation (one of 20 categories), credit rating (a numerical value between 0 and 20) and number of accidents in the last year, and in the last five years (as reported by the client themselves). They also record the car make, model and year, and the city where the person lives. They have a database of roughly 20000 current clients, and for each they have recorded the plan that was used, as suggested by an expert. For each of the methods below, explain whether it is appropriate to use for this type of data. If your answer is no, explain why. If your answer is yes, explain why, and how you would go about setting up the problem (input processing, output processing, choice of approximator structure etc).

  • Neural network
  • SVM
  • Linear regression
  • Polynomial regression
  • Naive Bayes
  • Decision trees
  • k-nearest neighbor
  • Logistic regression

Best Answer

The answer is that all of the methods can be used for the above problem, although not all of them are equally well suited to it.

Two things should be checked in simple problems of this kind:

  1. Is it a classification or a regression problem? You might have already guessed that it is a classification problem, because the target (the insurance plan suggested by the expert) is a discrete label.

  2. Are there any categorical values in the input features? If yes, does the chosen algorithm work with categorical variables?

The examiner may expect the answer that neural networks, SVMs etc. don't work with categorical variables. In fact, you can encode a categorical variable as a series of binary variables (often called one-hot encoding). For example, if the variable age group takes the values {child, young, old}, you can replace this single variable with three binary variables: is_child, is_young and is_old. This way you can use an SVM or a neural network.
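
The encoding described above can be sketched in a few lines of plain Python. This is only a minimal illustration: the helper names `one_hot` and `to_vector` are made up for this example, and the category list is the one from the example in the text rather than the age groups from the exam question.

```python
# Category values taken from the example in the text; a real pipeline
# would derive them from the training data.
AGE_GROUPS = ["child", "young", "old"]


def one_hot(value, categories):
    """Map one categorical value to a dict of binary indicator features."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {f"is_{c}": int(value == c) for c in categories}


def to_vector(value, categories):
    """Same encoding as a plain list of 0/1 values, usable as numeric input."""
    return list(one_hot(value, categories).values())


print(one_hot("young", AGE_GROUPS))   # {'is_child': 0, 'is_young': 1, 'is_old': 0}
print(to_vector("old", AGE_GROUPS))   # [0, 0, 1]
```

The same idea would apply to occupation, car make or city in the question, at the cost of one extra input column per category value.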

Similarly, linear regression looks like an unlikely candidate for a classification problem, but it can be used for classification as well, for example by regressing on 0/1 indicator targets and thresholding the output. Don't expect noteworthy performance, though.
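
To make the linear regression remark concrete, here is a rough sketch of one common way to do it: fit ordinary least squares to 0/1 labels and threshold the prediction at 0.5. The data is made up for illustration (a single feature and a hypothetical two-plan setting), and `fit_line` and `predict_class` are names invented for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_class(a, b, x, threshold=0.5):
    """Turn the real-valued regression output into a 0/1 class label."""
    return 1 if a + b * x >= threshold else 0


# Made-up toy data: x = accidents in the last five years,
# y = 1 if a (hypothetical) more expensive plan was assigned, else 0.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]

a, b = fit_line(xs, ys)
print([predict_class(a, b, x) for x in xs])  # [0, 0, 0, 1, 1, 1]
```

With more than two plans, one option is to fit one such regression per plan on 0/1 indicator targets and pick the plan with the largest output. The fitted values are not probabilities and can fall outside [0, 1], which is one reason logistic regression is usually the better linear choice here.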
