How to generate a feature vector from multiple database tables for classification

classification, machine learning

I've got 6 tables, and each of them has multiple rows. The tables are:

candidate_general_info, previous_employment_history, previous_health_issues, present_health_issues, family_health_history, present_claim_information

Based on these attributes, I want to decide whether to approve or reject each claim.

  1. candidate_general_info: a single row (age, sex, country, etc.)
  2. previous_employment_history: zero or more rows (the person may be unemployed, a child, etc.)
  3. previous_health_issues: zero or more rows
  4. present_health_issues: one or more rows (someone had a health issue, and that's what they are claiming for)
  5. family_health_history: zero or more rows
  6. present_claim_information: one or more rows (someone had a health issue, and that's what they are claiming for). Information about the disease, claim amount, number of days hospitalized, and whether or not the claim was approved.

I have 255,000 such records. The problem is: how do I even create features from this dataset? When classifying cats vs. dogs, you can take equally sized images (let's say 100×200 pixels), which, when flattened, give you a fixed-length feature vector. But I can't figure out how to map all these tables to a feature vector in my case.
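One common way to get a fixed-length vector out of one-to-many tables is to collapse each child table into aggregates (count, sum, max, etc.) per candidate, so a person with zero rows and a person with ten rows both produce the same number of columns. The sketch below uses plain Python on made-up toy rows; the column names (`candidate_id`, `age`, `years`, `severity`) are hypothetical stand-ins, not the real schema, and the choice of aggregates is only an example.

```python
# Hypothetical toy rows; column names are illustrative, not from the real schema.
candidates = [
    {"candidate_id": 1, "age": 42, "sex": "M"},
    {"candidate_id": 2, "age": 7, "sex": "F"},
]
employment = [  # previous_employment_history: zero or more rows per candidate
    {"candidate_id": 1, "years": 10},
    {"candidate_id": 1, "years": 5},
]
previous_health = [  # previous_health_issues: zero or more rows per candidate
    {"candidate_id": 2, "severity": 3},
]


def rows_for(table, cid):
    """Return the child-table rows belonging to one candidate."""
    return [r for r in table if r["candidate_id"] == cid]


def feature_vector(cand):
    """Collapse a variable number of child rows into a fixed set of aggregates."""
    jobs = rows_for(employment, cand["candidate_id"])
    issues = rows_for(previous_health, cand["candidate_id"])
    return [
        cand["age"],
        1 if cand["sex"] == "M" else 0,                    # binary encoding of sex
        len(jobs),                                         # COUNT of employment rows
        sum(j["years"] for j in jobs),                     # SUM of years employed
        len(issues),                                       # COUNT of previous health issues
        max((i["severity"] for i in issues), default=0),   # MAX severity, 0 if no rows
    ]


matrix = [feature_vector(c) for c in candidates]
print(matrix)  # one row of 6 numbers per candidate
```

Every candidate ends up with the same 6 columns regardless of how many child rows they have, which is the tabular analogue of flattening equally sized images.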

The problem is that each candidate has multiple claims. For example, person_1 might have filed 10 claims, 6 of which were approved and 4 rejected. Similarly, person_2 might have filed 50, 40 of which were approved and 10 rejected. How do I deal with this temporal dimension of the data? By comparison, it would be simpler if each person had exactly one claim, with a status of either approved or rejected. But my data is pretty complex.
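One way to handle multiple claims per person, offered here as a sketch rather than the only option, is to make each claim (not each person) a training example, and to compute the person's history features using only claims filed before that claim. This assumes a hypothetical `claim_date` column, which the question does not mention; the toy rows are invented for illustration.

```python
from datetime import date

# Hypothetical claims table: one row per claim, with an assumed claim_date column.
claims = [
    {"claim_id": "a", "candidate_id": 1, "claim_date": date(2015, 1, 10), "amount": 500.0, "approved": 1},
    {"claim_id": "b", "candidate_id": 1, "claim_date": date(2015, 6, 2), "amount": 1200.0, "approved": 0},
    {"claim_id": "c", "candidate_id": 1, "claim_date": date(2016, 3, 15), "amount": 800.0, "approved": 1},
    {"claim_id": "d", "candidate_id": 2, "claim_date": date(2015, 2, 1), "amount": 300.0, "approved": 1},
]


def claim_examples(claims):
    """Build one (features, label) pair per claim, using only earlier claims as history."""
    examples = []
    for claim in claims:
        # Cutoff: only claims by the same person filed strictly before this one.
        history = [
            c for c in claims
            if c["candidate_id"] == claim["candidate_id"]
            and c["claim_date"] < claim["claim_date"]
        ]
        n_prior = len(history)
        prior_approval_rate = (
            sum(c["approved"] for c in history) / n_prior if n_prior else 0.0
        )
        features = [claim["amount"], n_prior, prior_approval_rate]
        examples.append((features, claim["approved"]))  # label = this claim's outcome
    return examples


for features, label in claim_examples(claims):
    print(features, label)
```

The time cutoff matters: if a claim's features were computed from all of a person's claims, including later ones and the claim itself, its own approval status would leak into the features.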

Best Answer

You could try automated feature engineering. Featuretools is a framework for automated feature engineering; it excels at transforming temporal and relational datasets into feature matrices for machine learning.

https://docs.featuretools.com/
