My question concerns kernel density estimation of earnings in the formal and informal sectors using Stata:
1) How do I identify the best kernel function to use (for instance Epanechnikov, Gaussian, triangle, etc.)?
2) How do I work out the bandwidth that gives the best estimate?
Best Answer
Check out the webpage: https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/
1) You can compare the kernels empirically. Hold out part of your data as a validation set, fit the KDE on the remaining data with each candidate kernel, and compute the log-likelihood of the validation data under each fitted model. The kernel that gives the highest validation likelihood is probably the best choice for your data.
2) You can use cross-validation to choose the bandwidth: fit the KDE for each candidate bandwidth and keep the one with the highest held-out likelihood. The section "Bandwidth Cross-Validation in Scikit-Learn" in the link shows how to do it in a couple of lines.
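As a rough sketch of both steps at once, the kernel and the bandwidth can be searched jointly with scikit-learn's `GridSearchCV`, which scores each `KernelDensity` fit by the total log-likelihood of the held-out fold. The data below are synthetic stand-ins for log earnings, and the grid of bandwidths, the three kernels, and the 5-fold split are illustrative assumptions, not recommendations. Scikit-learn calls the triangle kernel `"linear"`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

# Synthetic stand-in for log earnings (assumption: replace with your own data).
rng = np.random.default_rng(0)
log_earnings = rng.normal(loc=10.0, scale=0.5, size=200)
X = log_earnings.reshape(-1, 1)  # KernelDensity expects a 2-D array

# Candidate kernels and bandwidths (illustrative grid only).
param_grid = {
    "kernel": ["gaussian", "epanechnikov", "linear"],  # "linear" = triangle
    "bandwidth": np.linspace(0.05, 1.0, 20),
}

# With no scoring argument, GridSearchCV uses KernelDensity.score,
# i.e. the total log-likelihood of each held-out fold.
search = GridSearchCV(KernelDensity(), param_grid, cv=5)
search.fit(X)

best_kernel = search.best_params_["kernel"]
best_bandwidth = search.best_params_["bandwidth"]
print(best_kernel, best_bandwidth)
```

Compact-support kernels such as Epanechnikov and triangle assign zero density to held-out points that fall outside every kernel's reach, so small bandwidths can score minus infinity and trigger warnings; those combinations simply rank last. If you carry the result over to Stata's `kdensity` (options `kernel()` and `bwidth()`), keep in mind that packages may scale the bandwidth differently, so the numeric value may not transfer one-to-one.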
EDIT: Here is a demonstration of how you would do it (code mainly taken from the link). The code is in Python, which is easy to use for this kind of application: