Hi, I have a few questions:
1. After plotting the data, it looks like people who work longer hours are more likely
to have a higher income. But in my feature selection, working hours ranks quite low,
around 8th out of 10 features. Is it supposed to be like that?
2. After one-hot encoding the categorical features, the data set now has 35 attributes.
In this case, what would be a proper number of features to select into the model,
based on the IG ranking?
1. This is normal. The data may not fully reflect our intuition. The point of comparing
attribute selection based on "your understanding" vs. selection based on IG is for you to
run into something like this, so that you will hopefully get a deeper impression of the
knowledge learnt here.
2. This is a "hyper-parameter selection" problem, similar to selecting min_support. It is
up to you. You can choose whatever value you are happy with.
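
To make both answers a bit more concrete, here is a minimal sketch of ranking features by information gain (IG) and keeping the top k. It is not the course's reference implementation. The tiny data set, the feature names (education, long_hours, noise) and the helper names (entropy, information_gain, rank_by_ig, top_k) are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - sum over values v of P(v) * H(labels | feature == v)."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_ig(rows, labels, feature_names):
    """Return (feature, IG) pairs sorted from highest to lowest IG."""
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in feature_names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def top_k(ranking, k):
    """Keep the names of the k highest-ranked features (k is the hyper-parameter)."""
    return [name for name, _ in ranking[:k]]


# Hypothetical toy data: 8 people, label 1 = high income, 0 = low income.
rows = [
    {"education": "hi", "long_hours": "yes", "noise": "a"},
    {"education": "hi", "long_hours": "yes", "noise": "b"},
    {"education": "hi", "long_hours": "yes", "noise": "a"},
    {"education": "hi", "long_hours": "no", "noise": "b"},
    {"education": "lo", "long_hours": "yes", "noise": "a"},
    {"education": "lo", "long_hours": "no", "noise": "b"},
    {"education": "lo", "long_hours": "no", "noise": "a"},
    {"education": "lo", "long_hours": "no", "noise": "b"},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

ranking = rank_by_ig(rows, labels, ["education", "long_hours", "noise"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")

print(top_k(ranking, 2))
```

In this toy data, 3 of the 4 people who work long hours have a high income, so a plot would show a visible trend, yet long_hours only gets about 0.19 bits of IG while education gets 1.0. A feature can look related to the target and still rank below others. As for k, it is your choice; if you want something more systematic, one common option is to train the model for a few values of k and keep the one with the best validation accuracy.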