Faculty of Engineering and Mathematical Sciences 

help3401


This forum is provided to promote discussion amongst students enrolled in Data Warehousing (CITS3401).
 

features selection

From: ANONYMOUS
Date: Wed 27th May, 3:24pm

 

Hi, I have a few questions.
1. After plotting the data, it looks like people who work longer hours are more likely to have 
a higher income. But in my feature selection, working hours ranks quite low, 
around 8th out of 10 features. Is that supposed to happen?
2. After one-hot encoding the categorical features, the data set now has 35 attributes. 
In this case, what would be a proper number of features to select for 
the model, based on the IG ranking?
Cheers

From: Zeyi W.
Date: Wed 27th May, 5:10pm

 

1. This is normal. The data may not fully match our intuition. The point of comparing attribute 
selection based on "your understanding" vs. selection based on IG is for you to run into something 
like this, so that you hopefully gain a deeper understanding of what is taught here.

2. This is a "hyper-parameter selection" problem, similar to selecting min_support. It is 
up to you. You can choose whatever value you are happy with.
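
For anyone who wants to see the IG ranking and the "pick the top k" step spelled out, here is a minimal sketch in plain Python. It is not the unit's reference solution. The column names, the toy data and the helper names (entropy, information_gain, top_k_features) are all made up for illustration, and k is simply whatever value you decide on.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature, labels):
    """IG of one discrete feature column with respect to the class labels."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        # Class labels of the rows where the feature takes this value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def top_k_features(columns, labels, k):
    """Return the names of the k features with the highest IG, best first."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy data: three one-hot style columns and an income label.
labels = ["high", "high", "low", "low"]
columns = {
    "married_single": [0, 0, 1, 1],  # separates the classes perfectly
    "country_US": [1, 0, 1, 0],      # tells us nothing about income here
    "long_hours": [1, 1, 1, 0],      # partially informative
}

for name in top_k_features(columns, labels, k=2):
    print(name, round(information_gain(columns[name], labels), 3))
```

Changing k only changes how far down the ranked list you cut, which is why it behaves like a hyper-parameter: you could try a few values and keep the one whose model you are happy with.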

From: ANONYMOUS  O.P.
Date: Wed 27th May, 11:23pm

 

Thanks Zeyi. Also, when ranking rules by lift, should rules whose right-hand side contains other 
attributes besides income also count as top rules? For example, married=single => country=US/income=low.
Cheers

From: Zeyi W.
Date: Thu 28th May, 11:59am

 

You should keep only the rules that have just "income" on the right-hand side. Please try smaller 
lift/conf values to find such rules.
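
As a rough illustration of filtering rules by their right-hand side and then ranking by lift, here is a small sketch in plain Python. The transactions, the item strings and the helper names (support, rule_metrics, income_only) are hypothetical. Whatever tool you use for rule mining will have its own way of doing this.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rule_metrics(transactions, lhs, rhs):
    """Return (confidence, lift) for the rule lhs => rhs.

    Assumes lhs and rhs each occur in at least one transaction.
    """
    conf = support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)
    lift = conf / support(transactions, rhs)
    return conf, lift

def income_only(rules):
    """Keep only rules whose right-hand side consists solely of income items."""
    return [(lhs, rhs) for lhs, rhs in rules
            if all(item.startswith("income=") for item in rhs)]

# Hypothetical toy transactions, one set of attribute=value items per person.
transactions = [
    {"married=single", "country=US", "income=low"},
    {"married=single", "country=US", "income=low"},
    {"married=married", "country=US", "income=high"},
    {"married=married", "country=Other", "income=low"},
]

# Candidate rules as (lhs, rhs) pairs; the second one mixes country into the RHS.
candidates = [
    (("married=single",), ("income=low",)),
    (("married=single",), ("country=US", "income=low")),
    (("married=married",), ("income=high",)),
]

kept = income_only(candidates)
# Rank the remaining rules by lift, highest first.
kept.sort(key=lambda rule: rule_metrics(transactions, *rule)[1], reverse=True)
for lhs, rhs in kept:
    conf, lift = rule_metrics(transactions, lhs, rhs)
    print(lhs, "=>", rhs, f"conf={conf:.2f} lift={lift:.2f}")
```

If the filter leaves you with no rules at all, that is usually a sign the mining thresholds were too strict, which is where lowering the lift/conf values comes in.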
Last modified:  5:31am Aug 04 2020