https://secure.csse.uwa.edu.au/run/help3401?p=np&rss=y
The help3401 RSS feed
The University of Western Australia
Fri, 29 May 2020 18:48:43 +0800

PCA using discrete values
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=238
Fri, 29 May 2020 18:48:43 +0800, ANONYMOUS

Hi,
Would it be acceptable to disregard the categorical data when conducting the PCA?
The results of using the categorical data are good, and extremely accurate,
but I'm sure that's due to overfitting and not really a true representation of a
trained model. So would it be acceptable to remove these categorical values and
conduct the PCA with just a selected set of continuous attributes?
Thank you
Re: Data Reduction - compare performance
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=237
Fri, 29 May 2020 17:22:08 +0800, "Zeyi Wen" <zeyi.wen@uwa.edu.au>

You can compare them based on accuracy, F1, etc., or compare them on data set size, etc. Data
reduction usually leads to lower accuracy, but sometimes may lead to higher accuracy.
You can use J48 or other models of your choice.
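As a minimal sketch of how the suggested metrics could be computed for such a comparison (the labels and predictions below are made-up examples, not assignment data):

```python
# Sketch: accuracy and F1 from true labels vs. model predictions.
# The two label lists are invented purely for illustration.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive=">50K"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = [">50K", ">50K", "<=50K", "<=50K", "<=50K"]
y_pred = [">50K", "<=50K", "<=50K", "<=50K", ">50K"]
print(accuracy(y_true, y_pred))  # 0.6
print(f1(y_true, y_pred))        # 0.5
```

You would compute these for the model trained on the original data and for the model trained on the reduced data, then compare.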
Data Reduction - compare performance
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=236
Fri, 29 May 2020 12:02:37 +0800, ANONYMOUS

I am not sure what our comparison between the 'reduced data' and 'original data' is
supposed to show.
Is it that performing data reduction yields a similar accuracy with reduced time and memory
requirements? Is accuracy supposed to improve?
Also, should we use the J48 decision tree to compare the performance of the two models?
Re: Tree view
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=235
Fri, 29 May 2020 11:31:38 +0800, "Zeyi Wen" <zeyi.wen@uwa.edu.au>

You can say the tree is trained with minNumObj=100 in the report, and interpret the tree
(or analyse something you want).
Re: Can we choose whether we want to use confidence or lift?
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=234
Fri, 29 May 2020 11:29:26 +0800, "Zeyi Wen" <zeyi.wen@uwa.edu.au>

Not the minsupport, but the support for the itemsets (which generate the rules). You may need to do some
searches to find out.
You should have only income bracket on the right-hand side.
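To make the distinction concrete: the support meant here is the fraction of records that contain the rule's itemset, as opposed to the minimum-support threshold you set in Apriori. A toy sketch (the transactions and attribute values are invented for illustration):

```python
# Sketch: support, confidence and lift of a rule A -> B over a toy data set.
# Each "transaction" is a set of attribute=value items; the data is invented.
transactions = [
    {"edu=Bachelors", "sex=Male", "income>50K"},
    {"edu=Bachelors", "sex=Male", "income>50K"},
    {"edu=HS-grad", "sex=Male", "income<=50K"},
    {"edu=Bachelors", "sex=Female", "income<=50K"},
    {"edu=HS-grad", "sex=Female", "income<=50K"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """support(lhs union rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    return confidence(lhs, rhs) / support(rhs)

lhs, rhs = {"edu=Bachelors", "sex=Male"}, {"income>50K"}
print(support(lhs | rhs))    # 0.4 -- the itemset's support, not minsupport
print(confidence(lhs, rhs))  # 1.0
print(lift(lhs, rhs))        # 2.5
```

Even when every rule has 100% confidence, their itemset supports differ, so support can still rank them.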
Re: Tree view
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=233
Fri, 29 May 2020 00:50:49 +0800, ANONYMOUS

Thank you! So what we should analyse is the tree trained with minNumObj=100?
Re: Can we choose whether we want to use confidence or lift?
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=232
Thu, 28 May 2020 22:53:27 +0800, ANONYMOUS
Thank you for your help Zeyi.
Having taken approach (2), should we just use the minsupport that we get from Apriori to justify
interestingness when confidence is 100%? Also, I find that if I use lift, it tends to give me more
than one attribute on the right-hand side but still includes income bracket >50K; would this still
be OK to interpret as patterns for income bracket?
Re: Data Reduction: Sampling and feature reduction
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=231
Thu, 28 May 2020 19:49:43 +0800, "Zeyi Wen" <zeyi.wen@uwa.edu.au>

The first one. You should perform sampling + feature reduction on the data set.
Re: Tree view
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=230
Thu, 28 May 2020 19:47:52 +0800, "Zeyi Wen" <zeyi.wen@uwa.edu.au>

Yes. You can also try REPTree instead of J48, where you can set "maxDepth", if
REPTree can meet your need.
Data Reduction: Sampling and feature reduction
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=229
Thu, 28 May 2020 15:39:28 +0800, ANONYMOUS

Hi Zeyi,
For the data reduction question, it asks us to perform numerosity "reduction using
sampling and feature reduction."
Does this mean we need to perform sampling and feature reduction on one version of the
original data, and compare this to the original data?
Or do we perform sampling reduction on the original data, and compare it with a
feature-reduced version of the original data?
Thanks
Tree view
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=228
Thu, 28 May 2020 15:28:17 +0800, ANONYMOUS

Hi Zeyi,
My tree is too large to view clearly, even when I try to fit it to the screen. Can I
modify minNumObj to 50, or larger like 100, so that the tree is readable?
Re: Data Reduction
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=227
Thu, 28 May 2020 15:10:37 +0800, "Zeyi Wen" <zeyi.wen@uwa.edu.au>

Yes. You may treat binary attributes as numeric ones as well. It is up to you... You need to
explain in the report.
Re: Data Reduction
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=226
Thu, 28 May 2020 15:04:46 +0800, ANONYMOUS

So, if I choose PCA, I just do it on the numeric attributes in my dataset, right? Because
PCA only works on numeric attributes.
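As a sketch of the idea: drop the categorical columns, standardise the numeric ones, then extract the first principal component (in Weka the PrincipalComponents filter does this for you). The rows and attribute names below are invented, and power iteration is just one simple way to get the dominant eigenvector:

```python
# Sketch: PCA restricted to numeric attributes (categoricals dropped first).
# Pure-Python power iteration for the first principal component; the rows
# and column choices are invented, not the assignment data.
import math

rows = [  # (age, hours-per-week, workclass) -- workclass is categorical
    (25, 40, "Private"), (38, 50, "Private"), (52, 60, "Self-emp"),
    (29, 35, "Private"), (45, 45, "Gov"),
]
numeric = [(r[0], r[1]) for r in rows]  # keep the numeric columns only

def standardise(col):
    """Rescale a column to zero mean and unit variance."""
    mu = sum(col) / len(col)
    sd = math.sqrt(sum((x - mu) ** 2 for x in col) / len(col))
    return [(x - mu) / sd for x in col]

cols = [standardise([r[i] for r in numeric]) for i in range(2)]
n = len(numeric)
cov = [[sum(cols[i][k] * cols[j][k] for k in range(n)) / n
        for j in range(2)] for i in range(2)]

# Power iteration converges to the eigenvector with the largest
# eigenvalue, i.e. the direction of the first principal component.
v = [1.0, 1.0]
for _ in range(100):
    w = [cov[0][0] * v[0] + cov[0][1] * v[1],
         cov[1][0] * v[0] + cov[1][1] * v[1]]
    norm = math.sqrt(w[0] ** 2 + w[1] ** 2)
    v = [w[0] / norm, w[1] / norm]

print(v)  # unit-length first principal component of the two numeric attributes
```

Projecting each standardised row onto this vector gives the reduced one-dimensional representation.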
Re: Training model for data reduction
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=225
Thu, 28 May 2020 14:58:29 +0800, "Zeyi Wen" <zeyi.wen@uwa.edu.au>

You can choose any model you like.
Both numerosity and feature reduction should be performed on the same dataset.
Re: Can we choose whether we want to use confidence or lift?
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=224
Thu, 28 May 2020 14:57:33 +0800, "Zeyi Wen" <zeyi.wen@uwa.edu.au>

You can use confidence instead of lift.
For the second problem, you can (1) set a very small confidence threshold and a very large
number of rules (e.g. 1000), and then search for the rules you like; (2) construct a data set
only having income>50k and do the rule mining, where you will have 100% confidence and you can
use support to rank the rules; (3) balance the data sets, so that both income>50k and
income<=50k have a similar number of instances, and perform the rule mining.
Any of the above three approaches is fine.
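Approach (3) could be sketched as downsampling the majority class until the brackets match (the rows and class proportions below are invented; in the Weka GUI the SpreadSubsample filter achieves a similar effect):

```python
# Sketch of approach (3): downsample the majority income bracket so both
# brackets have the same number of instances. The rows are invented.
import random

random.seed(0)
data = [{"income": ">50K"}] * 20 + [{"income": "<=50K"}] * 80

high = [r for r in data if r["income"] == ">50K"]
low = [r for r in data if r["income"] == "<=50K"]
minority, majority = sorted([high, low], key=len)

# keep the whole minority class, sample an equal number from the majority
balanced = minority + random.sample(majority, len(minority))
random.shuffle(balanced)
print(len(balanced))  # 40 -- 20 instances of each bracket
```

Rule mining on the balanced set then gives both brackets a fair chance of appearing on the right-hand side.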
Training model for data reduction
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=223
Thu, 28 May 2020 14:21:16 +0800, ANONYMOUS

For part 4, what type of model are we supposed to train? A decision tree?
Also, is the numerosity reduction and feature reduction done on the same data, or are
they done separately on two datasets?
Can we choose whether we want to use confidence or lift?
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=222
Thu, 28 May 2020 12:33:42 +0800, ANONYMOUS

Hey,
When I was trying to use lift to find rules/patterns for associated attributes, it
wouldn’t have income bracket on the right hand side. However if I used car = True and
tried to find the confidence, I would see income bracket on the right hand side. My
question is that for step 1, ‘association rule mining’, can we just use confidence to
justify the interestingness of the rule and why we chose it? Or is there another way
to get lift to work?
Also, when I used car = True and tried to find the confidence, I would see
income bracket on the right-hand side, but it only happens to be <=50K. Should I
change the min lower bound and upper bound, or is there a way to combat this?
Thanks
Re: Comparing decision tree models
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=221
Thu, 28 May 2020 12:11:05 +0800, "Zeyi Wen" <zeyi.wen@uwa.edu.au>

I would recommend comparing both if you have time. Otherwise, you can compare two tree
models: one with attribute selection based on your understanding and the other with
attribute selection based on IG.
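The IG-based selection could be sketched as follows (the toy rows are invented; in Weka the InfoGainAttributeEval evaluator with the Ranker search produces this ranking):

```python
# Sketch: ranking attributes by information gain (IG) against the class.
# The rows below are invented examples, not the assignment data.
import math

rows = [  # (education, sex, income) -- income is the class label
    ("Bachelors", "Male", ">50K"), ("Bachelors", "Female", ">50K"),
    ("HS-grad", "Male", "<=50K"), ("HS-grad", "Female", "<=50K"),
    ("Bachelors", "Male", ">50K"), ("HS-grad", "Female", "<=50K"),
]

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(attr_index):
    """Class entropy minus the weighted entropy after splitting on the attribute."""
    labels = [r[-1] for r in rows]
    gain = entropy(labels)
    for value in set(r[attr_index] for r in rows):
        subset = [r[-1] for r in rows if r[attr_index] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

print(info_gain(0))  # IG of education -- splits the class perfectly here
print(info_gain(1))  # IG of sex -- much lower on this toy data
```

You would keep the top-ranked attributes for one tree and your hand-picked attributes for the other, then compare the two models.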
Re: Question regarding the weighting of each assignment
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=220
Thu, 28 May 2020 12:08:03 +0800, "Zeyi Wen" <zeyi.wen@uwa.edu.au>

The midsem has been reduced to 10% as per the announcement before the midsem test.
Re: question on using fnlwgt
https://secure.csse.uwa.edu.au/run/help3401?p=np&a=219
Thu, 28 May 2020 12:06:17 +0800, "Zeyi Wen" <zeyi.wen@uwa.edu.au>

You can remove fnlwgt if you don't know how to make use of it. However, using fnlwgt may lead to interesting findings. You may need to perform some preprocessing (e.g. normalisation) on fnlwgt before using it.
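The suggested preprocessing could be sketched as min-max normalisation or z-score standardisation (the fnlwgt values below are just illustrative; Weka's Normalize and Standardize filters do the equivalent in the GUI):

```python
# Sketch: two common ways to preprocess fnlwgt before using it.
# The values below are example numbers, not a real sample of the data.
import math

fnlwgt = [77516, 83311, 215646, 234721, 338409]

# min-max normalisation: rescale to [0, 1]
lo, hi = min(fnlwgt), max(fnlwgt)
normalised = [(x - lo) / (hi - lo) for x in fnlwgt]
print(normalised[0], normalised[-1])  # 0.0 1.0

# z-score standardisation: zero mean, unit variance
mu = sum(fnlwgt) / len(fnlwgt)
sd = math.sqrt(sum((x - mu) ** 2 for x in fnlwgt) / len(fnlwgt))
zscores = [(x - mu) / sd for x in fnlwgt]
```

Either way, fnlwgt ends up on a scale comparable with the other numeric attributes instead of dominating distance-based computations.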