
This forum is provided to promote discussion amongst students enrolled in Open Source Tools and Scripting.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!"  followup message.



UWA week 15 (1st semester, week 7)
4:05am Tue 12th Apr, ANONYMOUS

For this one:

"On the other hand, if the optional argument -nth followed by an integer N is provided, your program is to report the word that is the Nth most common for the largest number of files, and the number of files for which that is the case. If there are no options, then assume you are being asked for the word that is most common across the largest number of files, i.e. -nth 1"

What is the meaning of 'the largest number of files'?


9:56am Tue 12th Apr, Ryan B.

Hi Anon,

(I am waiting on a clarification from Michael, but this is my current interpretation, which may help get you thinking.)

We have a number of text files in the directory, which will likely have common words between them (e.g. "and", "the", "of"). Each of these files will have its own ranking of the words, and some files will probably have similar rankings; e.g. 10 files each have "the" as their 2nd most common word, and 5 files have "and" as their 2nd most common word. If we run ./common_words -nth 2 text_files_directory, we would expect it to print "the", as this is the 2nd most common word for the largest number of files.

Hope this helps!

Ryan (lab guy).


10:18am Tue 12th Apr, Michael W.

ANONYMOUS wrote:
> For this one:
>
> "On the other hand, if the optional argument -nth followed by an integer N is provided, your program is to report the word that is the Nth most common for the largest number of files, and the number of files for which that is the case. If there are no options, then assume you are being asked for the word that is most common across the largest number of files, i.e. -nth 1"
>
> what is the meaning of 'the largest number of files'

Hi,

For example, across the 10 files in the Gutenberg sample, "the" is the most common word in 9 of them, but "I" is the most common word in the 10th. For -nth 2, the word that is 2nd most common in the largest number of the 10 files is "and", with 5 hits.

Makes sense?

Cheers
MichaelW
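
The counting described here can be sketched in Python as follows. This is only an illustration under stated assumptions, not the reference solution: the names `nth_most_common` and `winner_across_files` are hypothetical, the sample texts are made up, words are taken to be runs of ASCII letters with case preserved, and ties are broken by whichever word `Counter` saw first (the thread does not say how ties should be handled).

```python
import re
from collections import Counter

def nth_most_common(text, n):
    """Return the word ranked n-th by frequency in text,
    or None if the text has fewer than n distinct words."""
    counts = Counter(re.findall(r"[A-Za-z]+", text))
    ranked = counts.most_common(n)  # ties keep first-seen order (an assumption)
    return ranked[n - 1][0] if len(ranked) >= n else None

def winner_across_files(texts, n=1):
    """texts maps a file name to its contents.
    Returns (word, number_of_files) for the word that is
    n-th most common in the largest number of files."""
    tally = Counter()
    for contents in texts.values():
        word = nth_most_common(contents, n)
        if word is not None:
            tally[word] += 1
    return tally.most_common(1)[0] if tally else (None, 0)

# Made-up sample data standing in for a directory of text files.
sample = {
    "a.txt": "the cat and the dog and the bird",
    "b.txt": "the the the and and or",
    "c.txt": "I think I know I saw you you",
}
print(winner_across_files(sample, 1))  # ('the', 2)
print(winner_across_files(sample, 2))  # ('and', 2)
```

In the sample, "the" is the most common word in two of the three files and "I" in the third, which mirrors the 9-versus-1 split in the Gutenberg example.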


10:19pm Tue 12th Apr, Hanlin Z.

Hi, so for this part, does that mean you treat 'You' and 'you' as two different words? When I don't convert [A-Z] to [a-z] (i.e. I leave You as You), my result is the same as yours, but if I do convert (You => you), the results are different.


10:30pm Tue 12th Apr, Michael W.

"Hanlin Zhang" wrote:
> Hi, so for this part, does that mean you treat 'You' and 'you' as two different words?
>
> When I don't convert [A-Z] to [a-z] (i.e. I leave You as You), my result is the same as yours, but if I do convert (You => you), the results are different.

Hi Hanlin,

Great question. Yes, please preserve the upper or lower case of the letters but, as before, only sequences of 1 or more letters can make up words. Thanks for pointing this out. I shall change the spec tomorrow.

Cheers
MichaelW
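
A minimal Python sketch of that word rule, assuming "letters" means the ASCII range A-Z and a-z (the thread does not define it more precisely). The function name `words` and the sample sentence are hypothetical.

```python
import re

def words(text):
    """Split text into words: maximal runs of one or more letters.
    Case is preserved, so 'You' and 'you' stay distinct."""
    return re.findall(r"[A-Za-z]+", text)

print(words("You said: don't you dare, it's 2022!"))
# ['You', 'said', 'don', 't', 'you', 'dare', 'it', 's']
```

Under this reading, digits and punctuation never form part of a word, so an apostrophe splits "don't" into "don" and "t". Whether the official solution treats apostrophes this way is not stated in the thread.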


12:42am Wed 13th Apr, Hanlin Z.

Thank you :)


1:19am Wed 13th Apr, Hanlin Z.

By the way, I'm confused about the name of the program for this part. The question says:

'This task is a development of the example that motivated several of the lectures in the unit: finding all the words in text, and from that, the most common word, etc. The program you are to write should be called how_common.'

So I think it should be named 'how_common', but the example of running it shows 'common_words text_files'.

Unless I have misunderstood, which name should we use for our program?


9:28am Wed 13th Apr, Michael W.

"Hanlin Zhang" wrote:
> By the way, I'm confused about the name of the program for this part. The question says:
>
> 'This task is a development of the example that motivated several of the lectures in the unit: finding all the words in text, and from that, the most common word, etc. The program you are to write should be called how_common.'
>
> So I think it should be named 'how_common', but the example of running it shows 'common_words text_files'.
>
> Unless I have misunderstood, which name should we use for our program?

Hi Hanlin,

Fixed. Thanks for letting me know. The correct name is "common_words".

Cheers
MichaelW


5:53pm Wed 13th Apr, ANONYMOUS

So does this mean that, for the input parameters in both parts of the assignment, I can leave the scripts case-sensitive? e.g. Afghanistan = valid; afghanistan = invalid.

Thanks :)


7:32pm Wed 13th Apr, Ryan B.

The main concern with case sensitivity in Q2 is that word rankings will change depending on whether you are case-sensitive or case-insensitive (you can try this yourself and see what pops up). This matters when we compare your results to ours while testing the solution, so it's good to make sure everyone is on the same page.

For Q1, the _output_ of running `common_words` with Afghanistan or afghanistan is going to be the same regardless of case; however, following a 'simpler is better' approach may require looking at how country names are formatted in the dataset and making a judgement call about how difficult it would be to search the dataset with case-insensitive methods.

Hope this helps!

Ryan (lab guy).
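
One way to try this yourself is a small Python experiment like the one below. It is a sketch with a made-up sentence and a hypothetical helper `top_word`, assuming words are runs of ASCII letters; it only shows that folding case can change which word ranks first.

```python
import re
from collections import Counter

def top_word(text, fold_case):
    """Return (word, count) for the most common word in text.
    If fold_case is True, lower-case every word before counting."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if fold_case:
        tokens = [t.lower() for t in tokens]
    return Counter(tokens).most_common(1)[0]

text = "You go you stay You wait you run and and and"

print(top_word(text, fold_case=False))  # ('and', 3)
print(top_word(text, fold_case=True))   # ('you', 4)
```

With case preserved, "You" and "you" are counted separately (2 each), so "and" wins with 3; once folded they merge into 4 occurrences of "you" and the ranking flips.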

The University of Western Australia

Computer Science and Software Engineering

Last modified  1:17AM Sep 14 2022