

This forum is provided to promote discussion amongst students enrolled in Open Source Tools and Scripting.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!" follow-up message.



UWA week 17 (1st semester, week 8)

Hi Michael, should the programme common_words be case sensitive? For example, will 'and', 'And' and 'AND' be regarded as three words or one word when we count occurrences?


12:16pm Sat 30th Apr, Michael W.

Hi,

Given that these are very common words, in real texts they are only capitalised at the start of sentences. "And", for example, rarely appears at the start of a sentence. In other words, it doesn't much matter. FWIW, I don't change the capitalisation of words in the texts.

Cheers
MichaelW
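For anyone unsure what "not changing the capitalisation" means in code, here is a minimal sketch of case-sensitive counting. It assumes, hypothetically, that a word is a run of ASCII letters; the function name `count_words` is illustrative only and not part of the assignment.

```python
import re
from collections import Counter

def count_words(text: str) -> Counter:
    """Count words case-sensitively: 'and', 'And' and 'AND' stay distinct.

    Assumption (hypothetical): a word is a maximal run of ASCII letters.
    """
    # No .lower() call, so capitalisation is preserved in the keys.
    return Counter(re.findall(r"[A-Za-z]+", text))

counts = count_words("and And AND and")
print(counts["and"], counts["And"], counts["AND"])  # 2 1 1
```

Because the text is never lowercased, each capitalisation is its own key; since "And" rarely starts a sentence, this usually changes the totals very little.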



Hi Michael,

I am getting 911 hits for 'the' and 909 hits for 'I' for ADollsHouse.txt, which makes 'the' -nth 1 (the most popular word). However, the testing sample says:

% ./common_words -w I text_files
The most significant rank for the word I is 1 in file ADollsHouse.txt

Since the counts for the two words are extremely close, a few trivial or exceptional cases can have a big impact on the result. I think it comes down to how we define a word: are 'The', 'the' and 'THE' (or 'the' with some special characters attached) the same word? Apart from that, if I can't match your rules for defining a word 100% and return a slightly different result, will I still get partial marks based on the logic of my code?

Thanks in advance.
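To see why a two-occurrence margin matters, here is a minimal sketch of turning word counts into a rank. The function name `rank_of`, the rule that rank 1 is the most frequent word, and the alphabetical tie-break are all assumptions for illustration, not the assignment's rules; the starting counts are the ones reported above.

```python
from collections import Counter
from typing import Optional

def rank_of(word: str, counts: Counter) -> Optional[int]:
    """Return the 1-based frequency rank of `word`, or None if it never occurs.

    Assumed (hypothetical) rules: rank 1 is the most frequent word,
    and words with equal counts are ordered alphabetically.
    """
    if counts[word] == 0:  # Counter returns 0 for missing keys without inserting them
        return None
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return ordered.index(word) + 1

counts = Counter({"the": 911, "I": 909})
print(rank_of("I", counts))  # 2

# Hypothetical: a word definition that also matches the "I" in "I'm", "I'll", ...
counts["I"] += 3
print(rank_of("I", counts))  # 1
```

Only three extra matches are enough to swap the top two words, so the exact word definition decides the answer here.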



Yes, this is what I want to ask.



This is true. The outputs of the tests are very sensitive to how you parse the text. For instance, I'm not sure whether Michael allowed for apostrophes in words (he's, she's, don't, won't) in his example, as I'm getting Alice as the 11th, not 12th, most common word in Alice in Wonderland.
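A small sketch of how the apostrophe question changes the counts. Both regular expressions are hypothetical word definitions, and neither is claimed to match the one used by the tests; they also only handle the straight ASCII apostrophe, so texts using curly apostrophes (’) would need the pattern extended.

```python
import re
from typing import List

# Two hypothetical word definitions; neither is claimed to match the tests.
LETTERS_ONLY = re.compile(r"[A-Za-z]+")                     # "don't" -> "don", "t"
WITH_APOSTROPHES = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")  # "don't" stays whole

def tokenize(text: str, keep_apostrophes: bool) -> List[str]:
    """Split text into words under one of the two definitions above."""
    pattern = WITH_APOSTROPHES if keep_apostrophes else LETTERS_ONLY
    return pattern.findall(text)

line = "she's sure she don't know"
print(tokenize(line, keep_apostrophes=False))
# ['she', 's', 'sure', 'she', 'don', 't', 'know']
print(tokenize(line, keep_apostrophes=True))
# ["she's", 'sure', 'she', "don't", 'know']
```

Under the letters-only definition "she" gains an extra occurrence and a spurious "s" appears, which can be enough to move a word like "Alice" up or down one place when counts are close.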



ANONYMOUS wrote:

Hi Michael,

I am getting 911 hits for 'the' and 909 hits for 'I' for ADollsHouse.txt, which makes 'the' -nth 1 (the most popular word). However, the testing sample says:

% ./common_words -w I text_files
The most significant rank for the word I is 1 in file ADollsHouse.txt

I get the same as Michael ("I" occurs 950 times, "the" 920) if I check just that file using grep with some additional flags (e.g. -w). Perhaps your search strategy is too restrictive?

Since the counts for the two words are extremely close, a few trivial or exceptional cases can have a big impact on the result. I think it comes down to how we define a word: are 'The', 'the' and 'THE' (or 'the' with some special characters attached) the same word? Apart from that, if I can't match your rules for defining a word 100% and return a slightly different result, will I still get partial marks based on the logic of my code?

Some of the word definitions have been clarified (e.g. preserve cases, so "The" is different to "the"). Hope this helps!
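One thing worth checking when comparing your counts against grep: `grep -c` counts matching lines, not occurrences, and `grep -w` treats an apostrophe as a word boundary, so the "I" in "I'm" is a match. The sketch below roughly mimics `grep -ow I file | wc -l` (occurrences) and `grep -cw I file` (lines) in Python. The function names are hypothetical, and Python's `\w` may differ slightly from grep's locale-dependent word characters.

```python
import re

def _whole_word(word: str) -> "re.Pattern[str]":
    # A match must not be preceded or followed by a word character
    # (letter, digit or underscore), similar to grep -w.
    return re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")

def count_whole_word(word: str, text: str) -> int:
    """Count occurrences of `word` as a whole word (like grep -ow ... | wc -l)."""
    return len(_whole_word(word).findall(text))

def count_matching_lines(word: str, text: str) -> int:
    """Count lines containing `word` as a whole word (like grep -cw)."""
    pattern = _whole_word(word)
    return sum(1 for line in text.splitlines() if pattern.search(line))

sample = "I think I'm right.\nIt is I.\nIvy said nothing."
print(count_whole_word("I", sample), count_matching_lines("I", sample))  # 3 2
```

If your own word definition keeps "I'm" as a single word, it will report fewer "I"s than grep does, which could account for part of a gap like 909 versus 950.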

The University of Western Australia

Computer Science and Software Engineering

CRICOS Code: 00126G
Written by [email protected]
Powered by history
Last modified  1:17AM Sep 14 2022