This forum is provided to promote discussion amongst students enrolled in Open Source Tools and Scripting.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!"  followup message.


UWA week 20 (1st semester, week 11)

5:10pm Sat 21st May, ANONYMOUS

Hi Michael,

While trying the different samples given, I noticed that there are many hyphenated words: words with a single hyphen, double hyphens, triple hyphens, four hyphens, five hyphens, and even six. Because of that, I am worried about whether the compound-word count you provided for the profile Hucklebery_Finn.txt is correct. I am not sure how many hyphens we should allow for in order to get the best result. Please find the attached screenshot.

Similarly, for the profile Hucklebery_Finn.txt, the counts seem to be derived differently for different words:

For words like and, or ---> Total and/or count - contraction_count - compoundword_count
For words like that, yet ---> Total that/yet count - contraction_count - possessive_Count - compoundword_count
For a word like as ---> Total as count, with no deduction of contraction_count, possessive_Count or compoundword_count
For other conjunctions ---> Total conjunction count - contraction_count - compoundword_count - compoundword_count on - possessive_Count

Can you please tell me if I am missing something?

After trying the different samples, I have come to the conclusion that the counts provided may differ from the actual counts. Something is different in every sample text. In some places contractions are counted for words like won't and they'll, but in other places they are not counted for words like 'bout. These inconsistencies between samples make things more difficult, and I am not sure which behaviour to follow to get an appropriate result. I am still unclear about exactly what we should consider when creating a profile, and we could lose marks because of the lack of information provided.

This article has 1 attachment:



5:27pm Sat 21st May, Michael W.

Hi,

The only things that can be regarded as compound words are <simple word>-<simple word> or <simple word>'-<simple word> (which is both a contraction AND a compound word). Runs of hyphens such as --, ---, and ---- are all to be converted to a blank.

BTW, it is for this reason that I'm allowing a margin of 10% on the longer text comparisons, just in case there are more edge cases buried in the longer texts.

Cheers
MichaelW
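For anyone following along, here is a minimal Python sketch of one way the rule above could be applied. It is not the official marking code. The names find_compound_words and is_contraction_compound are hypothetical, and it assumes that a "simple word" is a run of ASCII letters and that punctuation at the edges of a token is stripped before matching. Under this strict reading, a three-part token such as by-and-by is not counted, which is also an assumption.

```python
import re

# Two or more consecutive hyphens act as a separator and become a blank.
MULTI_HYPHEN = re.compile(r"-{2,}")

# Assumption: a "simple word" is a run of lowercase ASCII letters.
# Matches <simple word>-<simple word> and <simple word>'-<simple word>.
COMPOUND = re.compile(r"[a-z]+'?-[a-z]+")

# Assumption: this punctuation is stripped from both ends of each token.
EDGE_PUNCTUATION = '.,;:!?"()[]'


def find_compound_words(text):
    """Return the tokens in text that count as compound words."""
    cleaned = MULTI_HYPHEN.sub(" ", text.lower())
    tokens = (token.strip(EDGE_PUNCTUATION) for token in cleaned.split())
    return [token for token in tokens if COMPOUND.fullmatch(token)]


def is_contraction_compound(word):
    """True for the <simple word>'-<simple word> form, counted as both."""
    return COMPOUND.fullmatch(word) is not None and "'-" in word


sample = "The well-known raft--and Jim---drifted to the sho'-nuff place."
print(find_compound_words(sample))
```

In the sample sentence, raft--and and Jim---drifted are split apart by the blank substitution, so only well-known and sho'-nuff are reported, and only the second is also treated as a contraction.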

The University of Western Australia

Computer Science and Software Engineering
