This forum is provided to promote discussion amongst students enrolled in Open Source Tools and Scripting.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if you posted a short "Great, fixed it!" follow-up message so that other interested students can see it.

 UWA week 16 (1st semester, non-teaching week) ↓
2:12pm Fri 22nd Apr, ANONYMOUS

Hi Michael,

I noticed a problem while testing my script: does a number count as a word?
For example, this is part of AliceInWonderland.txt:

"""
Release Date: January, 1991 [eBook #11]
[Most recently updated: October 12, 2020]

Language: English

Character set encoding: UTF-8

"""
With more thought, here are some questions:

1. Do numbers, with or without attached symbols, such as "1991", "#11" or "12,", count as words?

2. Should we count "UTF-8" as two words, "UTF" and "8"? If so, what about hyphenated words like "well-known" or "dressing-room"? Can we treat each of these as two words, like "well"+"known" and "dressing"+"room"?

3. What about money, like "$5,000"? Is it "$"+"5"+"000", "$"+"5,000" or "$5"+"000"? The same problem applies to "%20", chapter markers like "1.E.7." or "1.C.", country names like "U.S.", websites like "www.gutenberg.org/license", etc. Some of these examples involve the dot ".", which also marks the end of a sentence. If we treat them as special cases, how do we know how many such cases there are?

4. About the single quote "'": what about "Foundation's", "He's", "I'm", "horses'", "horse's", "'No"? Could you please specify in which cases the single quote "'" should be retained?

Thanks!


9:14pm Fri 22nd Apr, Michael W.

ANONYMOUS wrote:
> Hi Michael,
>
> I noticed a problem while testing my script: does a number count as a word?
> For example, this is part of AliceInWonderland.txt:
>
> """
> Release Date: January, 1991 [eBook #11]
> [Most recently updated: October 12, 2020]
>
> Language: English
>
> Character set encoding: UTF-8
>
> """
> With more thought, here are some questions:
>
> 1. Do numbers, with or without attached symbols, such as "1991", "#11" or "12,", count as words?
>
> 2. Should we count "UTF-8" as two words, "UTF" and "8"? If so, what about hyphenated words like "well-known" or "dressing-room"? Can we treat each of these as two words, like "well"+"known" and "dressing"+"room"?
>
> 3. What about money, like "$5,000"? Is it "$"+"5"+"000", "$"+"5,000" or "$5"+"000"? The same problem applies to "%20", chapter markers like "1.E.7." or "1.C.", country names like "U.S.", websites like "www.gutenberg.org/license", etc. Some of these examples involve the dot ".", which also marks the end of a sentence. If we treat them as special cases, how do we know how many such cases there are?
>
> 4. About the single quote "'": what about "Foundation's", "He's", "I'm", "horses'", "horse's", "'No"? Could you please specify in which cases the single quote "'" should be retained?
>
> Thanks!

While a hyphenated word-pair logically counts as a single word, for our purposes this time round I'll keep the definition of a word to simply what I've been using in the lectures: an alphabetic character, followed by zero or more other alphabetic characters. That's it. No numbers, no apostrophes, etc. The truth is that it really doesn't matter, as these things are not that common in any case.

Cheers
MichaelW
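
For anyone who wants to check their script against that definition, here is a minimal sketch in Python of one way to read it. It is an illustration only, not the official solution: the function name `extract_words` is made up for this example, and it assumes "alphabetic" means the ASCII letters A-Z and a-z, which the post above does not spell out. Whether upper and lower case should be merged is also not covered here.

```python
import re

# Assumption: "alphabetic" means ASCII letters only (A-Z, a-z).
# A word is one letter followed by zero or more letters, i.e. a
# maximal run of letters. Digits, hyphens, apostrophes, dots and
# other symbols simply act as separators.
WORD_PATTERN = re.compile(r"[A-Za-z]+")


def extract_words(text):
    """Return every maximal run of ASCII letters in text, in order."""
    return WORD_PATTERN.findall(text)


sample = "Release Date: January, 1991 [eBook #11] UTF-8 well-known $5,000 U.S. He's"
print(extract_words(sample))
# ['Release', 'Date', 'January', 'eBook', 'UTF', 'well', 'known', 'U', 'S', 'He', 's']
```

Under this reading, "1991", "#11" and "$5,000" contribute no words at all, "well-known" becomes "well" and "known", and "He's" becomes "He" and "s".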

The University of Western Australia

Computer Science and Software Engineering

Last modified  1:17AM Sep 14 2022