This forum is provided to promote discussion amongst students enrolled in Open Source Tools and Scripting.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!"  followup message.

UWA week 17 (1st semester, week 8)

6:24pm Sat 30th Apr, ANONYMOUS

It looks like some questions that were answered in the forum earlier in the assignment have been missed by some people, and have not been added (or at least, not clearly) to the updated assignment outline. I've done some forum archeology and picked out some relevant posts that will hopefully help fellow students!


What to submit

The project outline says you should submit a single .zip file, with all scripts inside.

You should probably have the following directory structure:

|	malaria_incidence 
|	common_words
| # optional helper script of your own devising 
| # optional python script written by Michael

If you are using additional scripts, it will be easier to put them in the same directory, like above.

Please note that malaria_incidence and common_words do not have .sh on the end. They are still shell scripts, but have no .sh suffix.
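
As a rough illustration (my own sketch, not something from the project outline): what makes a file runnable as a script is its shebang line and its execute permission, not its suffix. The file name hello_script and the temporary directory are made up for this demo.

```shell
# Create a throwaway script with no .sh suffix (hypothetical name: hello_script).
demo_dir=$(mktemp -d)
cat > "$demo_dir/hello_script" <<'EOF'
#!/bin/sh
echo "run by the interpreter named on the shebang line"
EOF

# The execute bit is what allows the file to be run directly.
chmod +x "$demo_dir/hello_script"

# Run it by path, exactly as you would run ./malaria_incidence.
output=$("$demo_dir/hello_script")
echo "$output"

rm -rf "$demo_dir"
```

If your own scripts give "Permission denied", a missing chmod +x is the first thing I would check.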

Relevant post

Different OS issues

It's probably best just to use the Docker image from labs to test and run.

Python3 not on Docker

This is only relevant if you plan to use title. Make sure you run docker pull first (see the relevant post below).

Relevant post

Part 1

Parsing countries

You should be able to manage all countries without problem by searching for their 'well known' names. The only two countries that have been discussed are as follows:


Ignore Sudan


Viet Nam and Vietnam are both valid. As a result, you will need some intermediary if statements so that if someone passes Vietnam on the command line, the script searches for Viet Nam in the .csv.
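
A minimal sketch of that translation step, assuming the .csv spells the country "Viet Nam". I have used a case statement rather than a chain of if statements, and the function name normalise_country is my own invention.

```shell
# Map the spelling a user might type to the spelling assumed to be in the .csv.
# Only the Vietnam case comes from the forum discussion; add others if you find them.
normalise_country() {
    case "$1" in
        Vietnam) printf '%s\n' 'Viet Nam' ;;
        *)       printf '%s\n' "$1" ;;      # everything else passes through unchanged
    esac
}

# Example: the search term is normalised before it is handed to grep and friends.
country=$(normalise_country "Vietnam")
echo "$country"
```

Remember to quote "$country" wherever you use it afterwards, since the name now contains a space.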

Relevant post

Other examples not covered by Michael

Some examples have not been discussed, such as Guinea-Bissau vs. Equatorial Guinea. My advice here would be to look at the -w option flag for grep, which may go some way to helping restrict the input.
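
To see what -w does and does not buy you, here is a small sketch on made-up rows. I am assuming the country name is the first comma-separated field, which may not match the real .csv, so check the file before relying on the anchored pattern.

```shell
# Made-up sample rows; the real incidence file may be laid out differently.
sample='Niger,2015,10
Nigeria,2015,20
Guinea,2015,30
Guinea-Bissau,2015,40
Equatorial Guinea,2015,50'

# Without -w, "Niger" also matches inside "Nigeria".
plain=$(printf '%s\n' "$sample" | grep -c 'Niger')

# With -w the match must be a whole word, so only the Niger row is counted.
whole_word=$(printf '%s\n' "$sample" | grep -cw 'Niger')

# -w alone does not separate the Guineas: a hyphen or a space also ends a word.
guineas=$(printf '%s\n' "$sample" | grep -cw 'Guinea')

# Anchoring to the start of the line and the comma after the name is stricter.
exact=$(printf '%s\n' "$sample" | grep -c '^Guinea,')

echo "$plain $whole_word $guineas $exact"   # prints: 2 1 3 1
```

So -w helps with names that are prefixes of other names, but names that share a whole word still need something extra.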

Other examples include UK and USA - my advice would be to double check the file here, as you may notice that the dataset is not complete re: all countries.

The one country that has not been discussed (and which, to my mind, is a Sudan-Vietnam hybrid) is Côte d'Ivoire (Ivory Coast)...

Relevant post

Part 2

Definition of a word

Only alphabetics, do not take into account numbers and symbols.

Hence, "bird" is a word; "bird's" is two words ("bird" and "s"). However, "COVID-19" will only be one word ("COVID"), as "19" does not count as a word per the above restriction of characters being 'alphabetics'.
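
One way to apply this definition in a pipeline is sketched below. I am assuming "alphabetics" means the ASCII letters A-Z and a-z, and split_words is just a name I picked.

```shell
# Read text on stdin and print one word per line, where a word is a run of letters.
split_words() {
    # tr turns every non-letter into a newline (-c complements the set, -s squeezes repeats);
    # sed then drops the empty line left behind when the input starts with a non-letter.
    tr -cs 'A-Za-z' '\n' | sed '/^$/d'
}

printf '%s\n' "bird's COVID-19" | split_words
# prints:
# bird
# s
# COVID
```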

Relevant post

Case sensitivity

Please preserve case sensitivity; that is "You" is different to "you".

Relevant post

Multiple same rank

Pick one (although not sure how this will be verified)
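
Here is a sketch of counting with case preserved and still returning a single word when several share a rank. The alphabetical tie-break is my own choice (it just makes the output repeatable), not something Michael has specified, and the function names are made up.

```shell
# Count case-sensitive word frequencies on stdin, most frequent first.
# Words with equal counts are ordered by byte value so the result is repeatable.
word_counts() {
    tr -cs 'A-Za-z' '\n' | sed '/^$/d' | LC_ALL=C sort | uniq -c |
        LC_ALL=C sort -k1,1nr -k2,2
}

# Print only the word at rank $1 (1 = most frequent). Exactly one word is
# printed even when several words share the same count.
nth_word() {
    word_counts | sed -n "${1}p" | awk '{ print $2 }'
}

printf 'You you you a a b\n' | nth_word 1   # prints: you
printf 'You you you a a b\n' | nth_word 3   # prints: You (tied with b on a count of 1)
```

Note that uniq -c is used without -i, so "You" and "you" are kept as separate words.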

Relevant post

Working with the directory

A few posts have asked directory-based questions, e.g. is the search recursive? Where is the directory?

On Unix, if you pass a directory to a program, it will operate with the directory provided. Additionally, the project spec wants us to count words in files ending with .txt.

So, for example, if I am using our Docker image in the /home/stud/perm directory and want to run the following:

./common_words -w the /lab/week5/

This is going to do some ranking on the .txt files in the /lab/week5 directory. Your code should process only the .txt files in that directory (thus ignoring the .csv files). There is no suggestion in the project outline that this operation is recursive (as you are only required to process things ending in .txt).
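
A minimal sketch of the non-recursive reading described above, using a shell glob. The function name list_txt_files is my own, and this is only one way to do it.

```shell
# Print the regular files ending in .txt that sit directly inside directory $1.
# The glob does not descend into subdirectories, and .csv files never match it.
list_txt_files() {
    dir=${1%/}      # drop one trailing slash so /lab/week5/ and /lab/week5 behave alike
    for f in "$dir"/*.txt; do
        # If nothing matches, the pattern is left unexpanded, so test before printing.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage (path taken from the example above): list_txt_files /lab/week5/
```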

Error reporting unclear

Of interest is the nature of the error reporting Michael wants. For example, this thread and this one imply that we should report errors to the user if we have 'out of bounds' issues. This would also extend to Part 1, where the same logic says that passing "Australia" to the data set should produce an error (as it does not exist in the data).

This appears to contrast with the usual bash approach, which is to report nothing unless something goes wrong, although the exit codes do change.

Examples include

tail -n +3800 Alice_in_Wonderland.txt # exit 0, only has ~3200 lines
grep "Australia" incidenceOfMalaria.txt # exit 1 - see manual

It would be good to have clarification on the circumstances in which Michael wants a bash-approach to errors, or something else.
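
Until that is clarified, one possible middle ground is sketched below: keep grep's exit status, but also print a message on stderr when nothing matches. This is only my guess at a reasonable behaviour, not what Michael has asked for, and find_country and the message wording are hypothetical.

```shell
# Print the rows of file $2 that contain country $1 as a whole word.
# Returns grep's own status: 0 = found, 1 = no match, 2 = grep error (e.g. missing file).
find_country() {
    rc=0
    grep -w -- "$1" "$2" || rc=$?
    if [ "$rc" -eq 1 ]; then
        # Explicit report for the "no such country" case, sent to stderr
        # so it does not pollute the normal output.
        printf 'No data found for "%s"\n' "$1" >&2
    fi
    return "$rc"
}

# Usage (file name taken from the example above):
#   find_country "Australia" incidenceOfMalaria.txt
```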

The University of Western Australia

Computer Science and Software Engineering

Last modified  1:17AM Sep 14 2022