

This forum is provided to promote discussion amongst students enrolled in Open Source Tools and Scripting.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!"  followup message.


 UWA week 17 (1st semester, week 8) ↓

6:24pm Sat 30th Apr, ANONYMOUS

It looks like some questions that were answered in the forum earlier in the assignment have been missed by some students, and have not been added (or at least, not clearly) to the updated assignment outline. I've done some forum archeology and picked out some relevant posts that will hopefully help some fellow students!


What to submit

The project outline says you should submit a single .zip file, with all scripts inside.

You should probably have the following directory structure:

|   malaria_incidence
|   common_words
|   # optional helper script of your own devising
|   # optional Python script written by Michael

If you are using additional scripts, it will be easier to put them in the same directory, like above.

Please note that malaria_incidence and common_words do not have .sh on the end. They are still shell scripts, but have no .sh suffix.
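
As a rough illustration of the point above, here is a minimal sketch showing that a shell script runs fine without a .sh suffix, as long as it has a shebang line and the execute bit set. The script bodies are stand-ins, and submission.zip is a made-up archive name (an assumption on my part, so check the outline for the real one).

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in script bodies; the real scripts do far more than this.
for name in malaria_incidence common_words; do
    printf '%s\n' '#!/usr/bin/env bash' "echo \"$name called with: \$*\"" > "$name"
    chmod +x "$name"   # the execute bit plus the shebang make it runnable
done

# No .sh suffix is needed to run a shell script.
./malaria_incidence Vietnam

# Bundle both scripts flat into one archive, if zip is installed.
# "submission.zip" is a made-up name; use whatever the outline asks for.
if command -v zip >/dev/null 2>&1; then
    zip -q submission.zip malaria_incidence common_words
fi
```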

Relevant post

Different OS issues

It's probably best just to use the Docker image from labs to test and run.

Python3 not on Docker

This is only relevant if you plan to use Python 3. Make sure you run docker pull to refresh the Docker image from labs; see the post below.

Relevant post

Part 1

Parsing countries

You should be able to manage all countries without problem by searching for their 'well known' names. The only two countries that have been discussed are as follows:


Ignore Sudan


Viet Nam and Vietnam are both valid; as a result, there will need to be some intermediary if statements to ensure that if someone passes Vietnam in the command line, it will search for Viet Nam in the .csv
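
One possible way to handle the Viet Nam case is sketched below. It uses a case statement rather than a chain of if statements, and normalise_country is a hypothetical helper name of my own, not something from the project outline. It assumes the .csv spells the country as "Viet Nam", as described above.

```shell
# Map a name given on the command line to the spelling assumed to be in the .csv.
# normalise_country is a hypothetical helper name, not part of the project spec.
normalise_country() {
    case "$1" in
        Vietnam) printf '%s\n' 'Viet Nam' ;;   # alias -> spelling used in the data
        *)       printf '%s\n' "$1" ;;         # everything else passes through unchanged
    esac
}

normalise_country 'Vietnam'    # prints: Viet Nam
normalise_country 'Viet Nam'   # prints: Viet Nam
normalise_country 'Kenya'      # prints: Kenya
```

Further aliases could be added as extra branches of the case statement if more turn up.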

Relevant post

Other examples not covered by Michael

Some examples have not been discussed, such as Guinea-Bissau vs. Equatorial Guinea. My advice here would be to look at the -w option for grep, which may go some way to helping restrict the matches.
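
To show what -w does and does not do, here is a small sketch on made-up rows. The row layout (country name as the first comma-separated field) is purely an assumption for illustration, and match_country is a hypothetical helper; the real .csv may be laid out differently.

```shell
# Hypothetical sample rows; the real CSV layout may differ.
sample_rows() {
    printf '%s\n' \
        'Niger,2000,1' \
        'Nigeria,2000,2' \
        'Guinea,2000,3' \
        'Guinea-Bissau,2000,4' \
        'Equatorial Guinea,2000,5'
}

# -w stops "Niger" from matching inside "Nigeria".
sample_rows | grep -w 'Niger'       # prints: Niger,2000,1

# But "-" and " " are not word characters, so "Guinea" still matches
# all three Guinea rows.
sample_rows | grep -cw 'Guinea'     # prints: 3

# Comparing the whole first field (assumed to hold the country) is stricter.
match_country() { awk -F, -v name="$1" '$1 == name'; }
sample_rows | match_country 'Guinea'   # prints: Guinea,2000,3
```

So -w helps with names that are prefixes of longer words, but hyphenated or multi-word names probably need something more exact.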

Other examples include UK and USA - my advice would be to double check the file here, as you may notice that the dataset is not complete re: all countries.

The one country that has not been discussed (and that, to my mind, is a Sudan-Vietnam hybrid) is Côte d'Ivoire (Ivory Coast)...

Relevant post

Part 2

Definition of a word

Only alphabetics, do not take into account numbers and symbols.

Hence, "bird" is a word; "bird's" is two words ("bird" and "s"). However, "COVID-19" will only be one word ("COVID"), as "19" does not count as a word under the above restriction that words contain only alphabetics.
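
A minimal sketch of this definition, assuming "alphabetics" means the ASCII letters A-Z and a-z (that is my reading, not something confirmed in the posts). split_words is a hypothetical helper name.

```shell
# Turn every run of non-letters into a single newline, then drop empty lines.
# split_words is a hypothetical helper; A-Za-z only is an assumption.
split_words() {
    LC_ALL=C tr -cs 'A-Za-z' '\n' | sed '/^$/d'
}

printf '%s\n' "The bird's COVID-19 test" | split_words
# prints, one per line: The bird s COVID test
```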

Relevant post

Case sensitivity

Please preserve case sensitivity; that is, "You" is different to "you".

Relevant post

Multiple same rank

Pick one (although not sure how this will be verified)
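
For what it's worth, here is one way to count words case-sensitively and still get a repeatable answer when ranks tie. Breaking ties by sorting on the word itself is my own choice, not something Michael has specified, and word_counts is a hypothetical helper name.

```shell
# Count words (letters only, case preserved), most frequent first.
# Ties are broken by the word in C-locale order, purely so output is repeatable.
word_counts() {
    LC_ALL=C tr -cs 'A-Za-z' '\n' | sed '/^$/d' |
        LC_ALL=C sort | uniq -c | awk '{ print $1, $2 }' |
        LC_ALL=C sort -k1,1nr -k2,2
}

printf 'You you you. Me\n' | word_counts
# prints:
# 2 you
# 1 Me
# 1 You
```

Here "You" and "you" are counted separately, and "Me" and "You" share second rank; any tie-break rule would do as long as one word is picked.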

Relevant post

Working with the directory

There are a few posts which have asked directory-based questions

e.g. recursive? e.g. where is the directory?

On Unix, if you pass a directory to a program, it will operate with the directory provided. Additionally, the project spec wants us to count words in files ending with .txt.

So, for example, suppose I am using our docker image in the /home/stud/perm directory and want to run the following:

./common_words -w the /lab/week5/

This is going to do some ranking on the .txt files in the /lab/week5 directory. Your code should process only the .txt files in that directory (thus ignoring the .csv files). There is no suggestion in the project outline that this operation is recursive (as you are only required to process things ending in .txt).
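
A minimal sketch of that non-recursive selection is below. list_txt_files is a hypothetical helper name, and skipping directories that happen to end in .txt is my own assumption about what is sensible.

```shell
# Print the regular files ending in .txt directly inside the given directory.
# Nothing below the top level is visited, and directories named *.txt are skipped.
list_txt_files() {
    dir=${1%/}                      # tolerate a trailing slash, e.g. /lab/week5/
    for f in "$dir"/*.txt; do
        if [ -f "$f" ]; then        # also false when the glob matched nothing
            printf '%s\n' "$f"
        fi
    done
}

# Demo on a scratch directory with made-up contents.
demo=$(mktemp -d)
: > "$demo/a.txt"
: > "$demo/b.csv"
mkdir "$demo/sub" "$demo/odd.txt"
: > "$demo/sub/c.txt"
list_txt_files "$demo/"             # prints only the path to a.txt
```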

Error reporting unclear

Of interest is the nature of the reporting Michael wants. For example, this thread and this one imply that we should report errors to the user if we have 'out of bounds' issues. This would also extend to Task 1, in which the same logic would state that passing "Australia" in to the data set would turn up errors (as it does not exist).

This appears to contrast with the usual bash approach, which is to report nothing unless something actually goes wrong; the exit codes, however, do change.

Examples include

tail -n +3800 Alice_in_Wonderland.txt # exit 0, only has ~3200 lines
grep "Australia" incidenceOfMalaria.csv # exit 1 - see manual

It would be good to have clarification on the circumstances in which Michael wants a bash-approach to errors, or something else.
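
Until that is clarified, the difference can be seen in a small sketch like the one below. The data rows are made up, and lookup_country and its error message are hypothetical; this is just one way a script could turn a silent exit code into a visible error.

```shell
# Scratch data file with made-up rows; the real .csv is not reproduced here.
data=$(mktemp)
printf '%s\n' 'Kenya,2000,1' 'Viet Nam,2000,2' > "$data"

# tail past the end of the file: prints nothing, yet exits 0.
tail -n +3800 "$data" && echo "tail exit status: $?"

# grep with no match: prints nothing and exits 1.
grep 'Australia' "$data" || echo "grep exit status: $?"

# One possible wrapper (hypothetical name and message) that reports the miss.
lookup_country() {
    if ! grep -- "$1" "$2"; then
        echo "no data found for: $1" >&2
        return 1
    fi
}

lookup_country 'Kenya' "$data"
lookup_country 'Australia' "$data" || echo "lookup exit status: $?"
```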


7:33pm Sat 30th Apr, ANONYMOUS

Thanks for the clarification! I want to ask whether we can assume the input directory for common_words will always contain .txt files (not counting directories whose names end in .txt)?


8:00pm Sat 30th Apr, ANONYMOUS

This is really great. Thank you for taking the time to do this.

I think in general, there's just been some confusion regarding expected inputs/outputs for both parts of the project. Michael's done a great job clarifying this in the forums, but I still think there remain many edge cases. This may be acceptable for somebody writing a program for their own research, but it makes it difficult for students to be confident that their code can receive full marks.

A better solution in future may be to simplify what the expected inputs can be, e.g. country names must be an exact match, otherwise print an error. It is very ambiguous what a country's "well-known" names are. What may be well-known to one person could be foreign to someone else.


1:03pm Sun 1st May, ANONYMOUS

For the zip file format, shouldn't it also include incedenceOfMalaria.csv? Since it isn't passed as a parameter into the malaria_incidence script, I'd assume the script wouldn't run without the csv in the same directory.


1:27pm Sun 1st May, ANONYMOUS

Michael has stipulated that he will use the same copy of incedenceOfMalaria.csv, so there is no need to store your own. This also protects against people manually editing their copy of the .csv to make it easier to do the project.


10:56pm Sun 1st May, ANONYMOUS

Thanks for doing what our lecturer, Michael, was supposed to do. Really appreciate it.

 UWA week 18 (1st semester, week 9) ↓

12:20am Mon 2nd May, ANONYMOUS

Thanks! That's what Mike should have done a long time ago! Assignment requirements are too vague and should not require students to ask questions in the forum to figure it out.

The University of Western Australia

Computer Science and Software Engineering

CRICOS Code: 00126G
Last modified  1:17AM Sep 14 2022