It's UWAweek 24 (1st semester, 2nd exam week)

helpOSTS

This forum is provided to promote discussion amongst students enrolled in Open Source Tools and Scripting.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!"  followup message.



 UWA week 20 (1st semester, week 11) ↓


Hi, I'll respond to each bit below it. ANONYMOUS wrote:
> Hi Sir
>
> Good afternoon. I have some questions about the assignment 2 requirements.
>
> In the data cleaning part, do we have to return an error message if the data set does not meet the "clean" condition?
As always, that depends on whether the problem is local and fixable, or not fixable.
> For example:
> 1. Based on the header (i.e. top) line, make sure that the file is a tab-separated format file
>
> if not: “The input file is not matching with tab-separated format file”? Or do we have to convert it to a TSV file and continue?
A .csv file (or a file using some other separator) in place of the expected .tsv is not readily fixable, at least not if the conversion is to be done properly, so printing an error message and exiting is reasonable.
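As a rough sketch of that header check (written in Python purely for illustration; the function names and the error wording are my own assumptions, not part of the assignment spec), a script could treat a header line with no tab character as "not a TSV file" and exit:

```python
import sys


def looks_tab_separated(header_line: str) -> bool:
    """Return True if the header line contains at least one tab character."""
    return "\t" in header_line


def require_tsv(header_line: str) -> None:
    """Print an error on stderr and exit with status 1 if the header is not tab-separated."""
    if not looks_tab_separated(header_line):
        # Hypothetical message: the exact wording is not specified anywhere.
        print("error: input file is not in tab-separated format", file=sys.stderr)
        sys.exit(1)
```

Note that this only inspects the header line, so a file with a single column would also be rejected; whether that matters depends on the data files you are given.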
> 2. Also based on the header line, report any lines that do not have the same number of cells. (Cells are allowed to be empty.)
>
> If the line of the data doesn’t match the condition -> return that specific line with an error message.
Yes, report the line on stderr and continue processing (but don't print the line to stdout).
> If the file has the same number of cells -> continue
Yes
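A minimal sketch of the "report on stderr and carry on" behaviour, again in Python for illustration only. The function name `filter_consistent_rows` and the message format are assumptions of mine; the point is just that bad lines go to stderr and are left out of the output, while good lines pass through.

```python
import sys


def filter_consistent_rows(lines, err=sys.stderr):
    """Yield lines whose cell count matches the header; report the rest on err."""
    expected = None
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\n")
        cells = row.split("\t")  # empty cells still count as cells
        if expected is None:
            expected = len(cells)  # the header line defines the expected count
        if len(cells) != expected:
            print(
                f"line {number}: expected {expected} cells, found {len(cells)}: {row}",
                file=err,
            )
            continue  # skip the bad line but keep processing the rest
        yield row
```

Passing `sys.stdin` as `lines` and printing each yielded row would give the stdout/stderr split described above.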
> 3. Remove the column with the header Continent, which is sparsely populated and is not present in one of the files.
> If the file contains Continent, remove it? I am not clear on the requirements.
Yes, remove the column called Continent, if present in the file.
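One way to picture "remove it if present" is the sketch below (Python, for illustration). It assumes the rows have already been split into lists of cells with the header first; `drop_column` is a hypothetical helper name.

```python
def drop_column(rows, name="Continent"):
    """Return rows without the named column; return rows unchanged if it is absent."""
    if not rows or name not in rows[0]:
        return rows  # nothing to remove, e.g. the file that lacks Continent
    idx = rows[0].index(name)
    # Drop the same position from the header and from every data row.
    return [row[:idx] + row[idx + 1:] for row in rows]
```

Because a file without the column is returned untouched, the same code can be run on every input file.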
>
> 4. Ignore the rows that do not represent countries (the country code field is empty)
Yes
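A small sketch of that filter, in Python for illustration. The column name `Code` is an assumption on my part (check the actual header in your data files), as is the helper name `keep_countries`.

```python
def keep_countries(rows, code_column="Code"):
    """Keep the header plus every data row whose country code cell is non-empty."""
    header, *data = rows
    idx = header.index(code_column)  # assumed column name; adjust to the real header
    # A cell holding only whitespace is treated as empty here (an assumption).
    return [header] + [row for row in data if row[idx].strip()]
```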
> 5. Ignore the rows for years outside those for which we have at least some Cantril data as that is what we will be using. In practice, this means only include years from 2011 to 2021, inclusive.
Yes.
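The inclusive 2011 to 2021 range could be sketched like this (Python, for illustration). The column name `Year` and the helper name `keep_years` are assumptions, and skipping rows with a non-numeric year is my own choice rather than something the spec states.

```python
def keep_years(rows, year_column="Year", first=2011, last=2021):
    """Keep the header plus data rows whose year lies in [first, last], inclusive."""
    header, *data = rows
    idx = header.index(year_column)  # assumed column name; adjust to the real header
    kept = [header]
    for row in data:
        try:
            year = int(row[idx])
        except ValueError:
            continue  # non-numeric year: treated as out of range (assumption)
        if first <= year <= last:
            kept.append(row)
    return kept
```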
> The output file sent to stdout should have rows with the data in the following order (tab separated):
>
> Thanks for your time for helping me.
Cheers, MichaelW 👨‍🎨

The University of Western Australia

Computer Science and Software Engineering

CRICOS Code: 00126G
Last modified  5:07AM Sep 06 2023