Hi Sir,
Good afternoon. I have some questions about the Assignment 2 requirements.
In the data cleaning part, do we have to return an error message if the data set does not meet the "clean" conditions?
For example:
1. Based on the header (i.e. top) line, make sure that the file is a tab-separated format file
If not, should we print an error such as "The input file is not in tab-separated format"? Or do we have to convert it to a TSV file and continue?
2. Also based on the header line, report any lines that do not have the same number of cells. (Cells are allowed to be empty.)
If a data line doesn't have the same number of cells as the header -> report that specific line with an error message.
If every line has the same number of cells as the header -> continue.
3. Remove the column with the header Continent, which is sparsely populated and is not present in one of the files.
If the file contains a Continent column, do we just remove it, and do nothing if it is absent? I am not clear on this requirement.
4. Ignore the rows that do not represent countries (the country code field is empty)
5. Ignore the rows for years outside those for which we have at least some Cantril data as that is what we will be using. In practice, this means only include years from 2011 to 2021, inclusive.
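To make my reading of the five steps concrete, here is a rough Python sketch of how I currently understand them. It is only my interpretation, not the required solution. The function name clean_lines and the column names "Code" and "Year" are my own guesses (only "Continent" comes from the spec), and I have assumed that a badly formed line is reported and then skipped, and that the header row is kept in the output.

```python
def clean_lines(lines, code_col="Code", year_col="Year",
                drop_col="Continent", first_year=2011, last_year=2021):
    """Return (rows, problems) for an iterable of text lines.

    Column names other than "Continent" are assumptions, not from the spec.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    if not lines:
        return [], ["input is empty"]

    # Step 1: the header must contain a tab to look like a TSV file.
    if "\t" not in lines[0]:
        return [], ["header line is not tab-separated"]
    header = lines[0].split("\t")

    # Step 3: keep every column except drop_col (fine if it is absent).
    keep = [i for i, name in enumerate(header) if name != drop_col]
    code_i = header.index(code_col)   # raises ValueError if column missing
    year_i = header.index(year_col)

    rows = [[header[i] for i in keep]]  # assumption: header stays in output
    problems = []
    for number, line in enumerate(lines[1:], start=2):
        cells = line.split("\t")

        # Step 2: report lines whose cell count differs from the header.
        if len(cells) != len(header):
            problems.append(
                f"line {number}: expected {len(header)} cells, "
                f"found {len(cells)}"
            )
            continue  # assumption: skip the line after reporting it

        # Step 4: ignore rows with an empty country code.
        if cells[code_i].strip() == "":
            continue

        # Step 5: ignore rows outside first_year..last_year (inclusive).
        try:
            year = int(cells[year_i])
        except ValueError:
            continue
        if not first_year <= year <= last_year:
            continue

        rows.append([cells[i] for i in keep])
    return rows, problems
```

I would call it as rows, problems = clean_lines(sys.stdin), print each entry of problems to stderr, and write "\t".join(row) for each row to stdout. If any of the assumptions above are wrong, that is exactly what I am hoping to have clarified.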
The output file sent to stdout should have rows with the data in the following order (tab separated):
Thank you for your time and help.