Please explain what counts as erroneous data. I understand that erroneous data would include something like -1 as a month, but another helpOSTS question implies that we should also be prepared to handle data that would cause something like a bash error.
Another complaint I have with the assignment is that the data provided is insufficient: I was able to get my preprocess output to match the cleaned TSV file we were given exactly WITHOUT dropping any erroneous lines. I find it unreasonable to ask us to drop rows with erroneous data without providing adequate test data, and I also find the lack of clarity about what counts as erroneous data unreasonable. This is far too vague an explanation for an assignment.
It is also unfair that the assignment’s information implies that Git is optional (‘If you choose to use Git’), only for the marking criteria to contradict that entirely by allocating a mark to it. ‘Git is the industry standard’ isn’t a sufficient reason in my opinion; the assignment shouldn’t have implied that using Git was optional.