Hi Boya,
Good questions. I've replied below each one.
"Boya Zhang" <24*2*2*7@s*u*e*t*u*a*e*u*a*> wrote:
> Hi Professor,
>
> I have some questions regarding the assignment:
>
> 1. For the requirement to report any lines that do not have the same number of cells (with cells allowed to be empty), your previous answer mentioned printing an error message to stderr. Do we also need to identify the line number and specify which file it belongs to?
The trick with all of these is to place yourself in the role of a user of the system. What will they expect? What information will be useful?
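One possible sketch, for illustration only, assumes comma-separated input whose first line is a header that sets the expected number of cells. `report_ragged_lines` is a hypothetical helper name, not something the assignment defines. It reports the file name and line number, which is the kind of information a user would probably want.

```python
import csv
import sys
from typing import Optional, TextIO


def report_ragged_lines(path: str, delimiter: str = ",", err: Optional[TextIO] = None) -> int:
    """Print one message per line whose cell count differs from the header's.

    Empty cells are allowed ("AUS,,3" still has three cells); only the count
    matters. Returns the number of offending lines.
    """
    out = sys.stderr if err is None else err  # default to stderr at call time
    bad = 0
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        expected = None
        for row in reader:
            if expected is None:
                # Assumption: the first line is a header that defines the width.
                expected = len(row)
                continue
            if len(row) != expected:
                bad += 1
                # line_num = lines read so far, i.e. this row's line number
                # when no quoted cell spans several lines.
                print(f"{path}:{reader.line_num}: expected {expected} cells, "
                      f"found {len(row)}", file=out)
    return bad
```

It would be called once per input file, and a script could use a non-zero total as its exit status. Using `csv.reader` rather than a plain split also copes with quoted cells that contain the delimiter.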
>
> 2. Regarding the output for script 1, am I allowed to sort the data? I noticed that in sample2, the data is not sorted alphabetically by country. I assume that if I use a join operation, the file needs to be sorted first. Can we sort the data based on the country?
As far as I can see, the country data (i.e., data that has a country code) is sorted by country code, and then by year.
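If the output has to follow the same order, a small Python sketch like this one could check it or restore it. It assumes the country code is in column 0, the year is in column 1, and every year is a non-empty whole number. The function names are hypothetical. Unix `join` expects both inputs to be sorted on the join field, so whichever order is chosen needs to be applied consistently to both files.

```python
from typing import List, Optional, Sequence, Tuple


def row_key(row: Sequence[str], code_col: int = 0, year_col: int = 1) -> Tuple[str, int]:
    # Assumption: the year cell is never empty (int("") would raise ValueError).
    return (row[code_col], int(row[year_col]))


def first_out_of_order(rows: Sequence[Sequence[str]], code_col: int = 0,
                       year_col: int = 1) -> Optional[int]:
    """Return the index of the first row that breaks (code, year) order, or None."""
    prev = None
    for i, row in enumerate(rows):
        key = row_key(row, code_col, year_col)
        if prev is not None and key < prev:
            return i
        prev = key
    return None


def sort_by_code_then_year(rows: Sequence[Sequence[str]], code_col: int = 0,
                           year_col: int = 1) -> List[Sequence[str]]:
    # Codes compare as strings, years as integers (so 999 sorts before 2000).
    return sorted(rows, key=lambda r: row_key(r, code_col, year_col))
```

For plain ASCII codes, comparing them as Python strings gives the same byte order that `sort` uses under `LC_ALL=C`.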
>
> 3. Also, how should we determine whether the output from script 1 is successful? Is it based on the total number of cleaned data records or some other criteria?
If the data was clean to start with, the number of "cleaned" records should be zero. The only criterion is whether the cleaned data is actually clean and, if something was cleaned, that you haven't also discarded perfectly correct data.
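That criterion can be expressed as a check, sketched here in Python under the assumption that there is some per-row validity test. `is_valid` is a hypothetical placeholder for whatever rules the assignment actually defines. Every row in the cleaned output must pass the test, and no row that passed it in the raw input may be missing from the output.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple

Row = Tuple[str, ...]


def check_cleaning(
    raw: Iterable[Sequence[str]],
    cleaned: Iterable[Sequence[str]],
    is_valid: Callable[[Row], bool],
) -> Tuple[List[Row], List[Row]]:
    """Return (rows still dirty after cleaning, valid input rows that were lost)."""
    raw_rows = [tuple(r) for r in raw]
    cleaned_rows = [tuple(r) for r in cleaned]
    # Rows that survived cleaning but still fail validation.
    still_dirty = [r for r in cleaned_rows if not is_valid(r)]
    # Valid input rows (duplicates counted) that no longer appear in the output.
    lost = Counter(r for r in raw_rows if is_valid(r)) - Counter(cleaned_rows)
    return still_dirty, sorted(lost.elements())
```

On input that was already clean, with the output identical to the input, both returned lists are empty. That is consistent with zero "cleaned" records.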
Cheers
MichaelW
👨🎨
>
> Thank you for your guidance.