
I would like to check that my understanding of the guideline is correct.
I made a sample data sheet to explain.
* This sheet is not real data; assume it has already been processed by my_cantril_data_cleaning.
<sample data sheet>
<Country> <Code> <Year> <GDP> <Population> <Homicide> <Life> <Cantril>
Australia AUS 2011 1 1 1 0 1
Australia AUS 2012 2 2 2 2
Australia AUS 2013 3 3 3 3 3
Australia AUS 2014 4 4 4 4
Q1. Regarding "Also based on the header line, report any lines that do not have the same number of cells. (Cells are allowed be empty)": if a line has an empty value, like AUS-2012-population, but still has the same number of cells as the header, does it go to the stdout of my_cantril_data_cleaning?
Q2. In best_predictor, the 2014 line will not be included in the calculation because its Cantril value is empty, right?
Q3. In best_predictor, I believe the Life-2011 value, 0, is included in the calculation, but the Life-2012 value, which is empty, is not, right?
Q4. If my understanding in the three questions above is right, the number of Life-Cantril pairs, 2, does not meet the minimum in the guideline, so I will not calculate the Life-Cantril correlation for AUS. But do I still have to calculate the GDP-Cantril correlation for AUS, because the number of GDP-Cantril pairs for AUS is at least 3?
Q5. If my understanding in the four questions above is right, the number of data points, n, used to calculate a correlation would differ from country to country, and the number of correlations used to calculate the mean correlation for each of Homicide, GDP, Population, and Life could differ as well. Is this right?

Some unofficial answers here, but this is exactly how I worked on my own assignment.

Q1

According to the clarification email from the professor:

All rows with a country code, between the years 2011 and 2021 are to be reported.

So an empty value such as population does not affect whether a line should be output.
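
To make the Q1 rule concrete, here is a rough Python sketch of how one row could be classified. It is only an illustration, not the assignment's required implementation: the function name classify_row, the tab separator, and the column order (Country, Code, Year, ...) taken from the sample sheet are all my assumptions.

```python
def classify_row(line, header_cells, sep="\t"):
    """Classify one data line (hypothetical helper, not from the guideline).

    Assumes tab-separated cells in the order Country, Code, Year, ...
    """
    # Strip only the newline so that trailing empty cells are preserved.
    cells = line.rstrip("\n").split(sep)

    # Empty cells still count as cells; only the cell count is compared.
    if len(cells) != header_cells:
        return "bad_cell_count"

    code, year = cells[1], cells[2]
    # Keep rows that have a country code and a year between 2011 and 2021.
    if code == "" or not year.isdigit() or not 2011 <= int(year) <= 2021:
        return "drop"
    return "keep"


# A row with empty cells is still kept, because its cell count matches.
print(classify_row("Australia\tAUS\t2012\t2\t\t2\t\t2", 8))  # keep
```

The point is that emptiness is never tested here: only the number of cells, the country code, and the year decide what happens to a line.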

Q2

Yes.

Q3

Yes.

Q4

Again, according to the clarification email from the professor:

That implies that, for a given country and predictor, each Cantril data point corresponds to a predictor data point, and there are at least 3 of those. This also implies that the number of correlations may be slightly different for each predictor.

So Life-Cantril correlation does not need to be calculated in this case.

But you still need to calculate correlation for GDP-Cantril.
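
A small Python sketch of the pairing rule, in case it helps. Everything here is an assumption for illustration: the helper names, the use of Pearson correlation (the email only says "correlation"), and my reading of the sample sheet in which Life is empty for 2012 and Cantril is empty for 2014.

```python
from math import sqrt

MIN_PAIRS = 3  # "at least 3" from the clarification email


def complete_pairs(rows, predictor, target="Cantril"):
    """Keep only years where both the predictor and Cantril are non-empty."""
    pairs = []
    for row in rows:
        x, y = row.get(predictor, ""), row.get(target, "")
        if x != "" and y != "":  # "0" is a real value, "" is missing
            pairs.append((float(x), float(y)))
    return pairs


def pearson(pairs):
    """Pearson r, or None when there are too few pairs or no variance."""
    n = len(pairs)
    if n < MIN_PAIRS:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


# Sample sheet as I read it from the question (assumed layout).
aus = [
    {"Year": "2011", "GDP": "1", "Life": "0", "Cantril": "1"},
    {"Year": "2012", "GDP": "2", "Life": "", "Cantril": "2"},
    {"Year": "2013", "GDP": "3", "Life": "3", "Cantril": "3"},
    {"Year": "2014", "GDP": "4", "Life": "4", "Cantril": ""},
]

print(pearson(complete_pairs(aus, "GDP")))   # 1.0 (3 pairs: 2011-2013)
print(pearson(complete_pairs(aus, "Life")))  # None (only 2 pairs)
```

Note that the Life value 0 in 2011 counts as data, while the empty cells in 2012 and 2014 remove those years from the corresponding pairs.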

Q5

Yes. For each predictor, n can be different for each entity, for the reason we talked about in Q4.
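
As a rough Python sketch of what that means for the mean correlation: countries without a usable correlation are simply skipped, so each predictor can end up averaging over a different number of countries. The function name and the country codes and values below are made up for illustration only.

```python
def mean_correlation(per_country):
    """Average the usable correlations for one predictor.

    per_country maps a country code to its correlation, or to None when
    that country had fewer than 3 complete pairs for this predictor.
    Returns (mean, number of correlations used).
    """
    usable = [r for r in per_country.values() if r is not None]
    if not usable:
        return None, 0
    return sum(usable) / len(usable), len(usable)


# Hypothetical values, not real results.
gdp = {"AUS": 1.0, "NZL": 0.5, "FJI": None}
life = {"AUS": None, "NZL": 0.25, "FJI": None}

print(mean_correlation(gdp))   # (0.75, 2)
print(mean_correlation(life))  # (0.25, 1)
```

Here GDP is averaged over 2 countries and Life over only 1, which is the "slightly different number of correlations" the email mentions.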