I would like to check whether my understanding of the guideline is correct.
I made a sample data sheet to explain.
* This sheet is not real data; assume it has already been processed by my_cantril_data_cleaning.
<sample data sheet>
<Country> <Code> <Year> <GDP> <Population> <Homicide> <Life> <Cantril>
Australia AUS 2011 1 1 1 0 1
Australia AUS 2012 2 2 2 2
Australia AUS 2013 3 3 3 3 3
Australia AUS 2014 4 4 4 4
Q1. As for "Also based on the header line, report any lines that do not have the same number of cells. (Cells are allowed be empty)": even if a line has an empty value, like AUS-2012-Population, does that line still go to the stdout of my_cantril_data_cleaning, since it has the same number of cells as the header?
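To make Q1 concrete, here is a minimal Python sketch of one reading of that rule. It assumes the cells are tab-separated (the sample sheet above does not show the real separator), and the function name `rows_with_wrong_cell_count` and the rows in `sample` are made up for illustration only. Under this reading, an empty cell still counts as a cell, so only a line with missing separators is reported.

```python
def rows_with_wrong_cell_count(lines, sep="\t"):
    """Return (line_number, line) pairs whose cell count differs from the header's."""
    header_cells = len(lines[0].split(sep))
    bad = []
    for number, line in enumerate(lines[1:], start=2):
        # str.split with an explicit separator keeps empty strings,
        # so an empty cell between two separators is still counted.
        if len(line.split(sep)) != header_cells:
            bad.append((number, line))
    return bad


# Hypothetical rows, assuming a tab separator.
sample = [
    "Country\tCode\tYear\tGDP\tPopulation\tHomicide\tLife\tCantril",
    "Australia\tAUS\t2011\t1\t1\t1\t0\t1",
    "Australia\tAUS\t2012\t2\t\t2\t2\t2",  # Population is empty, but there are still 8 cells
    "Australia\tAUS\t2013\t3\t3\t3",        # only 6 cells, so this line is reported
]

print(rows_with_wrong_cell_count(sample))
```

With these rows, only line 4 (the 6-cell row) is returned; the 2012 row with the empty Population cell is not reported.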
Q2. When using best_predictor, the 2014 line will not be included in the calculation because the Cantril value for 2014 is empty, right?
Q3. When using best_predictor, I believe that the Life-2011 value, 0, is included in the calculation, but the Life-2012 value, which is empty, is not, right?
Q4. If my understanding in the three questions above is right, the number of Life-Cantril pairs, 2, is fewer than the guideline requires, so I will not calculate the Life-Cantril correlation for AUS. But do I still have to calculate the GDP-Cantril correlation for AUS, because the number of GDP-Cantril pairs for AUS is more than 3?
Q5. If my understanding in the four questions above is right, the number of data points, n, used to calculate each correlation would differ from country to country, and in some cases the number of per-country correlations that go into the mean correlation would differ among Homicide, GDP, Population, and Life. Is this right?
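The counting described in Q2 to Q5 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the required implementation: the threshold `MIN_PAIRS = 3` is a guess at the guideline's minimum, all function names are hypothetical, and the `aus` rows encode one reading of the sample sheet (Life empty in 2012, Cantril empty in 2014).

```python
from math import sqrt

MIN_PAIRS = 3  # assumed threshold; use whatever the guideline actually specifies


def complete_pairs(rows, predictor, target="Cantril"):
    """Keep only rows where both the predictor and the target are present.

    A value of 0 is kept; only None (an empty cell) is dropped.
    """
    return [(r[predictor], r[target]) for r in rows
            if r.get(predictor) is not None and r.get(target) is not None]


def pearson(pairs):
    """Pearson correlation of (x, y) pairs, or None if either side is constant."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def country_correlation(rows, predictor):
    """Correlation for one country, or None if there are too few complete pairs."""
    pairs = complete_pairs(rows, predictor)
    if len(pairs) < MIN_PAIRS:
        return None
    return pearson(pairs)


def mean_correlation(countries, predictor):
    """Mean of the per-country correlations that could be calculated."""
    values = [c for c in (country_correlation(rows, predictor)
                          for rows in countries.values()) if c is not None]
    return sum(values) / len(values) if values else None


# One reading of the sample sheet; None marks an empty cell.
aus = [
    {"Year": 2011, "GDP": 1, "Life": 0, "Cantril": 1},
    {"Year": 2012, "GDP": 2, "Life": None, "Cantril": 2},
    {"Year": 2013, "GDP": 3, "Life": 3, "Cantril": 3},
    {"Year": 2014, "GDP": 4, "Life": 4, "Cantril": None},
]

print(len(complete_pairs(aus, "GDP")), len(complete_pairs(aus, "Life")))
print(country_correlation(aus, "GDP"), country_correlation(aus, "Life"))
```

In this sketch AUS contributes a GDP-Cantril correlation (3 complete pairs) but no Life-Cantril correlation (2 complete pairs), so both n within a country and the number of countries entering each mean can differ by predictor.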