ANONYMOUS wrote:
> "Please bear in mind that, for the range of years of interest, the Cantril data is nonetheless missing for some countries, wholly or in part. That being the case, to include a correlation for any country there must be at least 3 predictor-value, Cantril-value pairs. For example, sample2.tsv includes the data from Afghanistan, UAE, Bahamas and Oman, but only data from the first two countries should be included."
>
> Based on my understanding, this description means we should include only the data from countries whose Cantril data is greater than or equal to 3. But I am not sure: if all data for one predictor (for example, GDP) is absent for a country, should we still include that country? (I believe the rows for this country should be deleted otherwise.)
>
> Besides, one thing I am not certain about is this: different predictors for a given country may have varying numbers of data points. For instance, GDP data might be available for seven years while population data might be available for eight. Should we calculate the correlation between each predictor's available data points and the Cantril scores separately, or should we align by using the smallest number of data points across predictors to ensure consistency in the analysis? (Based on my understanding, it should be the first: compute each correlation separately, then take the average afterwards.)
Hi,
The fundamental misconception here is that the Cantril value needs to be at least 3. That is not correct. What I'm trying to say is that, for a given country, there need to be at least 3 years with Cantril data. For example, in sample2.tsv, ARE (UAE) has data for 6 years, so it is included, but OMN (Oman) has data for just 1 year, so it is ignored.
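A minimal sketch of that rule in Python, in case it helps. The row layout (country code, year, Cantril score, with None or an empty string for a missing score), the function name and the rows themselves are assumptions made up for illustration; they are not the contents of the sample file or a required implementation.

```python
from collections import defaultdict

MIN_YEARS = 3  # threshold taken from the assignment text


def countries_with_enough_cantril(rows, min_years=MIN_YEARS):
    """Return the country codes that have Cantril data for at least
    `min_years` distinct years.

    `rows` is an iterable of (country_code, year, cantril) tuples, where
    cantril is None or "" when the score is missing. This layout is a
    hypothetical one chosen for the example.
    """
    years_seen = defaultdict(set)
    for country, year, cantril in rows:
        if cantril is None or cantril == "":
            continue  # a missing score does not count as a year of data
        years_seen[country].add(year)
    return {c for c, years in years_seen.items() if len(years) >= min_years}


# Made-up rows for illustration only.
rows = [
    ("AFG", 2015, 2.7), ("AFG", 2016, 2.4), ("AFG", 2017, 2.6),
    ("ARE", 2015, 6.6), ("ARE", 2016, 6.8), ("ARE", 2017, 7.0),
    ("OMN", 2011, 6.9),
    ("BHS", 2015, None),
]

print(sorted(countries_with_enough_cantril(rows)))
```

Note that the made-up AFG rows are kept even though every score is below 3: the test counts years with data and never looks at the size of the score.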
Cheers
MichaelW