## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
In this short report we examine whether there is a correlation between life expectancy and population size in Italy over time. We use the Gapminder dataset, which contains life expectancy, population size and GDP per capita by country and year. Observations are recorded every 5 years, from 1952 to 2007. The full dataset is a data frame with 1704 rows and 6 variables, of which only 2 are analysed here: population and life expectancy.
Life expectancy and population are both numeric variables.
The research question for this report is: Does population growth have a positive impact on the life expectancy of the population?
## year population life_expectancy
## 1 1952 47666000 65.940
## 2 1957 49182000 67.810
## 3 1962 50843200 69.240
## 4 1967 52667100 71.060
## 5 1972 54365564 72.190
## 6 1977 56059245 73.480
## 7 1982 56535636 74.980
## 8 1987 56729703 76.420
## 9 1992 56840847 77.440
## 10 1997 57479469 78.820
## 11 2002 57926999 80.240
## 12 2007 58147733 80.546
## year population life_expectancy
## Min. :1952 Min. :47666000 Min. :65.94
## 1st Qu.:1966 1st Qu.:52211125 1st Qu.:70.61
## Median :1980 Median :56297440 Median :74.23
## Mean :1980 Mean :54536958 Mean :74.01
## 3rd Qu.:1993 3rd Qu.:57000502 3rd Qu.:77.78
## Max. :2007 Max. :58147733 Max. :80.55
To look at Italy specifically, its rows were extracted from the full dataset and the relevant columns were selected into a new table. The new table contains 12 observations of 3 variables: year, population and life expectancy.
The summary shows that life expectancy in Italy ranged from a minimum of 65.94 years to a maximum of 80.55 years, with a median of 74.23 years. Life expectancy was lowest in 1952 and highest in 2007.
As for the population between 1952 and 2007, the minimum was 47,666,000 people and the maximum was 58,147,733. The population was smallest in 1952 and largest in 2007.
To understand the relationship between life expectancy and population, a scatter plot of the two variables was created.
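The summary figures quoted above can be cross-checked by hand. The following is a minimal sketch in Python, independent of the R code used to produce this report; the rows are copied from the Italy table printed above, and the helper name `describe` is hypothetical, introduced only for this illustration.

```python
from statistics import mean, median

# Rows copied from the Italy table in this report:
# (year, population, life_expectancy)
italy = [
    (1952, 47666000, 65.940),
    (1957, 49182000, 67.810),
    (1962, 50843200, 69.240),
    (1967, 52667100, 71.060),
    (1972, 54365564, 72.190),
    (1977, 56059245, 73.480),
    (1982, 56535636, 74.980),
    (1987, 56729703, 76.420),
    (1992, 56840847, 77.440),
    (1997, 57479469, 78.820),
    (2002, 57926999, 80.240),
    (2007, 58147733, 80.546),
]

population = [row[1] for row in italy]
life_expectancy = [row[2] for row in italy]


def describe(values):
    """Return the minimum, median, mean and maximum of a list of numbers."""
    return {
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }


life_stats = describe(life_expectancy)
pop_stats = describe(population)

# Years in which life expectancy was lowest and highest
year_min_life = min(italy, key=lambda row: row[2])[0]
year_max_life = max(italy, key=lambda row: row[2])[0]

print(life_stats)
print(pop_stats)
print(year_min_life, year_max_life)
```

The quartiles are left out on purpose, because different tools use different quantile conventions and the sketch does not try to reproduce R's.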
## `geom_smooth()` using formula = 'y ~ x'
The plot shows a linear relationship between population and life expectancy.
Because histograms of both variables looked approximately normally distributed, Pearson’s correlation was chosen.
## [1] 0.96
The Pearson’s correlation coefficient for life expectancy and population is 0.96. The value is positive and close to 1, which indicates a very strong positive correlation between population and life expectancy.
To proceed with hypothesis testing, the research hypotheses were stated:
H0: There is no relationship between life expectancy and population.
H1: There is a relationship between life expectancy and population.
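To make the coefficient concrete, here is a minimal sketch that computes Pearson's r directly from its definition (covariance divided by the product of the standard deviations). It is written in Python as an independent check and is not the code used for this report; the data are copied from the Italy table above, and the function name `pearson_r` is an assumption made for this example.

```python
from math import sqrt

# Values copied from the Italy table in this report (1952 to 2007)
population = [
    47666000, 49182000, 50843200, 52667100, 54365564, 56059245,
    56535636, 56729703, 56840847, 57479469, 57926999, 58147733,
]
life_expectancy = [
    65.940, 67.810, 69.240, 71.060, 72.190, 73.480,
    74.980, 76.420, 77.440, 78.820, 80.240, 80.546,
]


def pearson_r(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(population, life_expectancy)
print(round(r, 2))
```

The rounded result can be compared with the 0.96 reported above.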
The selected statistical test for this was Pearson’s r test.
##
## Pearson's product-moment correlation
##
## data: life_expectancy and population
## t = 10.664, df = 10, p-value = 8.79e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8556059 0.9886580
## sample estimates:
## cor
## 0.958736
Pearson’s correlation test was run on population and life expectancy. The p-value is 0.000000879, which is less than the significance level alpha = 0.05, so the null hypothesis is rejected. We can conclude that there is a statistically significant, very strong positive correlation between population and life expectancy, r(10) = .96, p < .001.
Based on this test, there is a very strong positive relationship between population size and life expectancy in Italy. The Italian population grew continuously from 1952 to 2007, and life expectancy rose along with it.
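The test statistic and confidence interval in the output above follow from r and the sample size alone. The sketch below, in Python rather than the R used for this report, recomputes them under two assumptions: that t is obtained as r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom, and that the 95% interval comes from the Fisher z transformation. The variable names are illustrative only.

```python
from math import atanh, sqrt, tanh

r = 0.958736  # sample correlation taken from the test output above
n = 12        # number of observations (one per survey year)

# t statistic for H0: true correlation equals 0
df = n - 2
t = r * sqrt(df) / sqrt(1 - r ** 2)

# Approximate 95% confidence interval via the Fisher z transformation:
# z = atanh(r) is roughly normal with standard error 1 / sqrt(n - 3)
z = atanh(r)
half_width = 1.959964 / sqrt(n - 3)  # 1.959964 is the 97.5% normal quantile
ci_low = tanh(z - half_width)
ci_high = tanh(z + half_width)

print(df, round(t, 3))
print(round(ci_low, 4), round(ci_high, 4))
```

The p-value is not recomputed here because it needs the cumulative t distribution, which the Python standard library does not provide.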