## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

Introduction

In this short report we examine whether Italian life expectancy and population size were correlated between 1952 and 2007. The analysis uses the Gapminder dataset, which contains life expectancy, population size and GDP per capita by country and year. Observations are recorded every 5 years, starting in 1952 and ending in 2007. The full dataset is a data frame with 1704 rows and 6 variables, of which only 2 are analysed here: population and life expectancy.

Life expectancy and population are both numeric variables.

The research question for this report is: does growth in population have a positive impact on the life expectancy of the people?

##    year population life_expectancy
## 1  1952   47666000          65.940
## 2  1957   49182000          67.810
## 3  1962   50843200          69.240
## 4  1967   52667100          71.060
## 5  1972   54365564          72.190
## 6  1977   56059245          73.480
## 7  1982   56535636          74.980
## 8  1987   56729703          76.420
## 9  1992   56840847          77.440
## 10 1997   57479469          78.820
## 11 2002   57926999          80.240
## 12 2007   58147733          80.546
##       year        population       life_expectancy
##  Min.   :1952   Min.   :47666000   Min.   :65.94  
##  1st Qu.:1966   1st Qu.:52211125   1st Qu.:70.61  
##  Median :1980   Median :56297440   Median :74.23  
##  Mean   :1980   Mean   :54536958   Mean   :74.01  
##  3rd Qu.:1993   3rd Qu.:57000502   3rd Qu.:77.78  
##  Max.   :2007   Max.   :58147733   Max.   :80.55

Description of data transformation

To look at Italian population and life expectancy specifically, the rows for Italy were filtered out of the full dataset and the relevant columns were selected into a new table. This new table contains 12 observations, one per survey year, of the 2 variables of interest alongside the year.
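The filtering and selection step can be sketched in base R as follows. This is only an illustration under stated assumptions: `gap_small` is a hypothetical stand-in holding a few rows copied from the outputs in this report, not the full 1704-row dataset, and the original analysis may well have used `dplyr` rather than `subset()`.

```r
# Hypothetical stand-in for the full Gapminder data frame: a few rows copied
# from the outputs shown in this report (the real table has 1704 rows).
gap_small <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Italy", "Italy"),
  year    = c(1952, 1957, 1952, 1957),
  lifeExp = c(28.8, 30.3, 65.94, 67.81),
  pop     = c(8425333, 9240934, 47666000, 49182000)
)

# Keep only the rows for Italy and only the columns needed for the analysis.
italy <- subset(gap_small, country == "Italy", select = c(year, pop, lifeExp))

# Rename the columns to the names used in the tables of this report.
names(italy) <- c("year", "population", "life_expectancy")
rownames(italy) <- NULL

print(italy)
```

Applied to the full dataset, the same two steps would leave the 12 Italian observations used in the rest of the report.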

The summary shows that life expectancy in Italy ranged from a minimum of 65.94 years to a maximum of 80.55 years, with a median of 74.23 years. Life expectancy was lowest in 1952 and highest in 2007.

As for the population between 1952 and 2007, the minimum was 47,666,000 people and the maximum was 58,147,733. The population was smallest in 1952 and largest in 2007.
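The descriptive statistics quoted above can be checked with a short base R sketch. The `italy` data frame is retyped here from the table printed earlier in this report, and the helper variable names are illustrative only.

```r
# Italian observations, retyped from the table printed in this report.
italy <- data.frame(
  year = seq(1952, 2007, by = 5),
  population = c(47666000, 49182000, 50843200, 52667100, 54365564, 56059245,
                 56535636, 56729703, 56840847, 57479469, 57926999, 58147733),
  life_expectancy = c(65.940, 67.810, 69.240, 71.060, 72.190, 73.480,
                      74.980, 76.420, 77.440, 78.820, 80.240, 80.546)
)

# Minimum, quartiles, median, mean and maximum for every column.
print(summary(italy))

# Years in which each variable reached its minimum and maximum.
year_min_le  <- italy$year[which.min(italy$life_expectancy)]
year_max_le  <- italy$year[which.max(italy$life_expectancy)]
year_min_pop <- italy$year[which.min(italy$population)]
year_max_pop <- italy$year[which.max(italy$population)]

# Median life expectancy across the 12 survey years.
median_le <- median(italy$life_expectancy)
```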

Correlation analysis

To see whether life expectancy and population are related, a scatter plot of the two variables was created.


The plot shows a positive linear relationship between population and life expectancy.
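A scatter plot with a fitted straight line can be sketched as below. Note the assumption: this uses base R graphics and `lm()` for the line, not the `ggplot2` call that produced the original figure, and the data frame is retyped from the table printed in this report.

```r
# Italian observations, retyped from the table printed in this report.
italy <- data.frame(
  year = seq(1952, 2007, by = 5),
  population = c(47666000, 49182000, 50843200, 52667100, 54365564, 56059245,
                 56535636, 56729703, 56840847, 57479469, 57926999, 58147733),
  life_expectancy = c(65.940, 67.810, 69.240, 71.060, 72.190, 73.480,
                      74.980, 76.420, 77.440, 78.820, 80.240, 80.546)
)

# Scatter plot of life expectancy against population.
plot(italy$population, italy$life_expectancy,
     xlab = "Population", ylab = "Life expectancy (years)",
     main = "Italy, 1952 to 2007")

# Least-squares line (y ~ x) drawn over the points.
fit <- lm(life_expectancy ~ population, data = italy)
abline(fit)

# A positive slope means life expectancy rises as population rises.
slope <- coef(fit)[["population"]]
```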

Since histograms of both variables looked approximately normally distributed, a Pearson’s correlation test was chosen.
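The visual check can be sketched with base R histograms. The Shapiro-Wilk test at the end is not part of the original analysis; it is included here only as one possible numerical complement, and with just 12 observations both the histograms and the test give limited evidence either way.

```r
# Italian observations, retyped from the table printed in this report.
italy <- data.frame(
  year = seq(1952, 2007, by = 5),
  population = c(47666000, 49182000, 50843200, 52667100, 54365564, 56059245,
                 56535636, 56729703, 56840847, 57479469, 57926999, 58147733),
  life_expectancy = c(65.940, 67.810, 69.240, 71.060, 72.190, 73.480,
                      74.980, 76.420, 77.440, 78.820, 80.240, 80.546)
)

# Draw the two histograms side by side, then restore the plot layout.
op <- par(mfrow = c(1, 2))
hist(italy$population, main = "Population", xlab = "Population")
hist(italy$life_expectancy, main = "Life expectancy", xlab = "Years")
par(op)

# Optional addition (not in the original report): Shapiro-Wilk normality test.
sw_pop <- shapiro.test(italy$population)
sw_le  <- shapiro.test(italy$life_expectancy)
print(sw_pop)
print(sw_le)
```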

## [1] 0.96

The Pearson’s correlation coefficient for life expectancy and population is 0.96. The value is positive and close to 1, so there is a very strong positive correlation between the growth of the population and life expectancy.
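For reference, the coefficient can be computed with `cor()` and cross-checked against the textbook definition, the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. This is a minimal sketch with the data retyped from the table in this report; rounded to two decimals it should agree with the value reported above.

```r
# Italian observations, retyped from the table printed in this report.
italy <- data.frame(
  year = seq(1952, 2007, by = 5),
  population = c(47666000, 49182000, 50843200, 52667100, 54365564, 56059245,
                 56535636, 56729703, 56840847, 57479469, 57926999, 58147733),
  life_expectancy = c(65.940, 67.810, 69.240, 71.060, 72.190, 73.480,
                      74.980, 76.420, 77.440, 78.820, 80.240, 80.546)
)

# Pearson correlation via the built-in function.
r <- cor(italy$population, italy$life_expectancy, method = "pearson")

# The same quantity from its definition, using deviations from the means.
dx <- italy$population - mean(italy$population)
dy <- italy$life_expectancy - mean(italy$life_expectancy)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

print(round(r, 2))
```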

Hypothesis testing

Before carrying out the hypothesis test, the research hypotheses were stated:

H0: There is no relationship between life expectancy and population.

H1: There is a relationship between life expectancy and population.

The statistical test selected for this was Pearson’s correlation test.

## [1] 0.96
## 
##  Pearson's product-moment correlation
## 
## data:  life_expectancy and population
## t = 10.664, df = 10, p-value = 8.79e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8556059 0.9886580
## sample estimates:
##      cor 
## 0.958736

Pearson’s correlation test was run on population and life expectancy. The p-value is 0.000000879, which is less than the significance level alpha = 0.05, so H0 is rejected. We can therefore conclude that there is a significant, very strong positive correlation between population and life expectancy, r(10) = .96, p < .00001.
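The test above can be reproduced with `cor.test()`, and its t statistic can be recovered by hand from r and the sample size as t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below assumes the data retyped from the table in this report; the `_manual` variable names are illustrative.

```r
# Italian observations, retyped from the table printed in this report.
italy <- data.frame(
  year = seq(1952, 2007, by = 5),
  population = c(47666000, 49182000, 50843200, 52667100, 54365564, 56059245,
                 56535636, 56729703, 56840847, 57479469, 57926999, 58147733),
  life_expectancy = c(65.940, 67.810, 69.240, 71.060, 72.190, 73.480,
                      74.980, 76.420, 77.440, 78.820, 80.240, 80.546)
)

# Two-sided Pearson correlation test (H0: true correlation equals 0).
test <- cor.test(italy$life_expectancy, italy$population, method = "pearson")
print(test)

# Recompute the t statistic and the two-sided p-value from r and n.
r <- unname(test$estimate)
n <- nrow(italy)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

# Decision at the 0.05 significance level used in this report.
reject_h0 <- test$p.value < 0.05
```

With 12 observations the test has 10 degrees of freedom, which is the value reported in parentheses after r.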

Results

Based on the statistical test, there is a very strong positive relationship between population growth and the life expectancy of the same population. As shown, the Italian population grew continuously from 1952 to 2007, and life expectancy rose together with it.