Looking at life expectancy (LE), we can see that the six countries with the highest life expectancy in 1952 were all in Europe. The top three were Norway, Iceland and the Netherlands, with an average LE of 72.4 years. In contrast, most of the countries with the lowest LE were in Africa. However, the country with the lowest life expectancy in 1952 was Afghanistan, at 28.8 years. Overall, the mean life expectancy in 1952 was 49.06 years. By 2007, the final year recorded in the gapminder dataset, the average life expectancy had grown by 17.95 years.
Although the six countries with the highest life expectancy in 1952 were in Europe, the continent with the highest average LE was Oceania (69.25 years). The lowest belonged to Africa (39.16 years). By 2007 the order had not changed: Oceania was still highest (80.72) and Africa lowest (54.81). However, Africa made the greater gain over the period, an increase of 15.65 years compared with 11.47 years for Oceania.
Years_Continent <- gapminder %>%
  select(country, continent, year, lifeExp, gdpPercap) %>%
  group_by(year, continent) %>%
  filter(year == 1952 | year == 2007) %>%
  summarise(AverageLifeExpectancy = mean(lifeExp), AverageGDP = mean(gdpPercap))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
Years_Continent
## # A tibble: 10 × 4
## # Groups:   year [2]
##     year continent AverageLifeExpectancy AverageGDP
##    <int> <fct>                     <dbl>      <dbl>
##  1  1952 Africa                     39.1      1253.
##  2  1952 Americas                   53.3      4079.
##  3  1952 Asia                       46.3      5195.
##  4  1952 Europe                     64.4      5661.
##  5  1952 Oceania                    69.3     10298.
##  6  2007 Africa                     54.8      3089.
##  7  2007 Americas                   73.6     11003.
##  8  2007 Asia                       70.7     12473.
##  9  2007 Europe                     77.6     25054.
## 10  2007 Oceania                    80.7     29810.
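The comparison of gains between 1952 and 2007 can be reproduced directly from the table. The sketch below is only an illustration: it re-enters the one-decimal continent means by hand rather than reading them from `Years_Continent`, and the names `le_1952`, `le_2007` and `le_gain` are introduced here and are not part of the original analysis.

```r
# Continent means copied from the table above (already rounded to one decimal)
continent <- c("Africa", "Americas", "Asia", "Europe", "Oceania")
le_1952 <- c(39.1, 53.3, 46.3, 64.4, 69.3)
le_2007 <- c(54.8, 73.6, 70.7, 77.6, 80.7)

# Gain in average life expectancy per continent, in years
le_gain <- setNames(le_2007 - le_1952, continent)

# Largest gains first
sort(le_gain, decreasing = TRUE)
```

On these rounded figures Africa's gain is larger than those of Europe and Oceania, while Asia and the Americas gained more still.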
analysis <- data.frame(Years_Continent)
# Continent means from the table above, rounded to whole numbers
# (first five values: 1952, last five values: 2007)
Exp <- c(39, 53, 46, 64, 69, 55, 74, 71, 78, 81)
GDP <- c(1253, 4079, 5195, 5661, 10298, 3089, 11003, 12473, 25054, 29810)
# Passing all three method names makes cor.test() use the first one (Pearson),
# so the method is named explicitly here
cor.test(Exp, GDP, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: Exp and GDP
## t = 4.3138, df = 8, p-value = 0.002567
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4364573 0.9602758
## sample estimates:
## cor
## 0.8362719
TaskFrame <- data.frame(Exp,GDP)
ggscatter(TaskFrame, x = "Exp", y = "GDP",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Life Expectancy (1952 & 2007)", ylab = "GDP (1952 & 2007)")
## `geom_smooth()` using formula 'y ~ x'
A correlation analysis, plotted with the ggpubr package, shows that life expectancy and GDP are positively correlated (r(8) = .84, p < .01). Continents with a higher average life expectancy also tend to have a higher average GDP.
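To make the reported figures easier to trace, the sketch below recomputes the Pearson test statistic by hand from the standard relationship t = r * sqrt(df / (1 - r^2)) with df = n - 2. It uses only base R; the names `r`, `t_stat` and `p_value` are chosen here for illustration.

```r
# Rounded continent means used in the analysis (1952 followed by 2007)
Exp <- c(39, 53, 46, 64, 69, 55, 74, 71, 78, 81)
GDP <- c(1253, 4079, 5195, 5661, 10298, 3089, 11003, 12473, 25054, 29810)

# Pearson correlation coefficient and its degrees of freedom
r  <- cor(Exp, GDP)
df <- length(Exp) - 2

# t statistic for H0: true correlation equals 0
t_stat <- r * sqrt(df / (1 - r^2))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

c(r = r, t = t_stat, df = df, p = p_value)
```

The values should agree with the `cor.test()` output, which is where the r(8) in the reported result comes from.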
To get a more accurate picture, we also look at the two years separately to see whether anything specific stands out regarding the correlation.
Exp1952 <- c(39, 53, 46, 64, 69)
GDP1952 <- c(1253, 4079, 5195, 5661, 10298)
cor.test(Exp1952, GDP1952, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: Exp1952 and GDP1952
## t = 3.0482, df = 3, p-value = 0.05551
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.05506712 0.99130130
## sample estimates:
## cor
## 0.8694401
TaskFrame1952 <- data.frame(Exp1952,GDP1952)
ggscatter(TaskFrame1952, x = "Exp1952", y = "GDP1952",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Life Expectancy (1952)", ylab = "GDP (1952)")
## `geom_smooth()` using formula 'y ~ x'
As we can see, life expectancy and GDP are also positively correlated across the continents in 1952 (r(3) = 0.87, p = .056); however, the result does not reach the conventional .05 significance level, unlike the correlation for both years together. Most likely that is caused by the smaller number of data points.
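The effect of the small sample can be illustrated with the Fisher z-transformation, the usual way of building a confidence interval for a Pearson correlation. The sketch below is an illustration under assumptions: `fisher_ci()` is a helper written for this example, not a function from any package, and the n = 10 case is hypothetical, keeping the same r only to show how the interval narrows.

```r
# Rounded 1952 continent means used in the analysis
Exp1952 <- c(39, 53, 46, 64, 69)
GDP1952 <- c(1253, 4079, 5195, 5661, 10298)

# Confidence interval for a correlation via the Fisher z-transformation
# (hypothetical helper, written for this example)
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                      # transform r to the z scale
  se   <- 1 / sqrt(n - 3)               # standard error depends only on n
  crit <- qnorm(1 - (1 - level) / 2)    # two-sided normal critical value
  tanh(z + c(-1, 1) * crit * se)        # back-transform the interval limits
}

r_1952 <- cor(Exp1952, GDP1952)

ci_n5  <- fisher_ci(r_1952, n = 5)    # the actual sample size
ci_n10 <- fisher_ci(r_1952, n = 10)   # same r, hypothetical larger sample

ci_n5
ci_n10
```

With five points the interval reaches below zero, which is why the test narrowly misses significance; the same coefficient from ten points would give an interval that stays well above zero.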
Similarly, we are also looking at the correlation in the year 2007.
Exp2007 <- c(55, 74, 71, 78, 81)
GDP2007 <- c(3089, 11003, 12473, 25054, 29810)
cor.test(Exp2007, GDP2007, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: Exp2007 and GDP2007
## t = 3.3085, df = 3, p-value = 0.04544
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.01680471 0.99246244
## sample estimates:
## cor
## 0.885936
TaskFrame2007 <- data.frame(Exp2007,GDP2007)
ggscatter(TaskFrame2007, x = "Exp2007", y = "GDP2007",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Life Expectancy (2007)", ylab = "GDP (2007)")
## `geom_smooth()` using formula 'y ~ x'
A quick analysis shows that life expectancy and GDP are also strongly correlated in 2007 (r(3) = 0.89, p < .05).
The researcher proposes that higher GDP will result in higher life expectancy. The null hypothesis, however, is that there is no correlation between GDP and LE.
mean(Exp)
## [1] 63
mean(GDP)
## [1] 10791.5
sd(Exp)
## [1] 14.14214
sd(GDP)
## [1] 9552.486
boxplot(GDP)
The standard deviation of GDP is high relative to its mean, and a box plot shows that the data are positively skewed. As the data do not appear to be normally distributed, we will use a non-parametric correlation test to examine the relationship between GDP and LE.
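The visual impression from the box plot can be backed up with a number. The sketch below computes the moment-based sample skewness in base R and compares the mean with the median; the name `skew` is introduced here, and the check is an addition for illustration rather than part of the original analysis.

```r
# Rounded continent GDP means used in the analysis (1952 followed by 2007)
GDP <- c(1253, 4079, 5195, 5661, 10298, 3089, 11003, 12473, 25054, 29810)

# Moment-based sample skewness: third central moment over variance^(3/2)
dev  <- GDP - mean(GDP)
skew <- mean(dev^3) / mean(dev^2)^(3 / 2)

# A mean above the median is another sign of a long right tail
c(mean = mean(GDP), median = median(GDP), skewness = skew)
```

A positive skewness together with a mean well above the median is consistent with the right-skewed box plot. `shapiro.test(GDP)` from base R would give a formal normality test, although with ten values it has little power.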
cor.test(Exp, GDP, method = c("spearman"))
##
## Spearman's rank correlation rho
##
## data: Exp and GDP
## S = 10, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9393939
Spearman’s rank correlation (rho = 0.94, S = 10) shows a strong positive correlation that is highly significant (p < .0001). Therefore, we reject the null hypothesis and accept the alternative hypothesis that GDP and LE are positively correlated: a higher value of one variable is associated with a higher value of the other.
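The statistic S in the output is the sum of squared rank differences, and with no ties rho follows from it as rho = 1 - 6S / (n(n^2 - 1)). The sketch below reproduces both values by hand in base R; the names `d`, `S` and `rho` are chosen here for illustration.

```r
# Rounded continent means used in the analysis (1952 followed by 2007)
Exp <- c(39, 53, 46, 64, 69, 55, 74, 71, 78, 81)
GDP <- c(1253, 4079, 5195, 5661, 10298, 3089, 11003, 12473, 25054, 29810)

n <- length(Exp)

# Difference between the rank of each observation on the two variables
d <- rank(Exp) - rank(GDP)

# S as reported by cor.test(): the sum of squared rank differences
S <- sum(d^2)

# Spearman's rho from S (valid here because neither variable has ties)
rho <- 1 - 6 * S / (n * (n^2 - 1))

c(S = S, rho = rho)
```

Because only the ranks enter the calculation, the large GDP values for Europe and Oceania in 2007 carry no more weight than any other observation, which is the reason for preferring this test with skewed data.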
The current paper examined a dataset that records life expectancy, population, and GDP for all countries from 1952 to 2007. For the purposes of this report, the researcher chose to compare life expectancy and GDP across the five continents in the years 1952 and 2007. A Pearson’s correlation showed a positive correlation between the two variables; however, the GDP values appeared to be unevenly distributed. Therefore, the researcher used a non-parametric correlation test to compare the two variables again. A Spearman’s rank correlation test showed a statistically significant positive correlation between the two variables. The researcher rejects the null hypothesis and accepts the alternative hypothesis that there is a significant relationship between GDP and life expectancy.