1. Introduction

International migration is widely thought to have accelerated in recent decades. People are pulled toward better opportunities in host nations or pushed to leave difficult conditions in their native countries. Europe's population is declining because of population aging, low fertility and emigration, while the ease of travel within Europe has increased population mobility. This is true of Estonia as well, so we think it is important to understand its migration patterns. The aim of the project is to analyze migration data from 2004 to 2021, identify the immigration and emigration trends, and draw conclusions from them. Special focus is put on whether there are gender differences in migration.

To do so, one research question was identified:

RQ1: Is there a difference by gender in the trends of migration?

2. Dataset description and exploratory analysis

The dataset on migration in Estonia was taken from the Statistics Estonia web page. It contains the number of males and females emigrating and immigrating between 2004 and 2021, broken down by the country or region the migration is directed to or comes from. Emigration is defined as the action by which a person de-registers his or her place of residence from the administrative unit, settlement unit or settlement region in which it was registered at the beginning of the year. Immigration is defined as the action by which a person registers his or her place of residence in an administrative unit, settlement unit or settlement region other than the one at the beginning of the year.

2.1.1 Summary

##       Year        Country          Immigration.Males Immigration.Females
##  Min.   :2004   Length:486         Min.   :   0.00   Min.   :   0.00    
##  1st Qu.:2008   Class :character   1st Qu.:  10.00   1st Qu.:   9.00    
##  Median :2012   Mode  :character   Median :  33.00   Median :  27.50    
##  Mean   :2012                      Mean   : 115.31   Mean   :  85.02    
##  3rd Qu.:2017                      3rd Qu.:  89.75   3rd Qu.:  69.00    
##  Max.   :2021                      Max.   :1783.00   Max.   :1369.00    
##  Emigration.Males Emigration.Females
##  Min.   :   0.0   Min.   :   0.0    
##  1st Qu.:   7.0   1st Qu.:  10.0    
##  Median :  20.0   Median :  24.0    
##  Mean   : 100.4   Mean   : 100.6    
##  3rd Qu.:  55.0   3rd Qu.:  63.0    
##  Max.   :2459.0   Max.   :2668.0

The dataset contains 486 observations of 6 variables. Year and the four migration counts (male and female immigration, male and female emigration) are stored as integers, and country as a character string. There are 27 regions in the dataset, of which 4 are continents; with 18 years of data this gives 27 × 18 = 486 rows.
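As a rough illustration of how country rows and continent rows could be told apart, the sketch below rebuilds a few rows taken from the head and tail of the data. It assumes that a leading `..` in `Country` marks an individual country and that unprefixed labels are continent aggregates; the names `migration_sample`, `is_country` and `Region` are hypothetical and not part of the original analysis.

```r
# A handful of rows copied from the head and tail of the dataset
# (not the full 486 rows).
migration_sample <- data.frame(
  Year = c(2004L, 2004L, 2021L, 2021L, 2021L, 2021L),
  Country = c("..Austria", "..Belgium", "..Russia", "Africa", "Asia", "..USA"),
  Immigration.Males = c(2L, 5L, 821L, 159L, 662L, 116L),
  Immigration.Females = c(1L, 4L, 694L, 118L, 377L, 150L),
  stringsAsFactors = FALSE
)

# Assumption: names starting with ".." are individual countries,
# everything else is a continent-level aggregate.
is_country <- startsWith(migration_sample$Country, "..")

# Strip the prefix to get readable labels for plots and tables.
migration_sample$Region <- sub("^\\.\\.", "", migration_sample$Country)

# Same kind of overview as the summary above, on the sample only.
summary(migration_sample$Immigration.Males)
table(ifelse(is_country, "country", "continent"))
```

Applied to the full data, the same flag would make it possible to keep continent aggregates out of country-level comparisons.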

2.1.2 Head and tail of the data

##   Year       Country Immigration.Males Immigration.Females Emigration.Males
## 1 2004     ..Austria                 2                   1                2
## 2 2004     ..Belgium                 5                   4                3
## 3 2004       ..Spain                 4                   4                8
## 4 2004 ..Netherlands                 4                   2                6
## 5 2004     ..Ireland                 2                   1               11
## 6 2004       ..Italy                11                   7                0
##   Emigration.Females
## 1                  5
## 2                  2
## 3                 20
## 4                  8
## 5                  7
## 6                  4
##     Year  Country Immigration.Males Immigration.Females Emigration.Males
## 481 2021 ..Russia               821                 694              305
## 482 2021   Africa               159                 118               43
## 483 2021     Asia               662                 377              237
## 484 2021  America               241                 205              126
## 485 2021    ..USA               116                 150               89
## 486 2021  Oceania                88                  65               56
##     Emigration.Females
## 481                291
## 482                 20
## 483                155
## 484                125
## 485                 87
## 486                 51

2.1.3 Structure check

## 'data.frame':    486 obs. of  6 variables:
##  $ Year               : int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
##  $ Country            : chr  "..Austria" "..Belgium" "..Spain" "..Netherlands" ...
##  $ Immigration.Males  : int  2 5 4 4 2 11 10 0 19 8 ...
##  $ Immigration.Females: int  1 4 4 2 1 7 15 0 27 2 ...
##  $ Emigration.Males   : int  2 3 8 6 11 0 5 2 8 2 ...
##  $ Emigration.Females : int  5 2 20 8 7 4 4 3 8 8 ...

2.2 Immigration plots by year

2.2.1 Contextual sampling

The overall picture from the immigration data is a fairly linear trend, but there are clear gender differences in the regions sampled for immigration to Estonia. Throughout the years, more males than females appear to have immigrated. The scatter plots below illustrate this.

## `geom_smooth()` using formula = 'y ~ x'


2.2.2 Gender frequency distribution check

Below are histograms of male and female immigration, respectively. Both distributions are right-skewed: for each gender, the number of people immigrating into Estonia from a given region in a given year rarely exceeds 500. Values above that can be considered outliers. Section 2.4 examines whether these outliers are tied to particular regions or to particular years.
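The direction of the skew can also be checked numerically. The sketch below uses only the twelve male immigration counts visible in the head and tail of the data in section 2.1.2, so it illustrates the check rather than reproducing the full histogram; `sample_skewness` is a hypothetical helper implementing the usual moment-based skewness coefficient.

```r
# Male immigration counts copied from the head and tail rows in section 2.1.2.
immigration_males <- c(2, 5, 4, 4, 2, 11, 821, 159, 662, 241, 116, 88)

# Moment-based sample skewness: positive values indicate a long right tail,
# negative values a long left tail.
sample_skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / (mean(centred^2)^1.5)
}

mean_value <- mean(immigration_males)
median_value <- median(immigration_males)
skew_value <- sample_skewness(immigration_males)

# A mean well above the median also points to a long right tail.
cat("mean:", mean_value, "median:", median_value, "skewness:", skew_value, "\n")
```

The same pattern shows in the summary in section 2.1.1, where the mean of every migration column is several times its median.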

2.3 Emigration plots by year

2.3.1 Contextual sampling

The emigration data show that, as with immigration, there are some gender differences by region among people emigrating from Estonia. In the sampled regions, more males than females appear to have emigrated throughout the years. The scatter plots below illustrate this.

## `geom_smooth()` using formula = 'y ~ x'


2.3.2 Gender frequency distribution check

Below are histograms of male and female emigration, respectively. Both distributions are right-skewed: for each gender, the number of people emigrating from Estonia to a given region in a given year rarely exceeds 500. Values above that can be considered outliers. Section 2.5 examines whether these outliers are tied to particular regions or to particular years.

2.4 Immigration by gender by country

2.4.1 Countrywise scatter plot

Looking at immigration by country, Finland, Ukraine and Russia spike the most for both males and females, so it seems that most immigrants come from neighboring and nearby countries.

2.4.2 Countrywise Box-plot for year range

To build on the scatter plot above and to confirm that the outliers are due to specific countries rather than yearly variation, we used box plots for males and females for every region, colored by year.

A spike in 2020 and 2021 can be seen across all countries. Therefore, given the finding in section 2.2.2 that most immigration counts fall below 500, the main outlier values are caused by certain countries, specifically Finland, Russia and Ukraine.
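For reference, a box plot flags a point as an outlier when it lies more than 1.5 interquartile ranges beyond the quartiles. The sketch below works through that rule with the pooled `Immigration.Males` quartiles from the summary in section 2.1.1. It assumes the plots use the default 1.5 × IQR whisker rule, and the pooled quartiles are only a stand-in: the box plots themselves compute the fences per region.

```r
# Quartiles and maximum of Immigration.Males as reported in section 2.1.1.
q1 <- 10
q3 <- 89.75
max_value <- 1783

# Interquartile range and the Tukey fences used for box-plot whiskers.
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr  # counts above this are drawn as outlier points
lower_fence <- q1 - 1.5 * iqr  # negative here, so counts cannot be low outliers

upper_fence
max_value > upper_fence
```

Under this pooled rule the upper fence sits well below 500, so the largest country-year counts are outliers by either threshold.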

2.5 Emigration by gender by country

2.5.1 Countrywise scatter plot

Looking at emigration by country, Finland spikes the most for both males and females, so it seems that Finland is the main destination for both male and female emigrants.

2.5.2 Countrywise Box-plot for year range

To build on the scatter plot above and to confirm that the outliers are due to specific countries rather than yearly variation, we used box plots for males and females for every region, colored by year.

The box plots confirm that Finland is the clear outlier in the emigration data.

3. Analysis and hypothesis testing

The aim of the research is to understand whether migration differs by gender. To do so, we ran four two-sample t-tests, two each for immigration and emigration.

Why four? Each test is run once on the full data and once with a country filter, to check how the p-value changes when the filter is applied. If the p-value shifts substantially, the outlier filtering has to be redone, because the shift would then reflect the loss of a large share of the population rather than the removal of a simple anomaly. The filter excludes Finland for both immigration and emigration.

To proceed with the hypothesis testing, the following research hypotheses were stated:

Immigration

H0: There is NO difference in immigration trends based on gender
H1: There IS a difference in immigration trends based on gender

Emigration

H00: There is NO difference in emigration trends based on gender
H01: There IS a difference in emigration trends based on gender

We analyse immigration and emigration separately to cover both aspects of migration.
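The paired runs described above, one on all rows and one with a country removed, can be sketched as follows. This is a minimal illustration and not the original analysis code: `gender_t_test` is a hypothetical helper, the data frame holds only the twelve rows visible in the head and tail of the data, and `..Russia` stands in for the filtered country purely because no Finland row is printed in this report.

```r
# Rows copied from the head and tail of the dataset; the full data has 486 rows.
migration_sample <- data.frame(
  Country = c("..Austria", "..Belgium", "..Spain", "..Netherlands",
              "..Ireland", "..Italy", "..Russia", "Africa", "Asia",
              "America", "..USA", "Oceania"),
  Immigration.Males = c(2, 5, 4, 4, 2, 11, 821, 159, 662, 241, 116, 88),
  Immigration.Females = c(1, 4, 4, 2, 1, 7, 694, 118, 377, 205, 150, 65),
  stringsAsFactors = FALSE
)

# Hypothetical helper: optionally drop some regions, then compare the
# male and female columns with a two-sample t-test.
# t.test() defaults to var.equal = FALSE, which gives the Welch variant.
gender_t_test <- function(data, male_col, female_col, exclude = character(0)) {
  kept <- data[!(data$Country %in% exclude), , drop = FALSE]
  t.test(kept[[male_col]], kept[[female_col]])
}

unfiltered <- gender_t_test(migration_sample,
                            "Immigration.Males", "Immigration.Females")

# Stand-in filter for illustration only (see the note above).
filtered <- gender_t_test(migration_sample,
                          "Immigration.Males", "Immigration.Females",
                          exclude = "..Russia")

# Compare how the p-value moves once the filter is applied.
c(unfiltered = unfiltered$p.value, filtered = filtered$p.value)
```

With so few rows the resulting p-values say nothing about the real data; the point is only the structure of running the same test with and without the filter.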

Immigration T-test Without filter…

## 
##  Welch Two Sample t-test
## 
## data:  migration_data_subset2$Immigration.Male and migration_data_subset2$Immigration.Female
## t = 2.1545, df = 873.81, p-value = 0.03148
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   2.696558 57.891919
## sample estimates:
## mean of x mean of y 
## 115.31070  85.01646

Immigration T-test With filter…

## 
##  Welch Two Sample t-test
## 
## data:  migration_data_subset2$Immigration.Male and migration_data_subset2$Immigration.Female
## t = 2.3027, df = 790.16, p-value = 0.02156
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   3.123255 39.218626
## sample estimates:
## mean of x mean of y 
##  83.22009  62.04915

The immigration tests give p-values of about 3% and 2% respectively, both below the conventional 5% level, so the null hypothesis is rejected. There is a difference between genders in immigration, with more males immigrating than females.

Emigration T-test Without filter…

## 
##  Welch Two Sample t-test
## 
## data:  migration_data_subset3$Emigration.Male and migration_data_subset3$Emigration.Female
## t = -0.012813, df = 965.34, p-value = 0.9898
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -39.65104  39.13664
## sample estimates:
## mean of x mean of y 
##  100.3807  100.6379

Emigration T-test With filter…

## 
##  Welch Two Sample t-test
## 
## data:  migration_data_subset4$Emigration.Male and migration_data_subset4$Emigration.Female
## t = -1.596, df = 929.32, p-value = 0.1108
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.749670   1.416337
## sample estimates:
## mean of x mean of y 
##  39.73291  45.89957

The emigration tests give p-values of about 99% and 11% respectively, both above the conventional 5% level, so the null hypothesis is not rejected. The tests show no statistically significant difference between genders in emigration.
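The p-values can be recomputed directly from the t statistics and degrees of freedom printed in the four outputs above, which is a quick way to double-check the percentages quoted in the text. The sketch below does this with the t distribution function `pt()`; the 5% threshold is an assumption, since the report does not state a significance level explicitly.

```r
# t statistics and degrees of freedom copied from the four test outputs above.
reported <- data.frame(
  test = c("immigration, all", "immigration, filtered",
           "emigration, all", "emigration, filtered"),
  t = c(2.1545, 2.3027, -0.012813, -1.596),
  df = c(873.81, 790.16, 965.34, 929.32),
  stringsAsFactors = FALSE
)

# Two-sided p-value: probability mass in both tails beyond |t|.
reported$p_value <- 2 * pt(-abs(reported$t), df = reported$df)

# Assumed significance level of 5% (not stated in the report).
reported$reject_h0 <- reported$p_value < 0.05

print(reported)
```

The recomputed values should agree with the `p-value` lines in the outputs up to the rounding of t and df.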

4. Discussion of the results

Based on the available data and the statistical tests, males tend to immigrate into Estonia more than females, although this trend has been gradually declining since 2020. Finland, Russia, Ukraine and, to a lesser extent, Asia lead this trend.

For emigration, the patterns for males and females are more or less the same, and a great majority of emigrants still go to Finland.

5. Limitations

One limitation to keep in mind with this dataset is that migration data are often considered incomplete, because people frequently do not register their new place of residence. The number of people who actually migrate might therefore be much larger than reported. Since migration differed significantly in 2020 and 2021, it would be wise to re-evaluate these findings within the next five years.

6. Conclusion

In conclusion, it is important to look into migration trends in order to understand the real reasons behind them. As shown, there are differences between male and female migration, which could indicate that a different approach for each gender is needed to fully understand what is driving migration. By taking these differences into account, countries can also develop better approaches to managing the opportunities and difficulties that might come with migration.

On a positive note, population decline has recently become a matter of great concern for many countries, such as Japan, China and Germany. For Estonia, the recent figures suggest that immigration exceeds emigration, so for the time being there is one less thing to worry about in terms of migration.