##  Level.of.study     Study.programme.group    Estonian         Russian
##  Length:108         Length:108            Min.   :  0.00   Min.   :  0.00
##  Class :character   Class :character      1st Qu.:  0.00   1st Qu.:  0.00
##  Mode  :character   Mode  :character      Median : 12.50   Median :  1.00
##                                           Mean   : 66.64   Mean   : 12.57
##                                           3rd Qu.: 74.25   3rd Qu.: 13.00
##                                           Max.   :571.00   Max.   :165.00
##  Other.mother.tongue Mother.tongue.unknown
##  Min.   :  0.000     Min.   : 0.000
##  1st Qu.:  0.000     1st Qu.: 0.000
##  Median :  0.000     Median : 0.000
##  Mean   :  7.231     Mean   : 6.583
##  3rd Qu.:  7.250     3rd Qu.: 4.000
##  Max.   :103.000     Max.   :88.000
## # A tibble: 6 × 6
## Level.of.study Study.programme.group Eston…¹ Russian Other…² Mothe…³
## <chr> <chr> <int> <int> <int> <int>
## 1 Bachelor's study Journalism and information 82 3 0 0
## 2 Bachelor's study Architecture and construction 31 7 0 0
## 3 Bachelor's study Biological and environmental… 215 45 1 0
## 4 Bachelor's study Physical sciences 173 53 20 11
## 5 Bachelor's study Humanities 124 24 5 9
## 6 Bachelor's study Information and Communicatio… 571 165 14 23
## # … with abbreviated variable names ¹Estonian, ²Other.mother.tongue,
## # ³Mother.tongue.unknown
## # A tibble: 6 × 6
## Level.of.study Study.programme.group Eston…¹ Russian Other…² Mothe…³
## <chr> <chr> <int> <int> <int> <int>
## 1 Doctoral study Transport services 0 0 0 0
## 2 Doctoral study Religion and theology 2 0 0 1
## 3 Doctoral study Veterinary 0 0 3 2
## 4 Doctoral study Law 2 0 1 0
## 5 Doctoral study Teacher training and education… 11 0 0 0
## 6 Doctoral study Business and administration 1 1 3 3
## # … with abbreviated variable names ¹Estonian, ²Other.mother.tongue,
## # ³Mother.tongue.unknown
1. A QUICK EXPLORATORY DATA ANALYSIS OF THE DATASET HIGHER_EDU_IN_ESTONIA
This dataset of students admitted in 2021 was downloaded from the Estonian University website. It includes 8 variables. Two are categorical: “level of study” and “study programme group”. Four hold numeric counts of admitted students: “Estonian”, “Russian”, “other mother tongue” and “mother tongue unknown”. The remaining two, “year” and “indicator”, were removed before the analysis because they have the same value in every row, leaving 6 variables and 108 observations.
Summary statistics for the number of admitted students per level of study and study programme group in 2021:
Estonian-speaking students: mean 66.64, median 12.50, 1st quartile 0.00, 3rd quartile 74.25, minimum 0.00, maximum 571.00.
Russian-speaking students: mean 12.57, median 1.00, 1st quartile 0.00, 3rd quartile 13.00, minimum 0.00, maximum 165.00.
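The column-dropping step described above can be sketched as follows. This is a minimal illustration rather than the original script: the data frame `edu` holds only the six Bachelor's rows printed above, and the `Year` and `Indicator` values are placeholders assumed for the example.

```r
# Toy stand-in for the downloaded data: the six Bachelor's rows shown above.
# The Year and Indicator values are assumed placeholders, not the real labels.
edu <- data.frame(
  Year      = rep(2021L, 6),
  Indicator = rep("Admitted", 6),
  Estonian  = c(82L, 31L, 215L, 173L, 124L, 571L),
  Russian   = c(3L, 7L, 45L, 53L, 24L, 165L),
  stringsAsFactors = FALSE
)

# A column is constant when it has exactly one distinct value.
is_constant <- vapply(edu, function(col) length(unique(col)) == 1, logical(1))

# Keep only the columns that vary between rows.
edu_clean <- edu[, !is_constant, drop = FALSE]

names(edu_clean)    # the remaining column names
summary(edu_clean)  # min, quartiles, median, mean and max per column
```

Detecting constant columns programmatically, instead of dropping them by name, keeps the step valid if the export ever gains or loses a constant column.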
2. My sample focuses on the master's level of study, to see whether there is any relationship between the master's students who speak Russian and those who speak Estonian.
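Restricting the data to the master's level could look like the sketch below. It uses base R's `subset()` on a small stand-in data frame; the name `admitted` and the choice of five example rows (copied from the printed output) are assumptions made for illustration, since the original code is not shown.

```r
# A few rows copied from the printed output; the real data has 108 rows.
admitted <- data.frame(
  Level.of.study = c("Bachelor's study", "Bachelor's study",
                     "Master's study", "Master's study", "Doctoral study"),
  Estonian = c(82L, 31L, 46L, 75L, 2L),
  Russian  = c(3L, 7L, 4L, 13L, 0L),
  stringsAsFactors = FALSE
)

# Keep the Master's rows and the three columns of interest.
masters <- subset(admitted,
                  Level.of.study == "Master's study",
                  select = c(Level.of.study, Estonian, Russian))
masters
```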
## [1] "Level.of.study" "Study.programme.group" "Estonian"
## [4] "Russian" "Other.mother.tongue" "Mother.tongue.unknown"
## # A tibble: 108 × 3
## Level.of.study Estonian Russian
## <chr> <int> <int>
## 1 Bachelor's study 82 3
## 2 Bachelor's study 31 7
## 3 Bachelor's study 215 45
## 4 Bachelor's study 173 53
## 5 Bachelor's study 124 24
## 6 Bachelor's study 571 165
## 7 Bachelor's study 25 0
## 8 Bachelor's study 266 84
## 9 Bachelor's study 206 40
## 10 Bachelor's study 67 22
## # … with 98 more rows
##  Level.of.study        Estonian        Russian
##  Length:27          Min.   :  0.0   Min.   : 0.00
##  Class :character   1st Qu.: 28.0   1st Qu.: 1.50
##  Mode  :character   Median : 53.0   Median : 7.00
##                     Mean   :103.9   Mean   :16.22
##                     3rd Qu.:113.5   3rd Qu.:18.50
##                     Max.   :531.0   Max.   :85.00
## # A tibble: 6 × 3
## Level.of.study Estonian Russian
## <chr> <int> <int>
## 1 Master's study 46 4
## 2 Master's study 75 13
## 3 Master's study 105 18
## 4 Master's study 53 9
## 5 Master's study 43 4
## 6 Master's study 353 79
## # A tibble: 6 × 3
## Level.of.study Estonian Russian
## <chr> <int> <int>
## 1 Master's study 30 7
## 2 Master's study 26 0
## 3 Master's study 0 0
## 4 Master's study 150 20
## 5 Master's study 455 85
## 6 Master's study 531 73
## [1] 134.5973
## [1] 24.33474
## Warning in var(a): NAs introduced by coercion
## Level.of.study Estonian Russian
## Level.of.study NA NA NA
## Estonian NA 18116.439 3169.4957
## Russian NA 3169.496 592.1795
## [1] 18116.44
## [1] 592.1795
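The warning and the NA row and column in the output above most likely arise because var() was applied to a data frame that still contains the character column Level.of.study. A minimal sketch that avoids this by using only the numeric columns is given here; it assumes the 27 master's values printed in this report, and the name `masters_num` is introduced only for the example.

```r
# The 27 Master's-level counts as printed in this report.
estonian <- c(46, 75, 105, 53, 43, 353, 12, 106, 74, 7, 0, 41, 50, 63, 22,
              30, 125, 70, 18, 198, 121, 30, 26, 0, 150, 455, 531)
russian  <- c(4, 13, 18, 9, 4, 79, 0, 11, 9, 2, 0, 7, 0, 3, 0, 1, 19, 13,
              2, 37, 22, 7, 0, 0, 20, 85, 73)
masters_num <- data.frame(Estonian = estonian, Russian = russian)

sapply(masters_num, sd)   # standard deviation of each column
var(masters_num)          # 2 x 2 variance-covariance matrix, no NA entries
```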
3. Research question: is there a relationship between the number of master's students who speak Estonian and the number who speak Russian?
## `geom_smooth()` using formula 'y ~ x'
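As a complementary numerical view of the scatter plot and its y ~ x smoother, the linear association can be summarised with a Pearson correlation and a straight-line fit. This sketch is not part of the original analysis; it reuses the 27 master's values printed in this report and assumes Russian is on the y axis and Estonian on the x axis.

```r
# The 27 Master's-level counts as printed in this report.
estonian <- c(46, 75, 105, 53, 43, 353, 12, 106, 74, 7, 0, 41, 50, 63, 22,
              30, 125, 70, 18, 198, 121, 30, 26, 0, 150, 455, 531)
russian  <- c(4, 13, 18, 9, 4, 79, 0, 11, 9, 2, 0, 7, 0, 3, 0, 1, 19, 13,
              2, 37, 22, 7, 0, 0, 20, 85, 73)

# Pearson correlation: covariance divided by the product of the two
# standard deviations.
r <- cor(estonian, russian)
r

# Straight-line fit analogous to a y ~ x smoother (axis choice is assumed).
fit <- lm(russian ~ estonian)
coef(fit)

# Test of the hypothesis that the true correlation is zero.
cor.test(estonian, russian)
```

A correlation close to 1 or -1 would indicate a strong linear association between the two counts, while a value near 0 would indicate none.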
To see whether there is a relationship between the students who speak Estonian and those who speak Russian, I treat the two count variables as categories and create a contingency table.
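Building such a table can be sketched with base R's `table()` and `addmargins()`. The vectors are the 27 master's values printed in this report; the name `counts_tab` is a placeholder chosen for the example.

```r
# The 27 Master's-level counts as printed in this report.
estonian <- c(46, 75, 105, 53, 43, 353, 12, 106, 74, 7, 0, 41, 50, 63, 22,
              30, 125, 70, 18, 198, 121, 30, 26, 0, 150, 455, 531)
russian  <- c(4, 13, 18, 9, 4, 79, 0, 11, 9, 2, 0, 7, 0, 3, 0, 1, 19, 13,
              2, 37, 22, 7, 0, 0, 20, 85, 73)

# Cross-tabulate: one row per distinct Estonian count and one column per
# distinct Russian count; each cell counts the programme groups with that pair.
counts_tab <- table(estonian, russian)
dim(counts_tab)   # number of distinct values of each variable

# Add the "Sum" row and column for display only.
addmargins(counts_tab)
```

Keeping the table without margins in its own object makes it easy to pass the unmodified counts to later steps and use `addmargins()` purely for printing.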
## [1] 46 75 105 53 43 353 12 106 74 7 0 41 50 63 22 30 125 70 18
## [20] 198 121 30 26 0 150 455 531
## [1] 4 13 18 9 4 79 0 11 9 2 0 7 0 3 0 1 19 13 2 37 22 7 0 0 20
## [26] 85 73
##
## 0 1 2 3 4 7 9 11 13 18 19 20 22 37 73 79 85 Sum
## 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
## 7 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## 12 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## 18 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## 22 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## 26 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## 30 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2
## 41 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
## 43 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
## 46 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
## 50 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## 53 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
## 63 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## 70 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
## 74 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
## 75 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
## 105 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
## 106 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
## 121 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
## 125 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1
## 150 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
## 198 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
## 353 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
## 455 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
## 531 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
## Sum 6 1 2 1 2 2 2 1 2 1 1 1 1 1 1 1 1 27
This table can be visualized using the balloonplot() function in the gplots package.
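A minimal sketch of such a plot is shown below. It assumes the gplots package is installed and skips the plot otherwise; the title and axis labels are illustrative choices, not taken from the original figure.

```r
# The 27 Master's-level counts as printed in this report.
estonian <- c(46, 75, 105, 53, 43, 353, 12, 106, 74, 7, 0, 41, 50, 63, 22,
              30, 125, 70, 18, 198, 121, 30, 26, 0, 150, 455, 531)
russian  <- c(4, 13, 18, 9, 4, 79, 0, 11, 9, 2, 0, 7, 0, 3, 0, 1, 19, 13,
              2, 37, 22, 7, 0, 0, 20, 85, 73)
tab <- table(estonian, russian)

# gplots is an optional dependency here; skip the plot if it is missing.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::balloonplot(tab,
                      main = "Master's students by mother tongue",
                      xlab = "Estonian", ylab = "Russian",
                      label = FALSE, show.margins = FALSE)
}
```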
## Warning in chisq.test(c.table.1): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: c.table.1
## X-squared = 411.75, df = 425, p-value = 0.6687
4.1 Null hypothesis: the row and column variables of the contingency table are independent, i.e. there is no relationship between them. Alternative hypothesis: the row and column variables are dependent, i.e. there is a relationship between the numbers of Estonian-speaking and Russian-speaking master's students.
4.2 Report on collected data: the master's sample includes 3 variables (level of study, Estonian and Russian) and 27 observations. The minimum, 1st quartile, median, mean, 3rd quartile and maximum for master's students who speak Estonian and Russian are:
Estonian: Min. 0.0, 1st Qu. 28.0, Median 53.0, Mean 103.9, 3rd Qu. 113.5, Max. 531.0
Russian: Min. 0.00, 1st Qu. 1.50, Median 7.00, Mean 16.22, 3rd Qu. 18.50, Max. 85.00
4.3 Assumption: there may be some relationship, or form of dependence, between the numbers of master's students who speak Estonian and those who speak Russian. To investigate this, I use the chi-square test of independence.
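The test can be sketched on the contingency table without its Sum margins, so that the degrees of freedom equal (rows - 1) × (columns - 1). The vectors are the 27 master's values printed in this report; `tab` and `res` are names introduced for the example, and the Monte Carlo variant at the end is an optional extra, not something the original analysis ran.

```r
# The 27 Master's-level counts as printed in this report.
estonian <- c(46, 75, 105, 53, 43, 353, 12, 106, 74, 7, 0, 41, 50, 63, 22,
              30, 125, 70, 18, 198, 121, 30, 26, 0, 150, 455, 531)
russian  <- c(4, 13, 18, 9, 4, 79, 0, 11, 9, 2, 0, 7, 0, 3, 0, 1, 19, 13,
              2, 37, 22, 7, 0, 0, 20, 85, 73)
tab <- table(estonian, russian)   # 25 x 17 table, no Sum margins

# Almost every cell holds 0 or 1, so R warns that the chi-squared
# approximation may be inaccurate; the warning is suppressed here only
# to keep the output short.
res <- suppressWarnings(chisq.test(tab))
res$statistic   # X-squared
res$parameter   # degrees of freedom: (25 - 1) * (17 - 1)
res$p.value

# Optional: Monte Carlo p-value that does not rely on the approximation.
set.seed(1)
chisq.test(tab, simulate.p.value = TRUE, B = 2000)$p.value
```

Passing a table that already contains the Sum row and column would leave X-squared unchanged for these data but raise the degrees of freedom, and therefore the p-value.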
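4.4 The chi-square test gave a p-value of 0.158, which is greater than 0.05. This value corresponds to the 25 × 17 contingency table without the Sum margins (X-squared = 411.75, df = 24 × 16 = 384). The output printed above (df = 425, p-value = 0.6687) appears to have been computed on c.table.1 with the Sum row and column included, which raises the degrees of freedom to 25 × 17 = 425. R also warns that the chi-squared approximation may be incorrect, because almost every cell count is 0 or 1.
4.5 The significance level is 5% (α = 0.05). Since p = 0.158 is greater than α, I fail to reject the null hypothesis that there is no relationship between the numbers of Estonian-speaking and Russian-speaking master's students.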
5. Conclusion: the chi-square test provides no evidence of a relationship between the numbers of master's students who speak Estonian and those who speak Russian.