##  Level.of.study     Study.programme.group    Estonian         Russian      
##  Length:108         Length:108            Min.   :  0.00   Min.   :  0.00  
##  Class :character   Class :character      1st Qu.:  0.00   1st Qu.:  0.00  
##  Mode  :character   Mode  :character      Median : 12.50   Median :  1.00  
##                                           Mean   : 66.64   Mean   : 12.57  
##                                           3rd Qu.: 74.25   3rd Qu.: 13.00  
##                                           Max.   :571.00   Max.   :165.00  
##  Other.mother.tongue Mother.tongue.unknown
##  Min.   :  0.000     Min.   : 0.000       
##  1st Qu.:  0.000     1st Qu.: 0.000       
##  Median :  0.000     Median : 0.000       
##  Mean   :  7.231     Mean   : 6.583       
##  3rd Qu.:  7.250     3rd Qu.: 4.000       
##  Max.   :103.000     Max.   :88.000
## # A tibble: 6 × 6
##   Level.of.study   Study.programme.group         Eston…¹ Russian Other…² Mothe…³
##   <chr>            <chr>                           <int>   <int>   <int>   <int>
## 1 Bachelor's study Journalism and information         82       3       0       0
## 2 Bachelor's study Architecture and construction      31       7       0       0
## 3 Bachelor's study Biological and environmental…     215      45       1       0
## 4 Bachelor's study Physical sciences                 173      53      20      11
## 5 Bachelor's study Humanities                        124      24       5       9
## 6 Bachelor's study Information and Communicatio…     571     165      14      23
## # … with abbreviated variable names ¹​Estonian, ²​Other.mother.tongue,
## #   ³​Mother.tongue.unknown
## # A tibble: 6 × 6
##   Level.of.study Study.programme.group           Eston…¹ Russian Other…² Mothe…³
##   <chr>          <chr>                             <int>   <int>   <int>   <int>
## 1 Doctoral study Transport services                    0       0       0       0
## 2 Doctoral study Religion and theology                 2       0       0       1
## 3 Doctoral study Veterinary                            0       0       3       2
## 4 Doctoral study Law                                   2       0       1       0
## 5 Doctoral study Teacher training and education…      11       0       0       0
## 6 Doctoral study Business and administration           1       1       3       3
## # … with abbreviated variable names ¹​Estonian, ²​Other.mother.tongue,
## #   ³​Mother.tongue.unknown

1. A QUICK EXPLORATORY DATA ANALYSIS OF THE DATASET HIGHER_EDU_IN_ESTONIA

This dataset of students admitted in 2021 was downloaded from the Estonian University website. It originally contained 8 variables. Two are categorical: “level of study” and “study programme group”. Four hold integer counts of admitted students: “Estonian”, “Russian”, “other mother tongue” and “mother tongue unknown”. The remaining two, “year” and “indicator”, were removed because they hold the same value in every row, leaving 6 variables and 108 observations.

Summary statistics for the number of admitted students in 2021:

Estonian-speaking students: mean 66.64, median 12.50, 1st quartile 0.00, 3rd quartile 74.25, minimum 0.00, maximum 571.00.

Russian-speaking students: mean 12.57, median 1.00, 1st quartile 0.00, 3rd quartile 13.00, minimum 0.00, maximum 165.00.
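The column clean-up described above can be sketched as follows. This is a minimal illustration, not the code used for the report: the four rows are copied from the head and tail output above, while the `Year` and `Indicator` values and the names `edu`, `is_constant` and `edu_clean` are hypothetical stand-ins.

```r
# Four rows copied from the printed head()/tail() output; Year and Indicator
# values are assumed placeholders for the two constant columns.
edu <- data.frame(
  Year = 2021,
  Indicator = "Admitted",
  Level.of.study = c("Bachelor's study", "Bachelor's study",
                     "Doctoral study", "Doctoral study"),
  Study.programme.group = c("Journalism and information", "Humanities",
                            "Law", "Veterinary"),
  Estonian = c(82L, 124L, 2L, 0L),
  Russian = c(3L, 24L, 0L, 0L),
  Other.mother.tongue = c(0L, 5L, 1L, 3L),
  Mother.tongue.unknown = c(0L, 9L, 0L, 2L),
  stringsAsFactors = FALSE
)

# Flag columns that hold a single distinct value in every row.
is_constant <- vapply(edu, function(col) length(unique(col)) == 1, logical(1))

# Keep only the columns that actually vary.
edu_clean <- edu[, !is_constant, drop = FALSE]

names(edu_clean)
summary(edu_clean$Estonian)
```

Detecting constant columns programmatically, rather than dropping them by name, keeps the step valid if the file gains or loses such columns.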

2. My sample focuses on the master's level of study, to see whether there is any relationship, across study programme groups, between the number of master's students who speak Russian and the number who speak Estonian.
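Restricting the data to one level of study might look like the sketch below. The rows are copied from the printed output in this report; the names `edu3` and `masters` are hypothetical, and base R `subset()` is used here as an assumption since the report does not show its own filtering code.

```r
# A few rows copied from the printed output: two bachelor's, three master's.
edu3 <- data.frame(
  Level.of.study = c("Bachelor's study", "Bachelor's study",
                     "Master's study", "Master's study", "Master's study"),
  Estonian = c(82L, 31L, 46L, 75L, 105L),
  Russian = c(3L, 7L, 4L, 13L, 18L),
  stringsAsFactors = FALSE
)

# Keep only the master's rows.
masters <- subset(edu3, Level.of.study == "Master's study")

# Summarise the two count columns for the subset.
summary(masters[, c("Estonian", "Russian")])
```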

## [1] "Level.of.study"        "Study.programme.group" "Estonian"             
## [4] "Russian"               "Other.mother.tongue"   "Mother.tongue.unknown"
## # A tibble: 108 × 3
##    Level.of.study   Estonian Russian
##    <chr>               <int>   <int>
##  1 Bachelor's study       82       3
##  2 Bachelor's study       31       7
##  3 Bachelor's study      215      45
##  4 Bachelor's study      173      53
##  5 Bachelor's study      124      24
##  6 Bachelor's study      571     165
##  7 Bachelor's study       25       0
##  8 Bachelor's study      266      84
##  9 Bachelor's study      206      40
## 10 Bachelor's study       67      22
## # … with 98 more rows
##  Level.of.study        Estonian        Russian     
##  Length:27          Min.   :  0.0   Min.   : 0.00  
##  Class :character   1st Qu.: 28.0   1st Qu.: 1.50  
##  Mode  :character   Median : 53.0   Median : 7.00  
##                     Mean   :103.9   Mean   :16.22  
##                     3rd Qu.:113.5   3rd Qu.:18.50  
##                     Max.   :531.0   Max.   :85.00
## # A tibble: 6 × 3
##   Level.of.study Estonian Russian
##   <chr>             <int>   <int>
## 1 Master's study       46       4
## 2 Master's study       75      13
## 3 Master's study      105      18
## 4 Master's study       53       9
## 5 Master's study       43       4
## 6 Master's study      353      79
## # A tibble: 6 × 3
##   Level.of.study Estonian Russian
##   <chr>             <int>   <int>
## 1 Master's study       30       7
## 2 Master's study       26       0
## 3 Master's study        0       0
## 4 Master's study      150      20
## 5 Master's study      455      85
## 6 Master's study      531      73
## [1] 134.5973
## [1] 24.33474
## Warning in var(a): NAs introduced by coercion
##                Level.of.study  Estonian   Russian
## Level.of.study             NA        NA        NA
## Estonian                   NA 18116.439 3169.4957
## Russian                    NA  3169.496  592.1795
## [1] 18116.44
## [1] 592.1795
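
The `NAs introduced by coercion` warning above appears because the character column `Level.of.study` was passed to `var()` together with the counts. A minimal sketch of one way to avoid it, assuming the 27 master's values printed in this report, is to select the numeric columns first. The names `estonian`, `russian`, `masters` and `num_cols` are hypothetical, and the final correlation line is an extra summary that the report itself does not compute.

```r
# The 27 master's counts as printed in this report.
estonian <- c(46, 75, 105, 53, 43, 353, 12, 106, 74, 7, 0, 41, 50, 63, 22,
              30, 125, 70, 18, 198, 121, 30, 26, 0, 150, 455, 531)
russian <- c(4, 13, 18, 9, 4, 79, 0, 11, 9, 2, 0, 7, 0, 3, 0, 1, 19, 13, 2,
             37, 22, 7, 0, 0, 20, 85, 73)

masters <- data.frame(Level.of.study = "Master's study",
                      Estonian = estonian, Russian = russian,
                      stringsAsFactors = FALSE)

# Keep only numeric columns so var() has nothing to coerce.
num_cols <- masters[, vapply(masters, is.numeric, logical(1)), drop = FALSE]

# Variance-covariance matrix of the two count columns.
var(num_cols)

# The standard deviation is the square root of the variance.
sd(num_cols$Estonian)
sqrt(var(num_cols$Estonian))

# Covariance scaled by both standard deviations (Pearson correlation);
# an additional summary, not part of the original analysis.
cor(num_cols$Estonian, num_cols$Russian)
```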

3. Research question: Is there a relationship between the number of master's students who speak Estonian and the number who speak Russian?

## `geom_smooth()` using formula 'y ~ x'

To check for a relationship between the students who speak Estonian and those who speak Russian, I treat the two variables as categorical and create a contingency table.

##  [1]  46  75 105  53  43 353  12 106  74   7   0  41  50  63  22  30 125  70  18
## [20] 198 121  30  26   0 150 455 531
##  [1]  4 13 18  9  4 79  0 11  9  2  0  7  0  3  0  1 19 13  2 37 22  7  0  0 20
## [26] 85 73
##      
##        0  1  2  3  4  7  9 11 13 18 19 20 22 37 73 79 85 Sum
##   0    2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   2
##   7    0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0   1
##   12   1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   1
##   18   0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0   1
##   22   1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   1
##   26   1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   1
##   30   0  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0   2
##   41   0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0   1
##   43   0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0   1
##   46   0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0   1
##   50   1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   1
##   53   0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0   1
##   63   0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0   1
##   70   0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0   1
##   74   0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0   1
##   75   0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0   1
##   105  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0   1
##   106  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0   1
##   121  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0   1
##   125  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0   1
##   150  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0   1
##   198  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0   1
##   353  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0   1
##   455  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1   1
##   531  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0   1
##   Sum  6  1  2  1  2  2  2  1  2  1  1  1  1  1  1  1  1  27
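
A table like the one above can be built with base R's `table()` and `addmargins()`. This is a sketch under the assumption that the two printed vectors are the inputs; the names `estonian`, `russian`, `c_table` and `c_table_margins` are hypothetical.

```r
# The 27 master's counts as printed above.
estonian <- c(46, 75, 105, 53, 43, 353, 12, 106, 74, 7, 0, 41, 50, 63, 22,
              30, 125, 70, 18, 198, 121, 30, 26, 0, 150, 455, 531)
russian <- c(4, 13, 18, 9, 4, 79, 0, 11, 9, 2, 0, 7, 0, 3, 0, 1, 19, 13, 2,
             37, 22, 7, 0, 0, 20, 85, 73)

# Cross-tabulate: every distinct count becomes its own category.
c_table <- table(Estonian = estonian, Russian = russian)

# Add the "Sum" row and column for display only.
c_table_margins <- addmargins(c_table)

dim(c_table)          # rows x columns without margins
dim(c_table_margins)  # one extra row and column for the sums
c_table_margins
```

Keeping the version without margins in its own object is useful because the `Sum` row and column are totals for display, not categories.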

This table can be visualized using the balloonplot() function in the gplots package.

## Warning in chisq.test(c.table.1): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  c.table.1
## X-squared = 411.75, df = 425, p-value = 0.6687
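
The test output above can be examined with the sketch below, which assumes the same 27 printed values. The reported df = 425 equals (26 - 1) x (18 - 1), which is what a table that still contains the `Sum` row and column would give; this is an inference from the output, not something the report states. Without the margins the table is 25 x 17, giving df = (25 - 1) x (17 - 1) = 384. The names `tab`, `res` and `res_margins` are hypothetical.

```r
# The 27 master's counts as printed in this report.
estonian <- c(46, 75, 105, 53, 43, 353, 12, 106, 74, 7, 0, 41, 50, 63, 22,
              30, 125, 70, 18, 198, 121, 30, 26, 0, 150, 455, 531)
russian <- c(4, 13, 18, 9, 4, 79, 0, 11, 9, 2, 0, 7, 0, 3, 0, 1, 19, 13, 2,
             37, 22, 7, 0, 0, 20, 85, 73)

# Contingency table without the Sum margins.
tab <- table(estonian, russian)

# Pearson's chi-squared test; the approximation warning is silenced here
# only to keep the output short, it still applies.
res <- suppressWarnings(chisq.test(tab))
res$parameter       # degrees of freedom: (rows - 1) * (columns - 1)
max(res$expected)   # largest expected cell count under independence

# Same test on the table with margins, for comparison of the df only.
res_margins <- suppressWarnings(chisq.test(addmargins(tab)))
res_margins$parameter
```

Every expected cell count here is far below the usual rule-of-thumb threshold of 5, which is why R warns that the chi-squared approximation may be incorrect.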

HYPOTHESIS TESTING

4.1 Stating the hypotheses. Null hypothesis: the row and column variables of the contingency table are independent and have no relationship. Alternative hypothesis: the row and column variables are dependent, and there is a relationship between Estonian-speaking and Russian-speaking master's students.

4.2 Report on collected data: the data include 3 variables (level of study, Estonian and Russian) and 27 observations. The minimum, 1st quartile, median, mean, 3rd quartile and maximum for master's students who speak Estonian and Russian are:

           Estonian   Russian
Min.            0.0      0.00
1st Qu.        28.0      1.50
Median         53.0      7.00
Mean          103.9     16.22
3rd Qu.       113.5     18.50
Max.          531.0     85.00

4.3 Assumption: there may be some relationship, or form of dependence, between the number of master's students who speak Estonian and the number who speak Russian. To investigate this, I use the chi-squared test of independence.

4.4 The chi-squared test reported above gave a p-value of 0.6687, which is well above 0.05.

4.5 The p-value is well above the 0.05 significance level (p = 0.6687 > 0.05), so I fail to reject the null hypothesis that there is no relationship between Estonian-speaking and Russian-speaking master's students. R also warned that the chi-squared approximation may be incorrect for this table, so the result should be read with caution.

5. Conclusion: the chi-squared test gives no evidence of a relationship between the master's students who speak Estonian and those who speak Russian.