For this assignment, the student was required to download the
dataset from the Estonian statistics website into an
R Markdown session. The data set shows the number of
admitted students across different specialties, recorded for the year
2021. The numbers of admitted students are broken down by Study
programme group, Level of study, and Mother
tongue.
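library(tibble)  # as_tibble()
library(dplyr)   # select(), group_by(), summarise(), across(), %>%
# Keep only the level of study and the Estonian/Russian counts
higher.edu <- as_tibble(higher.edu)
higher.edu <- select(higher.edu, -c(Year, Indicator, Study.programme.group,
                                    Other.mother.tongue, Mother.tongue.unknown))
# Sum admissions over all programme groups within each level of study
HigherSummed <- higher.edu %>%
  group_by(Level.of.study) %>%
  summarise(across(everything(), sum))
HigherSummed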
For the purposes of this report, the researcher has chosen to omit admitted students whose mother tongue is neither Estonian nor Russian, or is unknown. Because the data set does not specify the mother tongue of those students, it does not seem appropriate to draw inferences from a group that combines people from different regions of the world and different cultural upbringings.
In 2021, a total of 8555 Estonian-speaking (n=7197) and Russian-speaking (n=1358) students were admitted into higher education. Over half of them (n=4511) were admitted to bachelor’s study. For both Estonian and Russian speakers, Information and Communications Technology had the highest number of admissions. The data show no study programme that admitted more Russian-speaking than Estonian-speaking students.
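The totals and shares quoted above can be checked with a few lines of arithmetic. This is only a sketch using the figures stated in the text; the variable names are illustrative and are not taken from the original analysis.

```r
# Figures quoted in the paragraph above
estonian <- 7197   # admitted students with Estonian as mother tongue
russian  <- 1358   # admitted students with Russian as mother tongue
bachelor <- 4511   # students admitted to bachelor's study (both groups)

# Total admissions across the two language groups
total <- estonian + russian

# Share of all admissions that were at bachelor's level
share_bachelor <- bachelor / total

# Share of all admissions that were Estonian speakers
share_estonian <- estonian / total

round(c(bachelor = share_bachelor, estonian = share_estonian), 3)
```

A bachelor's share just above 0.5 is what supports the "over half" statement.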
Is the number of people studying beyond a bachelor's degree dependent on whether students speak Estonian or Russian?
Null hypothesis - There is no relationship between the two categorical variables: the distribution of admitted students across levels of study is the same for Estonian and Russian speakers, so knowing a student's mother tongue does not help predict the level of study they were admitted to.
Alternative hypothesis - There is a relationship between the two variables: the proportion of students admitted at each level of study differs between Estonian and Russian speakers.
The sample size is large (n=8555), which should give the test adequate statistical power. As the collected data did not state which other languages the remaining admitted students spoke, the researcher chose to omit those data. Tracing the dataset back to its original source might show which other languages were represented. For this analysis, however, Estonian and Russian were considered sufficient.
To understand whether there is a relationship between mother tongue and the level of study students were admitted to in Estonia, the researcher performed a Chi-squared test of independence. For a Chi-squared test, the following assumptions must be met: two categorical variables (mother tongue and level of study); two or more categories in each variable (Estonian, Russian; Bachelor’s, Doctoral, Master’s, Integrated); and independence of observations, i.e. each admitted student is counted in exactly one cell and the groups do not depend on each other.
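A further common rule of thumb for the Chi-squared approximation is that the expected count in every cell should be at least 5. The sketch below shows how expected counts under independence are computed from a level-by-language table. The counts are made up for illustration and are NOT the figures from the data set; the object names are likewise assumptions.

```r
# Made-up illustrative counts, NOT the figures from the data set
counts <- matrix(
  c(400, 100,
     40,  10,
     60,  15,
    300,  60),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Level.of.study = c("Bachelor's", "Doctoral", "Integrated", "Master's"),
    Mother.tongue  = c("Estonian", "Russian")
  )
)

# Expected count in each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Rule of thumb: every expected count should be at least 5
all(expected >= 5)
```

The same matrix of expected counts is available as the `expected` element of the object returned by `chisq.test()`.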
ChiLanguage <- chisq.test(HigherSummed$Level.of.study, HigherSummed$Estonian, HigherSummed$Russian)
## Warning in correct && nrow(x) == 2L: 'length(x) = 4 > 1' in coercion to
## 'logical(1)'
## Warning in chisq.test(HigherSummed$Level.of.study, HigherSummed$Estonian, : Chi-
## squared approximation may be incorrect
ChiLanguage
##
## Pearson's Chi-squared test
##
## data: HigherSummed$Level.of.study and HigherSummed$Estonian
## X-squared = 12, df = 9, p-value = 0.2133
round(ChiLanguage$residuals, 3)
##                                       HigherSummed$Estonian
## HigherSummed$Level.of.study             169  519 2804 3705
##   Bachelor's study                     -0.5 -0.5 -0.5  1.5
##   Doctoral study                        1.5 -0.5 -0.5 -0.5
##   Integrated Bachelor's/Master's study -0.5  1.5 -0.5 -0.5
##   Master's study                       -0.5 -0.5  1.5 -0.5
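In R, the signature is `chisq.test(x, y = NULL, correct = TRUE, ...)`, so in the three-vector call above the Russian counts are matched to `correct`, which is consistent with the coercion warning. One possible alternative, sketched here under stated assumptions, passes the counts as cell frequencies of a level-by-language matrix. The helper name `level_by_language_test` is hypothetical, and the toy counts are made up and are NOT the figures from the data set.

```r
# Hypothetical helper: expects one row per level of study and count columns
# named Estonian and Russian, shaped like HigherSummed in this report
level_by_language_test <- function(df) {
  counts <- as.matrix(df[, c("Estonian", "Russian")])
  rownames(counts) <- df$Level.of.study
  # A matrix argument is treated as a contingency table of frequencies
  chisq.test(counts)
}

# Made-up illustrative counts, NOT the figures from the data set
toy <- data.frame(
  Level.of.study = c("Bachelor's", "Doctoral", "Integrated", "Master's"),
  Estonian = c(400, 40, 60, 300),
  Russian  = c(100, 10, 15, 60)
)

result <- level_by_language_test(toy)
result$parameter            # degrees of freedom: (4 - 1) * (2 - 1) = 3
round(result$residuals, 3)  # Pearson residuals per level and language
```

Applied to a table like `HigherSummed`, this form would give 3 degrees of freedom rather than 9, because the table has four levels of study and two language groups.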
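For Estonian- and Russian-speaking admissions across the levels of study, the Chi-squared test was not statistically significant, X-squared(9) = 12, p = 0.21. Note that the 9 degrees of freedom come from the 4 x 4 table that R built from the four level labels and the four Estonian counts, so the call treated the counts as category labels rather than as frequencies for N = 8555 students, and the result should be read with caution.
As the statistical test was not significant (p = 0.21), we fail to reject the null hypothesis; this is not the same as accepting it. On this evidence, no relationship was detected between the mother tongue of admitted students (Estonian or Russian) and their level of study.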