Homework 8

Introduction

For this assignment, the student was required to download a data set from the Estonian statistics website and load it into an R Markdown session. The data set shows the number of students admitted across different specialties in 2021. The numbers of admitted students are broken down by study programme group, level of study, and mother tongue.

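library(dplyr)  # provides as_tibble(), select(), group_by(), summarise(), across(), %>%

# higher.edu is the data frame imported from the downloaded data set
higher.edu <- as_tibble(higher.edu)

# Drop columns not used in the analysis, including the "other" and "unknown" mother tongue counts
higher.edu <- select(higher.edu, -c(Year, Indicator, Study.programme.group, Other.mother.tongue, Mother.tongue.unknown))

# Sum the remaining count columns within each level of study
HigherSummed <- higher.edu %>%
  group_by(Level.of.study) %>%
  summarise(across(everything(), sum))

HigherSummed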

1. Perform quick EDA and pick up variables you want to explore in more depth.

For the purposes of this report, the researcher has chosen to omit admitted students whose mother tongue is neither Estonian nor Russian, or is unknown. As the data set does not specify what the mother tongue of those students is, it does not seem adequate to make any inferences by grouping together people from different regions of the world and different cultural upbringings.

In 2021, a total of 8555 Estonian speaking (n=7197) and Russian speaking (n=1358) students were admitted into higher education. Over half of them (n=4511) were admitted to bachelor’s study. For both Estonian and Russian speaking students, the Information and Communications Technology programme group had the highest admittance. The data show that no study programme admitted more Russian speaking students than Estonian speaking students.
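The totals quoted above can be reproduced from a per-level summary with a few lines of R. This is only a sketch: the Estonian counts are the ones that appear in this report, the Russian bachelor's count is inferred as 4511 - 3705, and the remaining Russian counts are placeholder values chosen only so that they sum to 1358. The object name `summed` is hypothetical.

```r
# Per-level counts. Estonian values appear in this report; Russian values
# other than the bachelor's figure are PLACEHOLDERS, not real data.
summed <- data.frame(
  Level.of.study = c("Bachelor's study", "Doctoral study",
                     "Integrated Bachelor's/Master's study", "Master's study"),
  Estonian = c(3705, 169, 519, 2804),
  Russian  = c(806, 40, 62, 450)  # 806 = 4511 - 3705; the rest are illustrative
)

# Totals per mother tongue and overall
total_estonian <- sum(summed$Estonian)
total_russian  <- sum(summed$Russian)
total_all      <- total_estonian + total_russian

# Total admissions and share of all admissions at each level of study
summed$Total <- summed$Estonian + summed$Russian
summed$Share <- round(summed$Total / total_all, 3)
summed
```

With the real per-level Russian counts in place of the placeholders, the `Share` column gives the proportion of admissions at each level.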

2. Prepare the data set that includes only variables of your interest in a suitable format for analysis (use dplyr package and tidyr when necessary).

3. Based on the dataset, formulate your Research Question(s). Think of questions that you can answer with the help of either chi-square statistical test or some type of t-test.

Does the number of people studying past a bachelor's degree depend on whether students speak Estonian or Russian?
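One way to line the data up with this question is a 2x2 table of mother tongue against "bachelor's" versus "beyond bachelor's". The sketch below is an illustration under stated assumptions: all four counts are derived from totals quoted in this report (7197, 1358, 4511, and 3705 Estonian speaking bachelor's admissions), integrated bachelor's/master's study is counted as "beyond bachelor's", and the name `two_by_two` is hypothetical.

```r
# Counts derived from totals quoted in this report
est_bachelor <- 3705
rus_bachelor <- 4511 - est_bachelor  # bachelor's total minus Estonian bachelor's

# ASSUMPTION: doctoral, master's and integrated study are all grouped
# together as "Beyond bachelor's"
two_by_two <- matrix(
  c(est_bachelor, 7197 - est_bachelor,
    rus_bachelor, 1358 - rus_bachelor),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Tongue = c("Estonian", "Russian"),
    Level  = c("Bachelor's", "Beyond bachelor's")
  )
)
two_by_two

# Proportion of each language group at each level (each row sums to 1)
round(prop.table(two_by_two, margin = 1), 3)
```

Comparing the two rows of proportions gives a first impression of whether the split between bachelor's and later study differs by mother tongue.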

4. Hypothesis testing.

4.1 State the null hypothesis and the alternative hypothesis.

Null hypothesis - There is no relationship between the categorical variables, i.e. the level of study a student is admitted to is independent of whether the student speaks Estonian or Russian.

Alternative hypothesis - There is a relationship between the categorical variables, i.e. the distribution of admitted students across levels of study differs between Estonian and Russian speaking students.

4.2 Report on collected data and sample size.

The sample size is rather large (n=8555) and should give the test adequate statistical power. As the collected data did not specify which other languages the admitted students spoke, the researcher chose to omit those data. Tracing the data set back to its original location might show which other languages were represented. However, for this analysis, Estonian and Russian alone were sufficient.

4.3 Check the assumption of the chosen statistical test. Perform the required statistical test.

To understand whether there is a relationship between level of study and mother tongue among students admitted in Estonia, the researcher performed a Chi-squared test. For a Chi-squared test, the following assumptions must be met: two categorical variables (mother tongue and level of study); two or more categories in each variable (Estonian, Russian; Bachelor’s, Doctoral, Master’s, Integrated); independence of observations, meaning the groups do not depend on each other and each admitted student is counted only once; and expected counts that are large enough, commonly at least 5 in each cell.
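The expected-count condition can be checked directly from a table of counts. The following is a minimal sketch, not the analysis run in this report: the Estonian counts are the ones shown in this report, the Russian bachelor's count is inferred as 4511 - 3705, and the other Russian counts are placeholders chosen only to sum to 1358. The names `counts`, `expected` and `all_expected_ok` are hypothetical.

```r
# Counts by level of study (rows) and mother tongue (columns).
# Russian counts other than 806 are PLACEHOLDERS for illustration.
counts <- matrix(
  c(3705, 806,
     169,  40,
     519,  62,
    2804, 450),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Level  = c("Bachelor's", "Doctoral", "Integrated", "Master's"),
    Tongue = c("Estonian", "Russian")
  )
)

# Expected count per cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
round(expected, 1)

# Rule of thumb: every expected count should be at least 5
all_expected_ok <- all(expected >= 5)
all_expected_ok
```

If any expected count falls below 5, an exact or simulation-based test is usually considered instead of the Chi-squared approximation.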

ChiLanguage <- chisq.test(HigherSummed$Level.of.study, HigherSummed$Estonian, HigherSummed$Russian)
## Warning in correct && nrow(x) == 2L: 'length(x) = 4 > 1' in coercion to
## 'logical(1)'
## Warning in chisq.test(HigherSummed$Level.of.study, HigherSummed$Estonian, : Chi-
## squared approximation may be incorrect
ChiLanguage
## 
##  Pearson's Chi-squared test
## 
## data:  HigherSummed$Level.of.study and HigherSummed$Estonian
## X-squared = 12, df = 9, p-value = 0.2133
round(ChiLanguage$residuals, 3)
##                                       HigherSummed$Estonian
## HigherSummed$Level.of.study             169  519 2804 3705
##   Bachelor's study                     -0.5 -0.5 -0.5  1.5
##   Doctoral study                        1.5 -0.5 -0.5 -0.5
##   Integrated Bachelor's/Master's study -0.5  1.5 -0.5 -0.5
##   Master's study                       -0.5 -0.5  1.5 -0.5
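One thing worth noting about the call above: in base R, `chisq.test(x, y, correct, ...)` takes `correct` as its third argument, so a third vector passed positionally is not treated as data, and two vectors given as `x` and `y` are cross-tabulated as factors. A common alternative is to pass a matrix of counts, with levels of study as rows and mother tongues as columns. The sketch below shows that form. The object names are hypothetical, and the Russian counts other than the bachelor's figure (4511 - 3705 = 806) are placeholders rather than values from the data set, so any statistics it prints are illustrative only.

```r
# Contingency table: level of study (rows) by mother tongue (columns).
# Russian counts other than 806 are PLACEHOLDERS, not real data.
# With the real data, the same matrix could be built as
#   as.matrix(HigherSummed[, c("Estonian", "Russian")])
counts <- matrix(
  c(3705, 806,
     169,  40,
     519,  62,
    2804, 450),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Level  = c("Bachelor's", "Doctoral", "Integrated", "Master's"),
    Tongue = c("Estonian", "Russian")
  )
)

# Chi-squared test of independence on the table of counts
test_result <- chisq.test(counts)
test_result

# Degrees of freedom for a 4 x 2 table: (4 - 1) * (2 - 1) = 3
test_result$parameter

# Pearson residuals, one per cell of the table
round(test_result$residuals, 3)
```

In this form the residuals form a 4 x 2 table, one value for each combination of level of study and mother tongue.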

For Estonian and Russian speaking admittance across the different levels of study, the Chi-squared test was not statistically significant, X-squared(9, N=8555) = 12, p = 0.21.

4.4 Decide whether to reject or fail to reject your null hypothesis, report selected significance level.

As the statistical test was not significant (p = 0.21), we fail to reject the null hypothesis. The test provides no evidence of a relationship between the admittance of Estonian and Russian speaking students and the level of study.