Introduction

The dataset I will explore in this assignment is from 2021. It gives the number of admitted students broken down by study programme group, level of study, and mother tongue.

Here I load the dataset to be explored into R; its first six rows are:

## # A tibble: 6 × 6
##   Level.of.study   Study.programme.group         Eston…¹ Russian Other…² Mothe…³
##   <chr>            <chr>                           <int>   <int>   <int>   <int>
## 1 Bachelor's study Journalism and information         82       3       0       0
## 2 Bachelor's study Architecture and construction      31       7       0       0
## 3 Bachelor's study Biological and environmental…     215      45       1       0
## 4 Bachelor's study Physical sciences                 173      53      20      11
## 5 Bachelor's study Humanities                        124      24       5       9
## 6 Bachelor's study Information and Communicatio…     571     165      14      23
## # … with abbreviated variable names ¹​Estonian, ²​Other.mother.tongue,
## #   ³​Mother.tongue.unknown
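A minimal sketch of the loading step follows. It assumes the data came from a CSV file whose header row uses spaces (e.g. `Level of study`). `read.csv()` converts such headers into the dotted names seen above through `make.names()`. The real file name is not given, so the sketch reads three of the printed rows from inline text instead. `csv_text` and `admissions` are hypothetical names.

```r
# Hypothetical stand-in for the real CSV file: header plus three printed rows.
csv_text <- "Level of study,Study programme group,Estonian,Russian,Other mother tongue,Mother tongue unknown
Bachelor's study,Journalism and information,82,3,0,0
Bachelor's study,Architecture and construction,31,7,0,0
Bachelor's study,Physical sciences,173,53,20,11"

# check.names = TRUE (the default) turns "Level of study" into "Level.of.study";
# read.csv() only treats double quotes as quoting, so the apostrophes are safe.
admissions <- read.csv(text = csv_text, stringsAsFactors = FALSE)

print(names(admissions))
print(head(admissions))
```

To load the real file, replace `text = csv_text` with the file path. Wrapping the result in `tibble::as_tibble()` gives a tibble printout like the one above.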

EDA

1. What are the data types of the variables?

## tibble [108 × 6] (S3: tbl_df/tbl/data.frame)
##  $ Level.of.study       : chr [1:108] "Bachelor's study" "Bachelor's study" "Bachelor's study" "Bachelor's study" ...
##  $ Study.programme.group: chr [1:108] "Journalism and information" "Architecture and construction" "Biological and environmental sciences" "Physical sciences" ...
##  $ Estonian             : int [1:108] 82 31 215 173 124 571 25 266 206 67 ...
##  $ Russian              : int [1:108] 3 7 45 53 24 165 0 84 40 22 ...
##  $ Other.mother.tongue  : int [1:108] 0 0 1 20 5 14 0 7 8 1 ...
##  $ Mother.tongue.unknown: int [1:108] 0 0 0 11 9 23 0 0 23 0 ...
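A minimal sketch of the type check follows. It uses a hypothetical three-row stand-in (`admissions`) built from rows printed earlier; on the real 108-row tibble, `str()` gives the listing above. `vapply()` with `class` returns one class name per column, which is easier to scan than `str()` when the table is wide.

```r
# Hypothetical stand-in built from three of the printed rows
admissions <- data.frame(
  Level.of.study = rep("Bachelor's study", 3),
  Study.programme.group = c("Journalism and information",
                            "Architecture and construction",
                            "Physical sciences"),
  Estonian = c(82L, 31L, 173L),
  Russian = c(3L, 7L, 53L),
  Other.mother.tongue = c(0L, 0L, 20L),
  Mother.tongue.unknown = c(0L, 0L, 11L),
  stringsAsFactors = FALSE
)

str(admissions)  # compact structure: one line per column with its type

# Named character vector: column name -> class
col_types <- vapply(admissions, class, character(1))
print(col_types)
```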

2. What is the mean of each numeric variable?

##  Level.of.study     Study.programme.group    Estonian         Russian      
##  Length:108         Length:108            Min.   :  0.00   Min.   :  0.00  
##  Class :character   Class :character      1st Qu.:  0.00   1st Qu.:  0.00  
##  Mode  :character   Mode  :character      Median : 12.50   Median :  1.00  
##                                           Mean   : 66.64   Mean   : 12.57  
##                                           3rd Qu.: 74.25   3rd Qu.: 13.00  
##                                           Max.   :571.00   Max.   :165.00  
##  Other.mother.tongue Mother.tongue.unknown
##  Min.   :  0.000     Min.   : 0.000       
##  1st Qu.:  0.000     1st Qu.: 0.000       
##  Median :  0.000     Median : 0.000       
##  Mean   :  7.231     Mean   : 6.583       
##  3rd Qu.:  7.250     3rd Qu.: 4.000       
##  Max.   :103.000     Max.   :88.000
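`summary()` reports means only for the numeric columns; the character columns get only length, class, and mode. A minimal sketch that extracts just the means follows. It uses a hypothetical stand-in made of the six rows printed in the `head()` output.

```r
# Hypothetical stand-in: the six rows printed in the head() output
admissions <- data.frame(
  Level.of.study = rep("Bachelor's study", 6),
  Estonian = c(82L, 31L, 215L, 173L, 124L, 571L),
  Russian = c(3L, 7L, 45L, 53L, 24L, 165L),
  Other.mother.tongue = c(0L, 0L, 1L, 20L, 5L, 14L),
  Mother.tongue.unknown = c(0L, 0L, 0L, 11L, 9L, 23L),
  stringsAsFactors = FALSE
)

# Keep only numeric columns; a mean is undefined for character data
is_num <- vapply(admissions, is.numeric, logical(1))
numeric_means <- colMeans(admissions[, is_num])
print(round(numeric_means, 2))
```

On the full 108-row data this gives the `Mean` row of the summary. As a cross-check, dividing the column totals of the contingency table later in the document by 108 rows gives the same values: 7197/108 ≈ 66.64, 1358/108 ≈ 12.57, 781/108 ≈ 7.23 and 711/108 ≈ 6.58.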

3. Are there any NA (missing) values?

## [1] 0
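The `[1] 0` above is consistent with counting NA cells across the whole data frame. A minimal sketch of that check, with a per-column breakdown, follows. It runs on a hypothetical stand-in built from printed rows.

```r
# Hypothetical stand-in built from three of the printed rows
admissions <- data.frame(
  Level.of.study = rep("Bachelor's study", 3),
  Estonian = c(82L, 31L, 215L),
  Russian = c(3L, 7L, 45L),
  stringsAsFactors = FALSE
)

na_total <- sum(is.na(admissions))          # NA cells in the whole data frame
na_by_column <- colSums(is.na(admissions))  # NA cells in each column
print(na_total)
print(na_by_column)
print(anyNA(admissions))                    # TRUE if at least one NA exists
```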

Analysis

Prepare a data set that contains only the variables of interest, in a format suitable for analysis (using the dplyr and tidyr packages where necessary).

## `summarise()` has grouped output by 'Level.of.study'. You can override using
## the `.groups` argument.

New Data Set

## # A tibble: 6 × 3
## # Groups:   Level.of.study [2]
##   Level.of.study   mother.tongue         sum.of.students
##   <chr>            <chr>                           <int>
## 1 Bachelor's study Estonian                         3705
## 2 Bachelor's study Mother.tongue.unknown             263
## 3 Bachelor's study Other.mother.tongue               115
## 4 Bachelor's study Russian                           806
## 5 Doctoral study   Estonian                          169
## 6 Doctoral study   Mother.tongue.unknown              34
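A minimal sketch of this reshaping step with tidyr and dplyr follows. The exact original code is not shown, so this is one plausible version. `pivot_longer()` turns the four mother-tongue columns into a `mother.tongue` column. `group_by()` plus `summarise()` then adds up students per level and mother tongue. The stand-in data are three of the printed Bachelor's rows.

Passing `.groups = "drop"` silences the message shown above and returns an ungrouped tibble. Leaving it out keeps the grouping by `Level.of.study`, as in the printed output.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in: three Bachelor's rows printed earlier
admissions <- data.frame(
  Level.of.study = rep("Bachelor's study", 3),
  Study.programme.group = c("Journalism and information",
                            "Architecture and construction",
                            "Physical sciences"),
  Estonian = c(82L, 31L, 173L),
  Russian = c(3L, 7L, 53L),
  Other.mother.tongue = c(0L, 0L, 20L),
  Mother.tongue.unknown = c(0L, 0L, 11L),
  stringsAsFactors = FALSE
)

long_counts <- admissions %>%
  select(-Study.programme.group) %>%          # drop the variable not analysed
  pivot_longer(
    cols = c(Estonian, Russian, Other.mother.tongue, Mother.tongue.unknown),
    names_to = "mother.tongue",
    values_to = "students"
  ) %>%
  group_by(Level.of.study, mother.tongue) %>%
  summarise(sum.of.students = sum(students), .groups = "drop")

print(long_counts)
```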

Research

  1. Based on the dataset, formulate your research question(s). Think of questions that can be answered with either a chi-square test or some type of t-test.

Question: Is there a relationship between students' level of study and their mother tongue? I will answer it with a chi-square test of independence.

##                                       mother.tongue
## Level.of.study                         Estonian Mother.tongue.unknown
##   Bachelor's study                         3705                   263
##   Doctoral study                            169                    34
##   Integrated Bachelor's/Master's study      519                    11
##   Master's study                           2804                   403
##                                       mother.tongue
## Level.of.study                         Other.mother.tongue Russian
##   Bachelor's study                                     115     806
##   Doctoral study                                        90      23
##   Integrated Bachelor's/Master's study                  50      91
##   Master's study                                       526     438
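A minimal sketch of building this contingency table from the long-format totals with `xtabs()` follows. Whether the original used `xtabs()` is an assumption. The name `cdftable` matches the `data:` line of the test output. The 16 counts are copied from the table above, and `addmargins()` adds the row and column totals; its grand total is the sample size.

```r
level <- c("Bachelor's study", "Doctoral study",
           "Integrated Bachelor's/Master's study", "Master's study")
tongue <- c("Estonian", "Mother.tongue.unknown", "Other.mother.tongue", "Russian")

# expand.grid varies the first variable fastest, so counts go column by column
long_counts <- expand.grid(Level.of.study = level, mother.tongue = tongue,
                           stringsAsFactors = FALSE)
long_counts$sum.of.students <- c(3705, 169, 519, 2804,  # Estonian
                                 263, 34, 11, 403,      # Mother.tongue.unknown
                                 115, 90, 50, 526,      # Other.mother.tongue
                                 806, 23, 91, 438)      # Russian

# Cross-tabulate: rows = level of study, columns = mother tongue
cdftable <- xtabs(sum.of.students ~ Level.of.study + mother.tongue,
                  data = long_counts)
print(cdftable)
print(addmargins(cdftable))  # adds row/column totals and the grand total
```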

Visualizing the results

From the graph we can see that students whose mother tongue is Estonian form the majority at every level of study.
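The original plotting code is not shown. A minimal sketch of a comparable chart follows, in base R graphics (an assumption). It plots the share of each mother tongue within each level of study, using `prop.table()` on the observed counts. The level labels are shortened here for readability.

```r
counts <- matrix(
  c(3705, 169, 519, 2804,   # Estonian
    263, 34, 11, 403,       # mother tongue unknown
    115, 90, 50, 526,       # other mother tongue
    806, 23, 91, 438),      # Russian
  nrow = 4,
  dimnames = list(
    Level.of.study = c("Bachelor", "Doctoral", "Integrated BA/MA", "Master"),
    mother.tongue = c("Estonian", "Unknown", "Other", "Russian")
  )
)

# Share of each mother tongue within each level (each row sums to 1)
shares <- prop.table(counts, margin = 1)
print(round(shares, 3))

# Grouped bars: one group per level, one bar per mother tongue
mids <- barplot(t(shares), beside = TRUE, legend.text = TRUE,
                ylab = "Share of admitted students", ylim = c(0, 1))
```

The Estonian share is above 50% at every level. The lowest is doctoral study, at 169/316 ≈ 53.5%.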

Hypothesis testing

Stating the null hypothesis and the alternative hypothesis.

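Null hypothesis (H0): level of study and mother tongue are independent. Alternative hypothesis (H1): they are not independent, i.e. the distribution of mother tongues differs between levels of study.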
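In symbols, the two hypotheses are shown below. Here L denotes level of study and M denotes mother tongue; this notation is introduced for illustration.

```latex
\begin{aligned}
H_0 &: \; P(L = i,\ M = j) = P(L = i)\,P(M = j) \quad \text{for every level } i \text{ and mother tongue } j,\\
H_1 &: \; P(L = i,\ M = j) \neq P(L = i)\,P(M = j) \quad \text{for at least one pair } (i, j).
\end{aligned}
```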

Report on collected data and sample size.

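The data are counts of students admitted in 2021, cross-classified by level of study (4 levels) and mother tongue (4 categories). The sample size is 10,047 students, the grand total of the contingency table.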

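Checking the assumptions

First, the data are treated as a random sample of independent observations. Second, every cell should have an expected count of at least 5. In the expected counts printed below, the smallest is 22.36 (Doctoral study, mother tongue unknown), so this condition holds.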
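A minimal sketch of running the test and checking the expected-count rule follows. It uses `chisq.test()` from base R's stats package on the observed table; the counts are copied from the contingency table above. `result$expected` holds the counts expected under independence.

```r
cdftable <- matrix(
  c(3705, 169, 519, 2804,   # Estonian
    263, 34, 11, 403,       # Mother.tongue.unknown
    115, 90, 50, 526,       # Other.mother.tongue
    806, 23, 91, 438),      # Russian
  nrow = 4,
  dimnames = list(
    Level.of.study = c("Bachelor's study", "Doctoral study",
                       "Integrated Bachelor's/Master's study", "Master's study"),
    mother.tongue = c("Estonian", "Mother.tongue.unknown",
                      "Other.mother.tongue", "Russian")
  )
)

result <- chisq.test(cdftable)  # no continuity correction for tables larger than 2x2
print(result)                   # X-squared, df and p-value
print(result$expected)          # expected counts under independence

# Rule of thumb: every expected count should be at least 5
min_expected <- min(result$expected)
print(min_expected)
stopifnot(min_expected >= 5)
```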

## 
##  Pearson's Chi-squared test
## 
## data:  cdftable
## X-squared = 687.89, df = 9, p-value < 2.2e-16
##                                       mother.tongue
## Level.of.study                          Estonian Mother.tongue.unknown
##   Bachelor's study                     3502.1532             345.98179
##   Doctoral study                        226.3613              22.36250
##   Integrated Bachelor's/Master's study  480.6596              47.48492
##   Master's study                       2987.8259             295.17080
##                                       mother.tongue
## Level.of.study                         Other.mother.tongue   Russian
##   Bachelor's study                               380.04469 660.82034
##   Doctoral study                                  24.56415  42.71205
##   Integrated Bachelor's/Master's study            52.15995  90.69553
##   Master's study                                 324.23121 563.77207

The chi-square statistic is X² = 687.89 with 9 degrees of freedom.
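For reference, the statistic compares the observed counts O_ij with the expected counts E_ij. The expected counts come from the row totals R_i, the column totals C_j and the grand total N. As a worked check, take the Bachelor's study row total (4889), the Estonian column total (7197) and N = 10047. These reproduce the first expected count in the output above:

```latex
\begin{aligned}
E_{ij} &= \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{4} \sum_{j=1}^{4} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (4 - 1)(4 - 1) = 9,\\
E_{\text{Bachelor, Estonian}} &= \frac{4889 \times 7197}{10047} \approx 3502.15 .
\end{aligned}
```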

P-value: reject or fail to reject the null hypothesis?

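The p-value is below 2.2 × 10⁻¹⁶ (printed as `< 2.2e-16`). This is far below the conventional 0.05 significance level, so we reject the null hypothesis of independence.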
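A minimal sketch of the decision rule follows. It assumes the conventional significance level of 0.05, since the text does not state one. It also shows where `< 2.2e-16` comes from. R's print method for tests reports p-values smaller than machine epsilon (`.Machine$double.eps`, about 2.22e-16) only as an upper bound. The printed figure is therefore a bound, not the p-value itself.

```r
cdftable <- matrix(c(3705, 169, 519, 2804, 263, 34, 11, 403,
                     115, 90, 50, 526, 806, 23, 91, 438), nrow = 4)
result <- chisq.test(cdftable)

alpha <- 0.05                 # assumed significance level
print(.Machine$double.eps)    # the bound behind "< 2.2e-16"
print(result$p.value)         # the underlying, much smaller, value

decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"
print(decision)
```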

Reporting the results

A chi-square test of independence showed that level of study and mother tongue are associated (not independent): χ²(9, N = 10,047) = 687.89, p < 2.2 × 10⁻¹⁶.
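A minimal sketch of a helper that assembles the reported numbers directly from the test object follows. `report_chisq` is a hypothetical name. The `p < .001` convention is an APA-style assumption, not something the text prescribes.

```r
# Hypothetical helper: format a chisq.test() result for a results sentence
report_chisq <- function(test, n) {
  stat <- unname(test$statistic)
  df <- as.integer(unname(test$parameter))
  p_txt <- if (test$p.value < 0.001) "p < .001" else sprintf("p = %.3f", test$p.value)
  sprintf("chi-squared(%d, N = %d) = %.2f, %s", df, as.integer(n), stat, p_txt)
}

cdftable <- matrix(c(3705, 169, 519, 2804, 263, 34, 11, 403,
                     115, 90, 50, 526, 806, 23, 91, 438), nrow = 4)
result <- chisq.test(cdftable)

report_txt <- report_chisq(result, sum(cdftable))
print(report_txt)
```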