The dataset that I will explore in this assignment is from the year 2021. It presents the numbers of admitted students by study programme group, level of study, and mother tongue.
Here I load the dataset to be explored in R; the first rows are shown below:
## # A tibble: 6 × 6
## Level.of.study Study.programme.group Eston…¹ Russian Other…² Mothe…³
## <chr> <chr> <int> <int> <int> <int>
## 1 Bachelor's study Journalism and information 82 3 0 0
## 2 Bachelor's study Architecture and construction 31 7 0 0
## 3 Bachelor's study Biological and environmental… 215 45 1 0
## 4 Bachelor's study Physical sciences 173 53 20 11
## 5 Bachelor's study Humanities 124 24 5 9
## 6 Bachelor's study Information and Communicatio… 571 165 14 23
## # … with abbreviated variable names ¹​Estonian, ²​Other.mother.tongue,
## # ³​Mother.tongue.unknown
1. What are the data types of the variables?
## tibble [108 × 6] (S3: tbl_df/tbl/data.frame)
## $ Level.of.study : chr [1:108] "Bachelor's study" "Bachelor's study" "Bachelor's study" "Bachelor's study" ...
## $ Study.programme.group: chr [1:108] "Journalism and information" "Architecture and construction" "Biological and environmental sciences" "Physical sciences" ...
## $ Estonian : int [1:108] 82 31 215 173 124 571 25 266 206 67 ...
## $ Russian : int [1:108] 3 7 45 53 24 165 0 84 40 22 ...
## $ Other.mother.tongue : int [1:108] 0 0 1 20 5 14 0 7 8 1 ...
## $ Mother.tongue.unknown: int [1:108] 0 0 0 11 9 23 0 0 23 0 ...
2. What is the mean for each variable?
## Level.of.study Study.programme.group Estonian Russian
## Length:108 Length:108 Min. : 0.00 Min. : 0.00
## Class :character Class :character 1st Qu.: 0.00 1st Qu.: 0.00
## Mode :character Mode :character Median : 12.50 Median : 1.00
## Mean : 66.64 Mean : 12.57
## 3rd Qu.: 74.25 3rd Qu.: 13.00
## Max. :571.00 Max. :165.00
## Other.mother.tongue Mother.tongue.unknown
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 0.000 Median : 0.000
## Mean : 7.231 Mean : 6.583
## 3rd Qu.: 7.250 3rd Qu.: 4.000
## Max. :103.000 Max. :88.000
3. Are there any Null / NA values?
## [1] 0
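The three checks above (data types, means, missing values) could be reproduced along the following lines. This is only a sketch: the data frame name `admitted` is an assumption, and it holds just the first five rows printed above rather than the full 108-row dataset, so the means it prints differ from the ones in the summary.

```r
# Hypothetical data frame with the first five rows shown in the output above.
admitted <- data.frame(
  Level.of.study = rep("Bachelor's study", 5),
  Study.programme.group = c("Journalism and information",
                            "Architecture and construction",
                            "Biological and environmental sciences",
                            "Physical sciences",
                            "Humanities"),
  Estonian = c(82L, 31L, 215L, 173L, 124L),
  Russian = c(3L, 7L, 45L, 53L, 24L),
  Other.mother.tongue = c(0L, 0L, 1L, 20L, 5L),
  Mother.tongue.unknown = c(0L, 0L, 0L, 11L, 9L),
  stringsAsFactors = FALSE
)

# 1. Data types: one class per column.
col_types <- sapply(admitted, class)

# 2. Means: only meaningful for the numeric count columns.
is_count <- sapply(admitted, is.numeric)
col_means <- colMeans(admitted[, is_count])

# 3. Missing values: total number of NA cells in the data frame.
n_missing <- sum(is.na(admitted))

print(col_types)
print(col_means)
print(n_missing)
```

The two character columns have no mean, which is why `summary()` reports only length, class, and mode for them.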
Prepare the data set that includes only the variables of your interest in
a suitable format for analysis (use the dplyr package and
tidyr when necessary).
## `summarise()` has grouped output by 'Level.of.study'. You can override using
## the `.groups` argument.
New Data Set
## # A tibble: 6 × 3
## # Groups: Level.of.study [2]
## Level.of.study mother.tongue sum.of.students
## <chr> <chr> <int>
## 1 Bachelor's study Estonian 3705
## 2 Bachelor's study Mother.tongue.unknown 263
## 3 Bachelor's study Other.mother.tongue 115
## 4 Bachelor's study Russian 806
## 5 Doctoral study Estonian 169
## 6 Doctoral study Mother.tongue.unknown 34
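A reshaping step that produces a table of this shape might look like the sketch below. It assumes the dplyr and tidyr packages are installed; the data frame name `admitted` and the intermediate column name `students` are assumptions, and only the first five rows of the original data are used, so the sums are smaller than the ones printed above.

```r
library(dplyr)
library(tidyr)

# Hypothetical data frame with the first five rows of the original dataset.
admitted <- data.frame(
  Level.of.study = rep("Bachelor's study", 5),
  Study.programme.group = c("Journalism and information",
                            "Architecture and construction",
                            "Biological and environmental sciences",
                            "Physical sciences",
                            "Humanities"),
  Estonian = c(82L, 31L, 215L, 173L, 124L),
  Russian = c(3L, 7L, 45L, 53L, 24L),
  Other.mother.tongue = c(0L, 0L, 1L, 20L, 5L),
  Mother.tongue.unknown = c(0L, 0L, 0L, 11L, 9L),
  stringsAsFactors = FALSE
)

new_data <- admitted %>%
  # Keep only the variables of interest.
  select(-Study.programme.group) %>%
  # Wide to long: one row per (level of study, mother tongue) count.
  pivot_longer(cols = -Level.of.study,
               names_to = "mother.tongue",
               values_to = "students") %>%
  # Total the counts over all study programme groups.
  group_by(Level.of.study, mother.tongue) %>%
  summarise(sum.of.students = sum(students), .groups = "drop")

print(new_data)
```

Setting `.groups = "drop"` returns an ungrouped tibble and silences the `summarise()` message; leaving it out keeps the result grouped by `Level.of.study`, as in the output above.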
Question: Is there a relation between level of study and the mother tongue of students? I will be using the chi-square test of independence.
## mother.tongue
## Level.of.study Estonian Mother.tongue.unknown
## Bachelor's study 3705 263
## Doctoral study 169 34
## Integrated Bachelor's/Master's study 519 11
## Master's study 2804 403
## mother.tongue
## Level.of.study Other.mother.tongue Russian
## Bachelor's study 115 806
## Doctoral study 90 23
## Integrated Bachelor's/Master's study 50 91
## Master's study 526 438
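One way to build this contingency table from the long-format summary is `xtabs()`, sketched below. The counts are copied from the table above; the name `new_data` for the long-format data frame is an assumption, while `cdftable` is the name that appears in the test output.

```r
# Long-format summary: one row per (level of study, mother tongue) pair.
levels_of_study <- c("Bachelor's study", "Doctoral study",
                     "Integrated Bachelor's/Master's study", "Master's study")
tongues <- c("Estonian", "Mother.tongue.unknown",
             "Other.mother.tongue", "Russian")

new_data <- data.frame(
  Level.of.study = rep(levels_of_study, each = 4),
  mother.tongue = rep(tongues, times = 4),
  sum.of.students = c(3705L, 263L, 115L, 806L,
                       169L,  34L,  90L,  23L,
                       519L,  11L,  50L,  91L,
                      2804L, 403L, 526L, 438L),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are levels of study, columns are mother tongues,
# and each cell holds the number of admitted students.
cdftable <- xtabs(sum.of.students ~ Level.of.study + mother.tongue,
                  data = new_data)

print(cdftable)
print(sum(cdftable))  # total number of admitted students in the table
```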
Visualizing the results
From the graph we can see that students whose mother tongue is Estonian are the majority at every level of study.
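A grouped bar chart of the contingency table is one possible way to draw such a graph. The sketch below uses base R graphics and plots row-wise shares rather than raw counts, which is a choice made here for readability and not necessarily how the original graph was drawn; the shortened axis labels are also assumptions.

```r
# Contingency table with the counts reported above.
cdftable <- matrix(
  c(3705, 263, 115, 806,
     169,  34,  90,  23,
     519,  11,  50,  91,
    2804, 403, 526, 438),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Level.of.study = c("Bachelor's study", "Doctoral study",
                       "Integrated Bachelor's/Master's study",
                       "Master's study"),
    mother.tongue = c("Estonian", "Mother.tongue.unknown",
                      "Other.mother.tongue", "Russian")
  )
)

# Share of each mother tongue within a level of study (each row sums to 1).
shares <- prop.table(cdftable, margin = 1)

# One group of bars per level of study, one bar per mother tongue.
barplot(t(shares), beside = TRUE, legend.text = TRUE,
        args.legend = list(x = "topright", cex = 0.7),
        names.arg = c("Bachelor's", "Doctoral", "Integrated", "Master's"),
        ylab = "Share of admitted students", ylim = c(0, 1))
```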
The null hypothesis is that level of study and mother tongue are independent; the alternative hypothesis is that they are dependent.
Report on collected data and sample size.
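The collected data cross-tabulates level of study against mother tongue. The sample size is 10047 admitted students, which is the sum of all cells in the contingency table above.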
The assumptions of the test are met: first, the sample was randomly selected; second, a minimum of 5 observations is expected in each cell (the smallest expected count in the table below is 22.36).
##
## Pearson's Chi-squared test
##
## data: cdftable
## X-squared = 687.89, df = 9, p-value < 2.2e-16
## mother.tongue
## Level.of.study Estonian Mother.tongue.unknown
## Bachelor's study 3502.1532 345.98179
## Doctoral study 226.3613 22.36250
## Integrated Bachelor's/Master's study 480.6596 47.48492
## Master's study 2987.8259 295.17080
## mother.tongue
## Level.of.study Other.mother.tongue Russian
## Bachelor's study 380.04469 660.82034
## Doctoral study 24.56415 42.71205
## Integrated Bachelor's/Master's study 52.15995 90.69553
## Master's study 324.23121 563.77207
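The test and the expected counts above can be reproduced from the contingency table alone. The sketch below rebuilds `cdftable` from the reported counts, runs `chisq.test()`, and also recomputes the expected counts and the statistic by hand, using expected count = row total × column total / grand total; the variable names other than `cdftable` are assumptions.

```r
# Contingency table with the observed counts reported above.
cdftable <- matrix(
  c(3705, 263, 115, 806,
     169,  34,  90,  23,
     519,  11,  50,  91,
    2804, 403, 526, 438),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Level.of.study = c("Bachelor's study", "Doctoral study",
                       "Integrated Bachelor's/Master's study",
                       "Master's study"),
    mother.tongue = c("Estonian", "Mother.tongue.unknown",
                      "Other.mother.tongue", "Russian")
  )
)

# Pearson's chi-squared test of independence.
result <- chisq.test(cdftable)

# Expected counts under independence, computed by hand.
expected <- outer(rowSums(cdftable), colSums(cdftable)) / sum(cdftable)

# Test statistic: sum over all cells of (observed - expected)^2 / expected.
statistic <- sum((cdftable - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(cdftable) - 1) * (ncol(cdftable) - 1)

print(result)
print(round(expected, 2))
print(min(expected) >= 5)  # checks the minimum expected count assumption
```

Comparing `expected` with `result$expected` is a quick way to confirm that the hand calculation and the built-in test agree.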
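The chi-square value is 687.89 with 9 degrees of freedom.
The p-value is less than 2.2e-16, far below the conventional 0.05 significance level, so we reject the null hypothesis of independence.
When we analyzed the relation between level of study and mother tongue, they were found to be dependent: the chi-square value was 687.89 and the p-value was less than 2.2e-16.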