y <- read.csv("MYMALES.csv")
y
## Year Age.group Males.Estonia
## 1 2013 30-34 172
## 2 2014 30-34 197
## 3 2015 30-34 717
## 4 2016 30-34 647
## 5 2017 30-34 714
## 6 2018 30-34 657
## 7 2019 30-34 618
## 8 2020 30-34 458
## 9 2021 30-34 578
y <- y[ , -2]  # drop column 2 (Age.group), which is "30-34" in every row
y
## Year Males.Estonia
## 1 2013 172
## 2 2014 197
## 3 2015 717
## 4 2016 647
## 5 2017 714
## 6 2018 657
## 7 2019 618
## 8 2020 458
## 9 2021 578
Do I need to clean my data? Does it have any null values or outliers that might affect my graphical analysis?
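One way to check this is sketched below. The data frame is retyped from the output printed above so the example runs without MYMALES.csv. The 1.5 x IQR rule for outliers is my assumption; it is a common convention, similar to the whisker rule that boxplot() uses by default, and not the only possible definition of an outlier.

```r
# Retyped from the printed output above (assumption: matches MYMALES.csv after dropping Age.group)
y <- data.frame(
  Year = 2013:2021,
  Males.Estonia = c(172, 197, 717, 647, 714, 657, 618, 458, 578)
)

# Count missing (NA) values in each column
na_counts <- colSums(is.na(y))
na_counts

# Flag values more than 1.5 * IQR below the first quartile or above the third
q <- unname(quantile(y$Males.Estonia, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- y$Males.Estonia < q[1] - fence | y$Males.Estonia > q[2] + fence
outliers <- y[is_outlier, ]
outliers
```

With the nine values shown above, there are no missing values and no rows are flagged, although the 2013 and 2014 counts are clearly lower than the rest.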
The data being analysed is a sample of male immigrants in Estonia aged 30-34 for the years 2013 to 2021.
The data set consists of two variables: column 1 holds the year and column 2 holds the number of male immigrants in Estonia.
I had to clean my data because the original second column (Age.group) had the same age range for every year, so I removed it.
Firstly I need to identify what kind of data this is. Both variables are numeric and stored as integers, and Males.Estonia is a count, so it is discrete rather than continuous data.
What is the relationship between the variables Year and Males.Estonia?
Do the low values of Year correspond to the low values of Males.Estonia?
Do the large values of Year correspond to the large values of Males.Estonia?
How do the values in the Year column change in comparison to the values in the Males.Estonia column?
(You can check the slides from Session 5 - Data Exploration (EDA) for ideas or come up with your own questions).
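A rough way to explore the questions about Year and Males.Estonia is to plot one against the other and compute a correlation. This is only a sketch: the data frame is retyped from the printed output, and using Pearson's correlation is my assumption. With only nine points it gives a hint about direction, not a firm conclusion.

```r
# Retyped from the printed output above so the example runs on its own
y <- data.frame(
  Year = 2013:2021,
  Males.Estonia = c(172, 197, 717, 647, 714, 657, 618, 458, 578)
)

# Points joined by lines show how the count moves from year to year
plot(y$Year, y$Males.Estonia, type = "b",
     xlab = "Year", ylab = "Male immigrants aged 30-34",
     main = "Male immigrants in Estonia, 2013-2021")

# Pearson correlation: the sign shows whether larger years go with larger counts
r <- cor(y$Year, y$Males.Estonia)
r
```

A positive value of r means later years tend to have higher counts, but the plot matters more here because the series jumps sharply between 2014 and 2015 instead of rising steadily.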
To check the first 6 rows of my data I use the head() function.
head(y)
## Year Males.Estonia
## 1 2013 172
## 2 2014 197
## 3 2015 717
## 4 2016 647
## 5 2017 714
## 6 2018 657
To check the last 6 rows of my data I use the tail() function.
tail(y)
## Year Males.Estonia
## 4 2016 647
## 5 2017 714
## 6 2018 657
## 7 2019 618
## 8 2020 458
## 9 2021 578
To check the structure of my data I use the str() function.
str(y)
## 'data.frame': 9 obs. of 2 variables:
## $ Year : int 2013 2014 2015 2016 2017 2018 2019 2020 2021
## $ Males.Estonia: int 172 197 717 647 714 657 618 458 578
This tells me that the data frame has 9 observations and 2 variables, both stored as integers.
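The same facts can be checked with a few other base R functions. This is a small sketch on a retyped copy of the data (as.integer() is used so the column types match the str() output above).

```r
# Retyped from the printed output above; as.integer() mirrors the int columns reported by str()
y <- data.frame(
  Year = 2013:2021,
  Males.Estonia = as.integer(c(172, 197, 717, 647, 714, 657, 618, 458, 578))
)

# Number of rows and columns
size <- dim(y)
size

# Storage class of each column
col_classes <- sapply(y, class)
col_classes

# Minimum, quartiles, median, mean and maximum for each column
summary(y)
```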
mean(y[,2])
## [1] 528.6667
hist(y[,2])
median(y[,2])
## [1] 618
boxplot(y[,2])
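The mean (528.67) is lower than the median (618), which suggests the two low counts in 2013 and 2014 are pulling the mean down. A sketch of the same plots with titles, axis labels and reference lines is given below. The labels and line styles are my own choices, and the data frame is retyped from the printed output.

```r
# Retyped from the printed output above so the example runs on its own
y <- data.frame(
  Year = 2013:2021,
  Males.Estonia = c(172, 197, 717, 647, 714, 657, 618, 458, 578)
)

m <- mean(y$Males.Estonia)
med <- median(y$Males.Estonia)

# Histogram with the mean (dashed) and median (solid) marked
hist(y$Males.Estonia,
     main = "Male immigrants aged 30-34, Estonia, 2013-2021",
     xlab = "Number of male immigrants")
abline(v = m, lty = 2)
abline(v = med, lty = 1)

# Horizontal boxplot of the same variable
boxplot(y$Males.Estonia, horizontal = TRUE,
        xlab = "Number of male immigrants")
```

Referring to the column by name (y$Males.Estonia) instead of by position (y[,2]) keeps the code working if the column order changes.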