Importing the data

y <- read.csv("KK032_20221029-163824.csv") 
y
##   Year                                         Indicator Wastewater.management
## 1 2010 National expenditure for environmental protection                  97.2
## 2 2014 National expenditure for environmental protection                 125.2
## 3 2015 National expenditure for environmental protection                 138.1
## 4 2016 National expenditure for environmental protection                 126.8
## 5 2017 National expenditure for environmental protection                 126.9
## 6 2018 National expenditure for environmental protection                 138.9
## 7 2019 National expenditure for environmental protection                 132.2
y <- y[,-2]
y
##   Year Wastewater.management
## 1 2010                  97.2
## 2 2014                 125.2
## 3 2015                 138.1
## 4 2016                 126.8
## 5 2017                 126.9
## 6 2018                 138.9
## 7 2019                 132.2
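
Dropping the column by position works here, but selecting columns by name is less fragile if the column order of the file ever changes. This is a minimal sketch, not the code used in this report: the data frame is rebuilt by hand from the printed output above because the CSV is not read in this block, and the name `raw` is only for illustration.

```r
# Data rebuilt from the printed output above (assumption: values typed by hand)
raw <- data.frame(
  Year = c(2010L, 2014L, 2015L, 2016L, 2017L, 2018L, 2019L),
  Indicator = "National expenditure for environmental protection",
  Wastewater.management = c(97.2, 125.2, 138.1, 126.8, 126.9, 138.9, 132.2)
)

# Check that Indicator holds a single value, so dropping it loses no information
length(unique(raw$Indicator)) == 1

# Keep the two informative columns by name instead of dropping position 2
y <- raw[, c("Year", "Wastewater.management")]
y
```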
  1. Prepare EDA (Exploratory Data Analysis). Write at least 5 questions and provide your answers during the first stage of EDA.

a) What does the data represent?
ans: The data show the national expenditure for wastewater management for 2010 and for each year from 2014 to 2019 (2011 to 2013 are not in the file).

b) Does my data need cleanup?
ans: Yes. The Indicator column has the same value in every row, so I removed it and kept only Year and Wastewater.management.

c) How many rows and columns are present in my data?
ans: 7 rows and 2 columns after cleanup (the imported file had 3 columns).
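
The row and column counts can also be checked in R instead of being read off the printout. A small sketch, assuming the cleaned data frame `y` as printed above (rebuilt by hand here so the block runs on its own):

```r
# Cleaned data rebuilt from the printed output above (assumption: typed by hand)
y <- data.frame(
  Year = c(2010L, 2014L, 2015L, 2016L, 2017L, 2018L, 2019L),
  Wastewater.management = c(97.2, 125.2, 138.1, 126.8, 126.9, 138.9, 132.2)
)

dim(y)   # rows and columns together
nrow(y)  # number of rows
ncol(y)  # number of columns
```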

d) In which year was the national expenditure for wastewater management highest between 2010 and 2019?
ans: 2018 (138.9)
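
The year with the highest value can be found with `which.max()` rather than by eye. A minimal sketch, again with `y` rebuilt by hand from the printed output; the name `top` is only for illustration:

```r
# Cleaned data rebuilt from the printed output above (assumption: typed by hand)
y <- data.frame(
  Year = c(2010L, 2014L, 2015L, 2016L, 2017L, 2018L, 2019L),
  Wastewater.management = c(97.2, 125.2, 138.1, 126.8, 126.9, 138.9, 132.2)
)

# which.max() returns the row index of the largest value
top <- which.max(y$Wastewater.management)

# Show the whole row for that index (year and value)
y[top, ]
```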

e) What data types are represented in my data?
ans: integer (Year), character (Indicator), numeric/double (Wastewater.management)

f) Check the first 6 rows
g) Check the last 6 rows

To check the first 6 rows of my data, I use the function head()

head(y)
##   Year Wastewater.management
## 1 2010                  97.2
## 2 2014                 125.2
## 3 2015                 138.1
## 4 2016                 126.8
## 5 2017                 126.9
## 6 2018                 138.9

To check the last 6 rows of my data, I use the function tail()

tail(y)
##   Year Wastewater.management
## 2 2014                 125.2
## 3 2015                 138.1
## 4 2016                 126.8
## 5 2017                 126.9
## 6 2018                 138.9
## 7 2019                 132.2

To check the structure of my data (number of observations, column names and data types), I use the function str()

str(y)
## 'data.frame':    7 obs. of  2 variables:
##  $ Year                 : int  2010 2014 2015 2016 2017 2018 2019
##  $ Wastewater.management: num  97.2 125.2 138.1 126.8 126.9 ...

3. Provide brief descriptive statistical analysis of your data set (like measures of central tendency and dispersion).

To get the mean

mean(y[,2])
## [1] 126.4714

To get the median

median(y[,2])
## [1] 126.9

To get the histogram

hist(y[,2])

To get the boxplot

boxplot(y[,2])
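
The task also asks for measures of dispersion, which the mean and median do not cover. A short sketch using base R functions, with the values typed in from the printed output above; the vector name `wm` is only for illustration:

```r
# Wastewater.management values copied from the printed output above
wm <- c(97.2, 125.2, 138.1, 126.8, 126.9, 138.9, 132.2)

range(wm)        # minimum and maximum
diff(range(wm))  # spread between minimum and maximum
var(wm)          # sample variance
sd(wm)           # sample standard deviation
IQR(wm)          # interquartile range
summary(wm)      # minimum, quartiles, median, mean and maximum in one call
```

With only 7 observations, and with 2010 much lower than the other years, the median and IQR are probably more robust summaries than the mean and standard deviation.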

  1. Include at least one plot into your report. If ggplot2 is too complicated for you now, create a plot with R base functions.

barplot(y[,2])
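
The bar plot is easier to read if each bar is labelled with its year. A possible version with base R only, assuming the cleaned data frame `y` (rebuilt by hand here); the title and axis labels are my own wording, and no unit is shown because the source file's unit is not stated in this report:

```r
# Cleaned data rebuilt from the printed output above (assumption: typed by hand)
y <- data.frame(
  Year = c(2010L, 2014L, 2015L, 2016L, 2017L, 2018L, 2019L),
  Wastewater.management = c(97.2, 125.2, 138.1, 126.8, 126.9, 138.9, 132.2)
)

# names.arg puts the year under each bar; barplot() returns the bar midpoints
mids <- barplot(y$Wastewater.management,
                names.arg = y$Year,
                xlab = "Year",
                ylab = "Wastewater management expenditure",
                main = "National expenditure for wastewater management")
```

Note that the bars are evenly spaced, so the gap between 2010 and 2014 is not visible on the x-axis.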