Importing and cleaning data are mandatory steps prior to running any type of analytics. We should always generate a priori hypotheses based on the evidence, literature, and logic that we have available to us. For example, we might guess that a strong, positive correlation exists between height and weight:
> cor(DS0012$height, DS0012$weight)
[1] 0.8011362
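To go one step beyond eyeballing the coefficient, a directional hypothesis like this one can be checked with `cor.test()`. The sketch below is only an illustration: DS0012 is not reproduced here, so `height` and `weight` are simulated stand-ins with made-up parameters, and the numbers it prints will not match the real data.

```r
# Simulated stand-in for DS0012 (hypothetical values, NOT the real data)
set.seed(42)
height <- rnorm(200, mean = 170, sd = 10)           # heights in cm
weight <- 0.9 * height - 85 + rnorm(200, sd = 8)    # kg, constructed to rise with height

# Pearson correlation coefficient, as in the cor() call above
r <- cor(height, weight)

# One-sided test of the a priori hypothesis that the correlation is positive
result <- cor.test(height, weight, alternative = "greater")

print(r)
print(result$p.value)
```

Stating the direction before looking at the data is what justifies the one-sided `alternative = "greater"`; without a prior hypothesis, the default two-sided test would be the safer choice.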
Generating plots can help us understand our data better:
> plot(DS0012$height, DS0012$weight, ylab = "Mass (kg)", xlab = "Height (cm)")
> hist(DS0012$BMI)
> hist(DS0012$BMI, main = "Distribution of BMI")
> hist(DS0012$BMI, main = "Distribution of BMI", col = "lightblue")
> hist(DS0012$BMI, main = "Distribution of BMI", col = "lightblue", xlab = "BMI", probability = TRUE)
> lines(density(DS0012$BMI))
> lines(density(DS0012$BMI), col = "red")
> abline(v = mean(DS0012$BMI), lty = "dashed", col = "green")
> mean(DS0012$BMI)
[1] 25.7057
The commands above build the plot step by step, ending with a density histogram of BMI overlaid with a kernel density estimate of the distribution and a vertical line at the mean (m = 25.71), shown in the following density histogram.
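The final plot only needs three of those calls. Here is a condensed sketch that runs without DS0012; the vector `bmi` and its mean and standard deviation are assumptions chosen purely for illustration. Note that `lty` and `col` are arguments to `abline()`, so they sit outside the call to `mean()`.

```r
# Hypothetical BMI values standing in for DS0012$BMI (NOT the real data)
set.seed(1)
bmi <- rnorm(500, mean = 25.7, sd = 4)

# probability = TRUE puts density (not counts) on the y-axis,
# so the kernel density curve is drawn on the same scale as the bars
hist(bmi, main = "Distribution of BMI", col = "lightblue",
     xlab = "BMI", probability = TRUE)

# Overlay the kernel density estimate
lines(density(bmi), col = "red")

# Mark the sample mean with a dashed vertical line
abline(v = mean(bmi), lty = "dashed", col = "green")
```

If `probability = TRUE` is left out, the histogram shows raw counts and the density curve ends up flattened against the x-axis.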
> install.packages("ggplot2")
The downloaded binary packages are in
	/var/folders/nl/4z5wsxpn3cngl9tp9y17r5sm0000gn/T//RtmpwJmKSM/downloaded_packages
> library("ggplot2", lib.loc="/Library/Frameworks/R.framework/Versions/3.0/Resources/library")
> library(ggplot2)
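In a script that gets run more than once, it can be handy to install a package only when it is missing. Below is a minimal sketch of that pattern; `ensure_package()` is a hypothetical helper written for this post, not a function from base R or ggplot2.

```r
# ensure_package() is a hypothetical helper, not part of base R
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE (rather than erroring) if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE tells library() to treat pkg as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Demonstrated on "stats", which ships with R, so nothing is downloaded here
ensure_package("stats")
```

Calling `ensure_package("ggplot2")` would then download the package on the first run and simply load it on later runs.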
Installing new packages in R is a simple process, made even simpler by the package finder in RStudio:
In the next section, we will start messing around with ggplot2.