Importing and cleaning data are mandatory steps before running any type of analytics. We should also always generate a priori hypotheses based on the evidence, literature, and logic available to us. For example, we might predict a strong, positive correlation between height and weight, which the data support:

> cor(DS0012$height, DS0012$weight)
[1] 0.8011362
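Because the hypothesis is stated in advance (a *positive* correlation), we can test it directly with a one-sided test. This is a minimal sketch that assumes simulated height and weight values; the `sim` data frame and its numbers are made up for illustration and are not the DS0012 data:

```r
# Hypothetical stand-in for DS0012: simulated heights (cm) and weights (kg)
set.seed(42)
height <- rnorm(200, mean = 170, sd = 10)
weight <- 0.9 * height - 85 + rnorm(200, sd = 8)
sim <- data.frame(height = height, weight = weight)

# A priori hypothesis: height and weight are positively correlated,
# so a one-sided alternative ("greater") matches the prediction
ht <- cor.test(sim$height, sim$weight, alternative = "greater")
ht$estimate  # Pearson's r
ht$p.value   # p-value for H1: r > 0
```

With the real data you would pass `DS0012$height` and `DS0012$weight` instead of the simulated columns.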

Generating plots can help us understand our data better:

> plot(DS0012$height, DS0012$weight,
ylab = "Mass (kg)", xlab = "Height (cm)")

> hist(DS0012$BMI)
> hist(DS0012$BMI, main = "Distribution of BMI")
> hist(DS0012$BMI, main = "Distribution of BMI", col = "lightblue")
> hist(DS0012$BMI, main = "Distribution of BMI", col = "lightblue",
xlab = "BMI", probability = TRUE)
> lines(density(DS0012$BMI))
> lines(density(DS0012$BMI), col = "red")
> abline(v = mean(DS0012$BMI), lty = "dashed", col = "green")
> mean(DS0012$BMI)
[1] 25.7057

Above is a series of commands that gradually builds, step by step, a density histogram of BMI, overlaid with a kernel density estimate of the distribution (in red) and a vertical line marking the mean value (m = 25.71).
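For reference, the same figure can be built in one self-contained pass. This is a minimal sketch that assumes simulated BMI values (`bmi`, drawn from a normal distribution with mean 25.7 purely for illustration) in place of `DS0012$BMI`. Note that `lty` and `col` are arguments to `abline()`, not to `mean()`:

```r
# Hypothetical BMI values standing in for DS0012$BMI
set.seed(1)
bmi <- rnorm(500, mean = 25.7, sd = 4)

# probability = TRUE scales bars to densities, so the kernel
# density curve can share the same y-axis
h <- hist(bmi, main = "Distribution of BMI", col = "lightblue",
          xlab = "BMI", probability = TRUE)
lines(density(bmi), col = "red")

# Graphical parameters belong to abline(); passed to mean() they are ignored
m <- mean(bmi)
abline(v = m, lty = "dashed", col = "green")
```

Because the bars are densities, their areas (density times bin width) sum to 1.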

> install.packages("ggplot2")
The downloaded binary packages are in
/var/folders/nl/4z5wsxpn3cngl9tp9y17r5sm0000gn/T//RtmpwJmKSM/downloaded_packages
> library("ggplot2", lib.loc="/Library/Frameworks/R.framework/Versions/3.0/Resources/library")
> library(ggplot2)
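In scripts, it can be handy to install a package only when it is missing, so that re-running the script does not reinstall it each time. A minimal sketch; the CRAN mirror passed to `repos` is an assumption (any CRAN mirror works):

```r
# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}
# Attach the package for the current session
library(ggplot2)
```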

Installing new packages in R is simple, and the package installer in RStudio's Packages pane makes it simpler still.

In the next section, we will start messing around with ggplot2.

## About dwmaasberg

Memories are physical connections between neurons. I think that is pretty cool!