Given the data set Cars93 in R, how do we break down some of the variables and start determining where some of the salient differences are in our data?
In R, open the data set with the following prompt:
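Cars93 ships with the MASS package, so one way to do this, assuming MASS is installed (it comes with most standard R installations), is sketched below.

```r
# Assumption: Cars93 is taken from the MASS package.
library(MASS)

# Make the data frame available in the workspace.
data(Cars93)

# Quick check of its size (rows, columns).
dim(Cars93)
```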
The first step is to display the data so that we can understand the nature of the variables that we are working with.
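A minimal sketch of displaying the data, assuming the base functions str() and head() are a suitable way to look at it. The choice of columns passed to head() is only for illustration.

```r
# Assumption: Cars93 comes from the MASS package.
library(MASS)

# Compact overview: each variable with its type and first few values.
str(Cars93)

# First rows of a few columns relevant to the questions that follow.
head(Cars93[, c("Manufacturer", "Model", "Type", "MPG.city", "MPG.highway")])
```

Type is stored as a factor, which is what lets it act as the grouping variable in the plots and tests that follow.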
Given this view of the variables, we are able to start to think about the questions we want to ask about our data. If we want to better understand the relationship between Type of automobile and the Gas Mileage, the following prompt will give us a visual of the data:
> boxplot(Cars93$MPG.highway ~ Cars93$Type)
Viewing this Box Plot gives us a look at the Median differences between groups and at the spread of the middle half of each group, that is, the distance between the 25th percentile and the 75th percentile (the interquartile range). Running a One-Way ANOVA will identify whether significant differences between groups exist (from the plot, it appears there might be).
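The numbers behind the boxes can also be computed directly. The sketch below assumes Cars93 comes from the MASS package and uses quantile() with its default method; the hinges drawn by boxplot() can differ slightly from these values in small groups.

```r
# Assumption: Cars93 comes from the MASS package.
library(MASS)

# 25th, 50th (median) and 75th percentiles of highway MPG for each Type.
# The result is a matrix with one column per Type.
quartiles <- sapply(
  split(Cars93$MPG.highway, Cars93$Type),
  quantile,
  probs = c(0.25, 0.5, 0.75)
)
print(quartiles)

# Interquartile range per Type: 75th percentile minus 25th percentile.
iqr <- quartiles["75%", ] - quartiles["25%", ]
print(iqr)
```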
The aggregate function computes the mean highway gas mileage for each Type, which helps us further identify differences between groups:
> aggregate(MPG.highway ~ Type, data = Cars93, FUN = mean)
And using a One-Way ANOVA will test for significant differences between groups:
> anova.table <- aov(Cars93$MPG.highway ~ Cars93$Type)
> summary(anova.table)
This one-way ANOVA shows that the average highway MPG differs significantly based on Type of automobile (e.g., Compact, Van, etc.; F(5,87) = 24.09, p < 0.001). The test only tells us that at least one group mean differs from the others, so post-hoc comparisons will be needed to determine where the significant differences exist (e.g., between Small size cars and Vans).
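One common choice for such post-hoc comparisons is Tukey's Honest Significant Difference test. The sketch below is only one option, not necessarily the comparison intended here. It assumes Cars93 comes from the MASS package and refits the model with a data argument so that the term is simply named Type; the names fit and tukey are made up for this example.

```r
# Assumption: Cars93 comes from the MASS package.
library(MASS)

# Same one-way ANOVA as above, written with a data argument.
fit <- aov(MPG.highway ~ Type, data = Cars93)

# Tukey's HSD: all pairwise differences in mean highway MPG between Types,
# with confidence intervals and p-values adjusted for multiple comparisons.
tukey <- TukeyHSD(fit, "Type")
print(tukey)
```

Each row of the output is one pair of Types. Pairs whose adjusted p-value falls below the chosen threshold (commonly 0.05) are the ones where the group means differ significantly.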