Categorical Variables in R

Picking up the data set that I was working with earlier today: we left off with a cleaned data set and a newly created continuous variable, BMI.

The gender variable takes two unique values (1: Male; 2: Female), but unfortunately they are stored as numeric codes rather than as a labelled factor.

> head(DS0012)
      id   gender  age   weight   height         BMI
1  41512        2   80     44.2    154.7    18.46893
2  41513        2   80     42.6    137.9    22.40170
4  41612        2   80     68.3    154.4    28.65010
5  41617        2   80     48.2    142.5    23.73653
6  41627        2   80     55.4    151.8    24.04176
7  41641        2   80     86.2    159.0    34.09675

> unique(DS0012$gender)
[1] 2 1
> DS0012$gender<-factor(DS0012$gender, labels=c("M","F"))
> head(DS0012)
      id   gender  age   weight   height         BMI
1  41512        F   80     44.2    154.7    18.46893
2  41513        F   80     42.6    137.9    22.40170
4  41612        F   80     68.3    154.4    28.65010
5  41617        F   80     48.2    142.5    23.73653
6  41627        F   80     55.4    151.8    24.04176
7  41641        F   80     86.2    159.0    34.09675
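
When no levels are given, factor() uses the sorted unique values as its levels, so code 1 is paired with "M" and code 2 with "F" even though unique() happens to list 2 first. A minimal sketch that spells the mapping out with an explicit levels argument; the vector gender_codes is a hypothetical stand-in for DS0012$gender.

```r
# Hypothetical stand-in for DS0012$gender (1 = Male, 2 = Female)
gender_codes <- c(2, 2, 1, 2, 1)

# Explicit levels pin each code to its label, whatever order the data come in
gender <- factor(gender_codes, levels = c(1, 2), labels = c("M", "F"))

levels(gender)        # the category labels, in level order
table(gender)         # counts per category
as.character(gender)  # the label attached to each observation
```

Stating the levels costs a few characters and guards against a label silently landing on the wrong code if the coding scheme ever changes.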

We would also like to derive categorical variables from two of the continuous variables that we are working with: “age” and “BMI”. For age, we would like age ranges (e.g., child, teenager, adult), and for BMI, weight classes (e.g., underweight, normal, overweight).

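We will apply the following ranges for age: (0-12: Child; 13-17: Teenager; 18-39: Adult; 40-59: Mature; 60+: Senior). With right = FALSE, each interval includes its lower break and excludes its upper break, so an age of exactly 13 counts as a teenager. Note that the final break of 81 only covers ages up to 80; any older age would come out as NA. The code is below.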

> DS0012$age.category <- cut(DS0012$age, breaks = 
c(0, 13, 18, 40, 60, 81), right = FALSE, labels = 
c("child", "teenager", "adult", "mature", "senior"))

> head(DS0012)
      id  gender  age   weight   height         BMI   age.category
1  41512       F   80     44.2    154.7    18.46893         senior
2  41513       F   80     42.6    137.9    22.40170         senior
4  41612       F   80     68.3    154.4    28.65010         senior
5  41617       F   80     48.2    142.5    23.73653         senior
6  41627       F   80     55.4    151.8    24.04176         senior
7  41641       F   80     86.2    159.0    34.09675         senior
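
To see how the break points behave at the edges, here is a small sketch on a hypothetical vector of ages. Replacing the last break with Inf is an assumption on my part, not what the code above does; it simply keeps any age above 80 in the "senior" group instead of turning it into NA.

```r
# Hypothetical ages chosen to sit on and around each break point
ages <- c(0, 12, 13, 17, 18, 39, 40, 59, 60, 80, 85)

age_category <- cut(
  ages,
  breaks = c(0, 13, 18, 40, 60, Inf),  # Inf (assumption) leaves the top group open-ended
  right  = FALSE,                      # intervals are [lower, upper)
  labels = c("child", "teenager", "adult", "mature", "senior")
)

# Side-by-side view of each age and the category it falls into
data.frame(age = ages, age_category)
```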

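The final categorical variable that we need to derive from our continuous data is the BMI category: (below 18.5: Underweight; 18.5 to below 25: Normal; 25 to below 30: Overweight; 30 and above: Obese). As with age, right = FALSE makes each interval include its lower break, so a BMI of exactly 18.5 counts as normal rather than underweight.

> DS0012$BMI.category <- cut(DS0012$BMI, breaks = 
c(0, 18.5, 25, 30, 100), right = FALSE, labels = 
c("underweight", "normal", "overweight", "obese"))
> ## The new category column is created and its labels are assigned in one step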
> head(DS0012)
      id  gender  age   weight   height         BMI   age.category   BMI.category
1  41512       F   80     44.2    154.7    18.46893         senior    underweight
2  41513       F   80     42.6    137.9    22.40170         senior         normal
4  41612       F   80     68.3    154.4    28.65010         senior     overweight
5  41617       F   80     48.2    142.5    23.73653         senior         normal
6  41627       F   80     55.4    151.8    24.04176         senior         normal
7  41641       F   80     86.2    159.0    34.09675         senior          obese
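
By default cut() closes each interval on the right, so whether a value sitting exactly on a break (18.5, 25 or 30) lands in the lower or the upper category depends on the right argument. A minimal sketch comparing the two settings; the BMI values are hypothetical and chosen to sit on and next to the break points.

```r
# Hypothetical BMI values on and near the break points
bmi <- c(18.4, 18.5, 24.9, 25, 29.9, 30)
bmi_breaks <- c(0, 18.5, 25, 30, 100)
bmi_labels <- c("underweight", "normal", "overweight", "obese")

# Default: intervals are (lower, upper], so 18.5 falls in "underweight"
right_closed <- cut(bmi, breaks = bmi_breaks, labels = bmi_labels)

# right = FALSE: intervals are [lower, upper), so 18.5 falls in "normal"
left_closed <- cut(bmi, breaks = bmi_breaks, labels = bmi_labels, right = FALSE)

# Compare the two assignments row by row
data.frame(bmi, right_closed, left_closed)
```

Only the values that sit exactly on a break change category between the two columns; everything strictly inside an interval is classified the same way.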

About dwmaasberg

Memories are physical connections between neurons. I think that is pretty cool!
This entry was posted in R, Statistics.
