The following are answers to the exercises in Section 9.5.
Load the tidyverse package. Then assign the built-in dataframe starwars to an object named whatever you want. Then subset the dataframe by human species only. Save the subsetted dataframe as an object called swhuman. Then calculate and report the mean and median height in swhuman. Also report an NAs (missing data) in the height variable. Note: the units for the height variable are centimeters.
height
Min. :150.0
1st Qu.:168.5
Median :180.0
Mean :176.6
3rd Qu.:184.0
Max. :202.0
NA's :4
# The mean height is 176.6 centimeters, and the median height is 180 centimeters.There are 4 NAs.
Hopefully you noticed that there are indeed some NAs in the swhuman dataframe! Detect which rows have NAs for the height variable, and write the names of the characters that have this. Next, let’s fix these errors. Perform an internet search and populate those NAs with plausible values. If you need to convert from feet to centimeters, multiply the value in feet by 30.48. If you absolutely cannot find the height of any character substitute the median height from Question 1 for their height.
# Which rows have NA for the height variable?which(is.na(swhuman$height))
[1] 18 32 33 34
swhuman %>%slice(18,32,33,34) %>%print()
# A tibble: 4 × 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Arvel Cr… NA NA brown fair brown NA male mascu…
2 Finn NA NA black dark dark NA male mascu…
3 Rey NA NA brown light hazel NA fema… femin…
4 Poe Dame… NA NA brown light brown NA male mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
# vehicles <list>, starships <list>
## It looks like Arvel Crynyd, Finn, Rey, and Poe Dameron all have NA values for height. From a quick Google search, I found heights of Finn, Rey, and Poe Dameron but not for Arvel Crynyd. Thus, I will enter a value of 180 (median from Question 1) for Arvel Crynyd. For Finn, Rey, and Poe Dameron, I will enter 176.8, 170.7, and 172, respectively. swhuman$height[18] <-180swhuman$height[32:34] <-c(176.8, 170.7, 172)
Once you have filled in this missing data, calculate the new mean and median for the height variable. Comment on how much of a difference the additional values made on the mean and median compared with the values you calculated in Question 1. Then determine the three shortest characters, and three tallest characters.
summary(swhuman$height)
Min. 1st Qu. Median Mean 3rd Qu. Max.
150.0 170.0 178.0 176.4 183.0 202.0
# The new mean is 176.4. This is 0.2 centimeters less than the mean from Question 1.# The new median is 178. This is 2 centimeters less than the median from Question 1.# Next, let's print the three shortest and three tallest characters.swhuman %>%slice_min(height, n =3)
# A tibble: 3 × 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Leia Org… 150 49 brown light brown 19 fema… femin…
2 Mon Moth… 150 NA auburn fair blue 48 fema… femin…
3 Cordé 157 NA brown light brown NA fema… femin…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
# vehicles <list>, starships <list>
swhuman %>%slice_max(height, n =3)
# A tibble: 3 × 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Darth Va… 202 136 none white yellow 41.9 male mascu…
2 Qui-Gon … 193 89 brown fair blue 92 male mascu…
3 Dooku 193 80 white fair brown 102 male mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
# vehicles <list>, starships <list>
## The three shortest characters are Leia Organa, Mon Mothma, and Cordé.## The three tallest characters are Darth Vader, Qui-Gon Jinn, and Dooku.
Return to the larger starwars dataframe or whatever object to which you assigned it. Determine which characters have NA for height. If there are any characters with NA for height (hint: there are), enter plausible values for their heights using the approach taken in Question 2. Then report the mean and median height across everyone in this dataframe.
# Both approaches tell you the rows with NA in height. Use the approach you like.which(is.na(stardf$height))
[1] 28 82 83 84 85 86
stardf$height %>%is.na() %>%which()
[1] 28 82 83 84 85 86
# Determine which characters have NA for height.stardf %>%slice(28,82:86) %>%print()
# A tibble: 6 × 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Arvel Cr… NA NA brown fair brown NA male mascu…
2 Finn NA NA black dark dark NA male mascu…
3 Rey NA NA brown light hazel NA fema… femin…
4 Poe Dame… NA NA brown light brown NA male mascu…
5 BB8 NA NA none none black NA none mascu…
6 Captain … NA NA unknown unknown unknown NA <NA> <NA>
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
# vehicles <list>, starships <list>
# I already have heights for the first four. BB8's height is 67.1, and Captain Phasma's height is 200.1.stardf$height[28] <-180stardf$height[82:86] <-c(176.8, 170.7, 172, 67.1, 200.1)# Now calculate mean and median height.summary(stardf$height)
Min. 1st Qu. Median Mean 3rd Qu. Max.
66.0 167.0 180.0 173.4 191.0 264.0
## The mean height is 173.4 cm and the median height is 180 cm.
Still working with the starwars dataframe, convert the species variable to factor. Then, group and summarise the mean height by species, and print this in descending order. Report which species is the tallest, on average. Then, rearrange and report the species which is the shortest, on average.
# Convert species to factorstardf$species <-as.factor(stardf$species)# Which is the tallest species, on average?stardf %>%group_by(species) %>%summarise(mean_height =mean(height)) %>%arrange(desc(mean_height))
## The Quermian species is the tallest on average.# Which is the shortest species on average?stardf %>%group_by(species) %>%summarise(mean_height =mean(height)) %>%arrange(mean_height)