Appendix F — Answers for Section 9.5

The following are answers to the exercises in Section 9.5.

  1. Load the tidyverse package. Then assign the built-in dataframe starwars to an object with a name of your choice. Next, subset the dataframe to the human species only, and save the subsetted dataframe as an object called swhuman. Then calculate and report the mean and median height in swhuman. Also report any NAs (missing data) in the height variable. Note: the units for the height variable are centimeters.
library(tidyverse)

stardf <- starwars

swhuman <- stardf %>%
  filter(species == "Human")

swhuman %>%
  select(height) %>%
  summary()
     height     
 Min.   :150.0  
 1st Qu.:168.5  
 Median :180.0  
 Mean   :176.6  
 3rd Qu.:184.0  
 Max.   :202.0  
 NA's   :4      
# The mean height is 176.6 centimeters, and the median height is 180 centimeters. There are 4 NAs.
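summary() reports everything at once, but the same three quantities can also be computed one at a time. The following is a minimal sketch on a small made-up vector (toy_height is hypothetical and is not taken from the starwars data), chosen so the results are easy to check by hand:

```r
# Hypothetical toy heights in centimeters, with two missing values.
toy_height <- c(150, 170, NA, 180, 190, NA)

# mean() and median() return NA if any value is missing,
# so the NAs are dropped explicitly with na.rm = TRUE.
toy_mean   <- mean(toy_height, na.rm = TRUE)
toy_median <- median(toy_height, na.rm = TRUE)

# is.na() returns TRUE/FALSE for each element; summing it counts the TRUEs.
toy_na_count <- sum(is.na(toy_height))

print(c(mean = toy_mean, median = toy_median, n_missing = toy_na_count))
```

Applying the same calls to swhuman$height should agree with the summary() output shown above.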
  2. Hopefully you noticed that there are indeed some NAs in the swhuman dataframe! Detect which rows have NAs for the height variable, and write down the names of the characters with missing heights. Next, let’s fix these errors. Perform an internet search and populate those NAs with plausible values. If you need to convert from feet to centimeters, multiply the value in feet by 30.48. If you absolutely cannot find the height of a character, substitute the median height from Question 1 for their height.
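Heights found online are often given in feet and inches rather than decimal feet. The conversion in the hint can be sketched as a small helper; feet_to_cm() is a hypothetical function written for illustration, not part of any package:

```r
# Hypothetical helper: convert a height in feet and inches to centimeters.
feet_to_cm <- function(feet, inches = 0) {
  total_feet <- feet + inches / 12  # 12 inches per foot
  total_feet * 30.48                # 30.48 centimeters per foot
}

# A height of 5 feet 7 inches, rounded to one decimal place.
print(round(feet_to_cm(5, 7), 1))
```

Converting the inches to a fraction of a foot first keeps the single multiplication by 30.48 from the exercise text.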
# Which rows have NA for the height variable?

which(is.na(swhuman$height))
[1] 18 32 33 34
swhuman %>%
  slice(18,
        32,
        33,
        34) %>%
  print()
# A tibble: 4 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Arvel Cr…     NA    NA brown      fair       brown             NA male  mascu…
2 Finn          NA    NA black      dark       dark              NA male  mascu…
3 Rey           NA    NA brown      light      hazel             NA fema… femin…
4 Poe Dame…     NA    NA brown      light      brown             NA male  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
## It looks like Arvel Crynyd, Finn, Rey, and Poe Dameron all have NA values for height. From a quick Google search, I found heights for Finn, Rey, and Poe Dameron but not for Arvel Crynyd. Thus, I will enter a value of 180 cm (the median from Question 1) for Arvel Crynyd. For Finn, Rey, and Poe Dameron, I will enter 176.8, 170.7, and 172 cm, respectively.

swhuman$height[18] <- 180

swhuman$height[32:34] <- c(176.8, 170.7, 172)
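Row numbers such as 18 or 32:34 depend on the current row order, so they can point at the wrong character if the dataframe is later sorted or filtered differently. One possible alternative, sketched here on a made-up dataframe (toy_df and its values are hypothetical), is to look the rows up by name:

```r
# Hypothetical toy data; names and heights are for illustration only.
toy_df <- data.frame(
  name   = c("A", "B", "C"),
  height = c(165, NA, NA)
)

# Named vector: the names identify the characters, the values are new heights.
new_heights <- c(B = 172, C = 180)

# match() finds the row of each name, wherever it sits in the dataframe.
rows <- match(names(new_heights), toy_df$name)
stopifnot(!anyNA(rows))  # stop early if a name is misspelled or absent

toy_df$height[rows] <- unname(new_heights)

print(toy_df)
```

The same pattern should work on swhuman with the character names as they appear in its name column.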
  3. Once you have filled in the missing data, calculate the new mean and median for the height variable. Comment on how much of a difference the additional values made to the mean and median compared with the values you calculated in Question 1. Then determine the three shortest and the three tallest characters.
summary(swhuman$height)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  150.0   170.0   178.0   176.4   183.0   202.0 
# The new mean is 176.4. This is 0.2 centimeters less than the mean from Question 1.
# The new median is 178. This is 2 centimeters less than the median from Question 1.

# Next, let's print the three shortest and three tallest characters.

swhuman %>%
  slice_min(height, n = 3)
# A tibble: 3 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <dbl> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Leia Org…    150    49 brown      light      brown             19 fema… femin…
2 Mon Moth…    150    NA auburn     fair       blue              48 fema… femin…
3 Cordé        157    NA brown      light      brown             NA fema… femin…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
swhuman %>%
  slice_max(height, n = 3)
# A tibble: 3 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <dbl> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
2 Qui-Gon …    193    89 brown      fair       blue            92   male  mascu…
3 Dooku        193    80 white      fair       brown          102   male  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
## The three shortest characters are Leia Organa, Mon Mothma, and Cordé.
## The three tallest characters are Darth Vader, Qui-Gon Jinn, and Dooku.
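The output above contains ties (two characters at 150 cm and two at 193 cm). slice_min() and slice_max() keep ties by default (with_ties = TRUE), so they can return more than n rows when a tie falls on the cutoff. A minimal sketch with made-up data (the toy tibble is hypothetical):

```r
library(dplyr)

# Hypothetical toy data with a tie at the third-smallest height.
toy <- tibble(
  name   = c("A", "B", "C", "D", "E"),
  height = c(150, 157, 160, 160, 175)
)

# Default behavior: ties are kept, so asking for 3 rows returns 4 here.
with_ties <- toy %>% slice_min(height, n = 3)

# with_ties = FALSE returns exactly 3 rows.
no_ties <- toy %>% slice_min(height, n = 3, with_ties = FALSE)

print(nrow(with_ties))
print(nrow(no_ties))
```

If exactly three names are required, with_ties = FALSE enforces that, at the cost of dropping one of the tied characters.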
  4. Return to the larger starwars dataframe or whatever object to which you assigned it. Determine which characters have NA for height. If there are any characters with NA for height (hint: there are), enter plausible values for their heights using the approach taken in Question 2. Then report the mean and median height across everyone in this dataframe.
# Both approaches tell you the rows with NA in height. Use the approach you like.

which(is.na(stardf$height))
[1] 28 82 83 84 85 86
stardf$height %>%
  is.na() %>%
  which()
[1] 28 82 83 84 85 86
# Determine which characters have NA for height.

stardf %>%
  slice(28,
        82:86) %>%
  print()
# A tibble: 6 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Arvel Cr…     NA    NA brown      fair       brown             NA male  mascu…
2 Finn          NA    NA black      dark       dark              NA male  mascu…
3 Rey           NA    NA brown      light      hazel             NA fema… femin…
4 Poe Dame…     NA    NA brown      light      brown             NA male  mascu…
5 BB8           NA    NA none       none       black             NA none  mascu…
6 Captain …     NA    NA unknown    unknown    unknown           NA <NA>  <NA>  
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
# I already have heights for the first four from Question 2. BB8's height is 67.1 cm, and Captain Phasma's height is 200.1 cm.

stardf$height[28] <- 180

stardf$height[82:86] <- c(176.8, 170.7, 172, 67.1, 200.1)

# Now calculate mean and median height.

summary(stardf$height)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   66.0   167.0   180.0   173.4   191.0   264.0 
## The mean height is 173.4 cm and the median height is 180 cm.
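The which() and slice() steps used above to find the characters with missing heights can also be combined into a single pipeline. A minimal sketch on made-up data (the toy tibble is hypothetical), using filter() and pull() from dplyr:

```r
library(dplyr)

# Hypothetical toy data; names and heights are for illustration only.
toy <- tibble(
  name   = c("A", "B", "C", "D"),
  height = c(165, NA, 180, NA)
)

# Keep only the rows where height is missing, then extract the names.
missing_names <- toy %>%
  filter(is.na(height)) %>%
  pull(name)

print(missing_names)
```

This avoids typing row numbers by hand, which is one less place for a transcription error.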
  5. Still working with the starwars dataframe, convert the species variable to a factor. Then group by species, summarise the mean height, and print the result in descending order. Report which species is the tallest, on average. Then rearrange the result and report which species is the shortest, on average.
# Convert species to factor

stardf$species <- as.factor(stardf$species)

# Which is the tallest species, on average?

stardf %>%
  group_by(species) %>%
  summarise(mean_height = mean(height)) %>%
  arrange(desc(mean_height))
# A tibble: 38 × 2
   species  mean_height
   <fct>          <dbl>
 1 Quermian        264 
 2 Wookiee         231 
 3 Kaminoan        221 
 4 Kaleesh         216 
 5 Gungan          209.
 6 Pau'an          206 
 7 Besalisk        198 
 8 Cerean          198 
 9 Chagrian        196 
10 Nautolan        196 
# ℹ 28 more rows
## The Quermian species is the tallest on average.

# Which is the shortest species on average?

stardf %>%
  group_by(species) %>%
  summarise(mean_height = mean(height)) %>%
  arrange(mean_height)
# A tibble: 38 × 2
   species        mean_height
   <fct>                <dbl>
 1 Yoda's species         66 
 2 Aleena                 79 
 3 Ewok                   88 
 4 Vulptereen             94 
 5 Dug                   112 
 6 Droid                 121.
 7 Xexto                 122 
 8 Toydarian             137 
 9 Sullustan             160 
10 Toong                 163 
# ℹ 28 more rows
## Yoda's species is the shortest on average.
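mean(height) works in the answers above only because every NA in height was filled in during Question 4; if any NA remained, the mean for that species would itself be NA. It can also help to report how many characters each average is based on, since an average over a single character is just that character's height. A sketch on made-up data (the toy tibble is hypothetical) using na.rm = TRUE and n():

```r
library(dplyr)

# Hypothetical toy data with one missing height.
toy <- tibble(
  species = c("X", "X", "Y", "Z", "Z"),
  height  = c(100, 120, 200, NA, 90)
)

by_species <- toy %>%
  group_by(species) %>%
  summarise(
    mean_height = mean(height, na.rm = TRUE),  # ignore missing heights
    n = n()                                    # rows per species, NAs included
  ) %>%
  arrange(desc(mean_height))

print(by_species)
```

Swapping arrange(desc(mean_height)) for arrange(mean_height) gives the shortest-first ordering, as in the answer above.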