9 Data Exploration with `dplyr`

The dplyr package is a great tool for organizing and manipulating dataframes. Note that I use the term manipulate not to convey anything nefarious or unethical, but in the sense of data management. The package's functions are named after verbs that describe what they do, making them relatively easy to remember. Some common dplyr functions are listed in Table 9.1.

Table 9.1: Common dplyr functions and their descriptions.
`dplyr` Function	Description
`arrange()`	Change the order of rows.
`filter()`	Select rows based on column values.
`mutate()`	Change the values in certain columns, and create new columns.
`relocate()`	Change the order of columns.
`rename()`	Change the name of columns.
`select()`	Include or exclude a column.
`slice()`	Select rows based on location.
`summarise()`	Collapse a group into a single row.
`group_by()`	Select a grouping variable to perform an operation by group.

Let’s look at some examples with the built-in mtcars dataframe.

9.1 filter() & arrange()

library(dplyr)

# Load mtcars dataframe. Assign it to an object.
df <- mtcars

# Let's look only at cars with eight cylinders.
df %>%
  filter(cyl == 8)

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
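filter() also accepts more than one condition. Here is a minimal sketch, where the mpg cut-off of 15 is just an illustrative value I picked: conditions separated by a comma must all be true, while the | operator means that either one can be true.

```r
library(dplyr)

df <- mtcars

# Conditions separated by a comma are combined with AND:
# eight-cylinder cars that also get more than 15 mpg.
efficient_v8 <- df %>%
  filter(cyl == 8, mpg > 15)

# The | operator means OR: cars with either four or six cylinders.
small_engines <- df %>%
  filter(cyl == 4 | cyl == 6)

nrow(efficient_v8)   # 8
nrow(small_engines)  # 18
```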

# Let's build on that and order the dataframe by mpg in descending order (highest to lowest). To do this, we embed the desc() function within arrange(). arrange() sorts in ascending order by default, so desc() is not needed for ascending order.

df %>%
  filter(cyl == 8) %>%
  arrange(desc(mpg))

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
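arrange() can also sort by more than one column, with later columns breaking ties in earlier ones. As a rough sketch (the choice of cyl and mpg as sorting columns is only for illustration):

```r
library(dplyr)

df <- mtcars

# Sort by cyl in ascending order (the default), then by mpg
# from highest to lowest within each number of cylinders.
sorted <- df %>%
  arrange(cyl, desc(mpg))

head(sorted, 3)
```

The first rows are the four-cylinder cars, starting with the one that has the highest mpg.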

9.2 mutate() & rename()

# Now let's add a new column called kpl (kilometers per litre) using mutate(). A quick Google search tells me that going from mpg to kpl involves dividing mpg by 2.352.  

df %>%
  filter(cyl == 8) %>%
  arrange(desc(mpg)) %>%
  mutate(kpl = mpg / 2.352) %>%
  head()

                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb      kpl
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2 8.163265
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 7.950680
Merc 450SL        17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3 7.355442
Merc 450SE        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3 6.972789
Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4 6.717687
Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2 6.590136
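If you are curious where 2.352 comes from, it can be rebuilt from the unit definitions. This is only a sketch of the arithmetic; it assumes US gallons, which is how the mtcars help page describes mpg, and the object names are my own.

```r
# Unit definitions.
km_per_mile       <- 1.609344
litres_per_gallon <- 3.785411784  # US gallon

# mpg * (km per mile) / (litres per gallon) gives km per litre,
# which is the same as dividing mpg by this single number.
mpg_to_kpl_divisor <- litres_per_gallon / km_per_mile
mpg_to_kpl_divisor
# roughly 2.352

# Check with the Pontiac Firebird (19.2 mpg).
firebird_kpl <- 19.2 / mpg_to_kpl_divisor
firebird_kpl
```

The result for the Firebird differs from the table above only in later decimal places, because the table uses the rounded value 2.352.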

# Let's rename the wt variable to weight. In rename(), the new name goes on the left and the old name on the right.
df %>%
  filter(cyl == 8) %>%
  arrange(desc(mpg)) %>%
  mutate(kpl = mpg / 2.352) %>%
  rename(weight = wt) %>%
  head()

                   mpg cyl  disp  hp drat weight  qsec vs am gear carb      kpl
Pontiac Firebird  19.2   8 400.0 175 3.08  3.845 17.05  0  0    3    2 8.163265
Hornet Sportabout 18.7   8 360.0 175 3.15  3.440 17.02  0  0    3    2 7.950680
Merc 450SL        17.3   8 275.8 180 3.07  3.730 17.60  0  0    3    3 7.355442
Merc 450SE        16.4   8 275.8 180 3.07  4.070 17.40  0  0    3    3 6.972789
Ford Pantera L    15.8   8 351.0 264 4.22  3.170 14.50  0  1    5    4 6.717687
Dodge Challenger  15.5   8 318.0 150 2.76  3.520 16.87  0  0    3    2 6.590136
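Several columns can be renamed in one call by separating the pairs with commas. A small sketch, where horsepower is simply a name I made up for illustration:

```r
library(dplyr)

df <- mtcars

# Each pair follows the pattern new_name = old_name.
renamed <- df %>%
  rename(weight = wt,
         horsepower = hp)

names(renamed)
```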

9.3 select() & slice()

# Let's now only select a few variables using select()

df %>%
  select(mpg, cyl, wt) %>%
  head()

                   mpg cyl    wt
Mazda RX4         21.0   6 2.620
Mazda RX4 Wag     21.0   6 2.875
Datsun 710        22.8   4 2.320
Hornet 4 Drive    21.4   6 3.215
Hornet Sportabout 18.7   8 3.440
Valiant           18.1   6 3.460
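Typing every column name can get tedious in wider dataframes. As a hedged sketch of two shortcuts that select() understands, a colon picks a run of neighbouring columns and starts_with() picks columns by the beginning of their name:

```r
library(dplyr)

df <- mtcars

# All columns from mpg through hp, in their existing order.
first_four <- df %>%
  select(mpg:hp)

# All columns whose name begins with "d".
d_columns <- df %>%
  select(starts_with("d"))

names(first_four)  # "mpg" "cyl" "disp" "hp"
names(d_columns)   # "disp" "drat"
```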

## If we want to select all but a few variables, we can still use select. Let's say I want all variables except mpg, cyl, and wt. I just need to add a minus before each variable name.

df %>%
  select(-mpg, -cyl, -wt)

                     disp  hp drat  qsec vs am gear carb
Mazda RX4           160.0 110 3.90 16.46  0  1    4    4
Mazda RX4 Wag       160.0 110 3.90 17.02  0  1    4    4
Datsun 710          108.0  93 3.85 18.61  1  1    4    1
Hornet 4 Drive      258.0 110 3.08 19.44  1  0    3    1
Hornet Sportabout   360.0 175 3.15 17.02  0  0    3    2
Valiant             225.0 105 2.76 20.22  1  0    3    1
Duster 360          360.0 245 3.21 15.84  0  0    3    4
Merc 240D           146.7  62 3.69 20.00  1  0    4    2
Merc 230            140.8  95 3.92 22.90  1  0    4    2
Merc 280            167.6 123 3.92 18.30  1  0    4    4
Merc 280C           167.6 123 3.92 18.90  1  0    4    4
Merc 450SE          275.8 180 3.07 17.40  0  0    3    3
Merc 450SL          275.8 180 3.07 17.60  0  0    3    3
Merc 450SLC         275.8 180 3.07 18.00  0  0    3    3
Cadillac Fleetwood  472.0 205 2.93 17.98  0  0    3    4
Lincoln Continental 460.0 215 3.00 17.82  0  0    3    4
Chrysler Imperial   440.0 230 3.23 17.42  0  0    3    4
Fiat 128             78.7  66 4.08 19.47  1  1    4    1
Honda Civic          75.7  52 4.93 18.52  1  1    4    2
Toyota Corolla       71.1  65 4.22 19.90  1  1    4    1
Toyota Corona       120.1  97 3.70 20.01  1  0    3    1
Dodge Challenger    318.0 150 2.76 16.87  0  0    3    2
AMC Javelin         304.0 150 3.15 17.30  0  0    3    2
Camaro Z28          350.0 245 3.73 15.41  0  0    3    4
Pontiac Firebird    400.0 175 3.08 17.05  0  0    3    2
Fiat X1-9            79.0  66 4.08 18.90  1  1    4    1
Porsche 914-2       120.3  91 4.43 16.70  0  1    5    2
Lotus Europa         95.1 113 3.77 16.90  1  1    5    2
Ford Pantera L      351.0 264 4.22 14.50  0  1    5    4
Ferrari Dino        145.0 175 3.62 15.50  0  1    5    6
Maserati Bora       301.0 335 3.54 14.60  0  1    5    8
Volvo 142E          121.0 109 4.11 18.60  1  1    4    2
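If typing a minus before every name feels repetitive, the names can be wrapped in c() and negated once. A quick sketch showing that both spellings drop the same columns:

```r
library(dplyr)

df <- mtcars

# One minus per column.
dropped_each <- df %>%
  select(-mpg, -cyl, -wt)

# A single minus applied to a group of columns.
dropped_group <- df %>%
  select(-c(mpg, cyl, wt))

identical(names(dropped_each), names(dropped_group))  # TRUE
```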

# If we want to select certain rows of a dataframe, we can do this with slice() by giving the index numbers of the rows. If we want to know the row numbers that match a condition on a column (e.g. mpg > 20), we can use the which() function, in which we write the column, a relational operator, and a value.

which(df$mpg > 20)

 [1]  1  2  3  4  8  9 18 19 20 21 26 27 28 32

df %>%
  slice(1:4,
        8,
        9,
        18:21,
        26:28,
        32) %>%
  print()

                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
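Typing the row numbers by hand works, but it is easy to miss one. As a sketch of two shortcuts, the result of which() can be passed straight to slice(), and for a condition like this one filter() should return the same rows:

```r
library(dplyr)

df <- mtcars

# Pass the row numbers from which() directly to slice().
by_slice <- df %>%
  slice(which(df$mpg > 20))

# Let filter() evaluate the condition itself.
by_filter <- df %>%
  filter(mpg > 20)

nrow(by_slice)   # 14
nrow(by_filter)  # 14
```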

# Similarly, if we want to know which rows have the value NA (which indicates missing data) in a particular column, we can wrap the is.na() function within the which() function like this.

## Let's add some NAs to the disp variable first.

df[c(5,8,21:22), 3] <- NA

## Now we will see which rows in disp have NA. 

which(is.na(df$disp))

[1]  5  8 21 22
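Beyond finding where the NAs are, we often want to count them or work around them. The sketch below repeats the same NA setup so that it runs on its own, but refers to the column by name rather than by position, which is a choice of mine and not a requirement:

```r
library(dplyr)

df <- mtcars
df[c(5, 8, 21:22), "disp"] <- NA  # same rows as above, column given by name

# is.na() returns TRUE/FALSE, and TRUE counts as 1 when summed.
n_missing <- sum(is.na(df$disp))
n_missing  # 4

# Show the rows that are missing disp.
missing_rows <- df %>%
  filter(is.na(disp))

# Most summary functions return NA when any value is missing,
# unless we ask them to remove NAs first.
mean(df$disp)                  # NA
mean_disp <- mean(df$disp, na.rm = TRUE)
```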

## Often we want to know the highest and lowest values of a variable. We can use slice_min() for the lowest values and slice_max() for the highest values. The first argument is the column and the second is how many rows you want (e.g. n = 5).

## The five cars with the lowest mpg.

df %>%
  slice_min(mpg, n = 5)

                     mpg cyl disp  hp drat    wt  qsec vs am gear carb
Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4

## The five cars with the highest mpg.

df %>%
  slice_max(mpg, n = 5)

                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2
Fiat X1-9      27.3   4 79.0  66 4.08 1.935 18.90  1  1    4    1
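One thing to watch for is ties. By default, slice_min() and slice_max() keep every row that ties for the last place, so you can get more rows than you asked for. A minimal sketch using n = 3, where two cars share the third-highest mpg of 30.4:

```r
library(dplyr)

df <- mtcars

# Default behaviour (with_ties = TRUE): the tie for third place is kept.
top_with_ties <- df %>%
  slice_max(mpg, n = 3)

# with_ties = FALSE returns exactly n rows.
top_exact <- df %>%
  slice_max(mpg, n = 3, with_ties = FALSE)

nrow(top_with_ties)  # 4
nrow(top_exact)      # 3
```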

9.4 relocate() & summarise()

# If I want to change the order in which columns appear in the dataframe, I can do so with relocate(). The first argument is the column(s) to move, and the .before or .after argument specifies where to place them.

## Let's move wt to the front.

df %>%
  relocate(wt, .before = everything())

                       wt  mpg cyl  disp  hp drat  qsec vs am gear carb
Mazda RX4           2.620 21.0   6 160.0 110 3.90 16.46  0  1    4    4
Mazda RX4 Wag       2.875 21.0   6 160.0 110 3.90 17.02  0  1    4    4
Datsun 710          2.320 22.8   4 108.0  93 3.85 18.61  1  1    4    1
Hornet 4 Drive      3.215 21.4   6 258.0 110 3.08 19.44  1  0    3    1
Hornet Sportabout   3.440 18.7   8    NA 175 3.15 17.02  0  0    3    2
Valiant             3.460 18.1   6 225.0 105 2.76 20.22  1  0    3    1
Duster 360          3.570 14.3   8 360.0 245 3.21 15.84  0  0    3    4
Merc 240D           3.190 24.4   4    NA  62 3.69 20.00  1  0    4    2
Merc 230            3.150 22.8   4 140.8  95 3.92 22.90  1  0    4    2
Merc 280            3.440 19.2   6 167.6 123 3.92 18.30  1  0    4    4
Merc 280C           3.440 17.8   6 167.6 123 3.92 18.90  1  0    4    4
Merc 450SE          4.070 16.4   8 275.8 180 3.07 17.40  0  0    3    3
Merc 450SL          3.730 17.3   8 275.8 180 3.07 17.60  0  0    3    3
Merc 450SLC         3.780 15.2   8 275.8 180 3.07 18.00  0  0    3    3
Cadillac Fleetwood  5.250 10.4   8 472.0 205 2.93 17.98  0  0    3    4
Lincoln Continental 5.424 10.4   8 460.0 215 3.00 17.82  0  0    3    4
Chrysler Imperial   5.345 14.7   8 440.0 230 3.23 17.42  0  0    3    4
Fiat 128            2.200 32.4   4  78.7  66 4.08 19.47  1  1    4    1
Honda Civic         1.615 30.4   4  75.7  52 4.93 18.52  1  1    4    2
Toyota Corolla      1.835 33.9   4  71.1  65 4.22 19.90  1  1    4    1
Toyota Corona       2.465 21.5   4    NA  97 3.70 20.01  1  0    3    1
Dodge Challenger    3.520 15.5   8    NA 150 2.76 16.87  0  0    3    2
AMC Javelin         3.435 15.2   8 304.0 150 3.15 17.30  0  0    3    2
Camaro Z28          3.840 13.3   8 350.0 245 3.73 15.41  0  0    3    4
Pontiac Firebird    3.845 19.2   8 400.0 175 3.08 17.05  0  0    3    2
Fiat X1-9           1.935 27.3   4  79.0  66 4.08 18.90  1  1    4    1
Porsche 914-2       2.140 26.0   4 120.3  91 4.43 16.70  0  1    5    2
Lotus Europa        1.513 30.4   4  95.1 113 3.77 16.90  1  1    5    2
Ford Pantera L      3.170 15.8   8 351.0 264 4.22 14.50  0  1    5    4
Ferrari Dino        2.770 19.7   6 145.0 175 3.62 15.50  0  1    5    6
Maserati Bora       3.570 15.0   8 301.0 335 3.54 14.60  0  1    5    8
Volvo 142E          2.780 21.4   4 121.0 109 4.11 18.60  1  1    4    2

## Let's move wt to before disp.

df %>%
  relocate(wt, .before = disp)

                     mpg cyl    wt  disp  hp drat  qsec vs am gear carb
Mazda RX4           21.0   6 2.620 160.0 110 3.90 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 2.875 160.0 110 3.90 17.02  0  1    4    4
Datsun 710          22.8   4 2.320 108.0  93 3.85 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 3.215 258.0 110 3.08 19.44  1  0    3    1
Hornet Sportabout   18.7   8 3.440    NA 175 3.15 17.02  0  0    3    2
Valiant             18.1   6 3.460 225.0 105 2.76 20.22  1  0    3    1
Duster 360          14.3   8 3.570 360.0 245 3.21 15.84  0  0    3    4
Merc 240D           24.4   4 3.190    NA  62 3.69 20.00  1  0    4    2
Merc 230            22.8   4 3.150 140.8  95 3.92 22.90  1  0    4    2
Merc 280            19.2   6 3.440 167.6 123 3.92 18.30  1  0    4    4
Merc 280C           17.8   6 3.440 167.6 123 3.92 18.90  1  0    4    4
Merc 450SE          16.4   8 4.070 275.8 180 3.07 17.40  0  0    3    3
Merc 450SL          17.3   8 3.730 275.8 180 3.07 17.60  0  0    3    3
Merc 450SLC         15.2   8 3.780 275.8 180 3.07 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 5.250 472.0 205 2.93 17.98  0  0    3    4
Lincoln Continental 10.4   8 5.424 460.0 215 3.00 17.82  0  0    3    4
Chrysler Imperial   14.7   8 5.345 440.0 230 3.23 17.42  0  0    3    4
Fiat 128            32.4   4 2.200  78.7  66 4.08 19.47  1  1    4    1
Honda Civic         30.4   4 1.615  75.7  52 4.93 18.52  1  1    4    2
Toyota Corolla      33.9   4 1.835  71.1  65 4.22 19.90  1  1    4    1
Toyota Corona       21.5   4 2.465    NA  97 3.70 20.01  1  0    3    1
Dodge Challenger    15.5   8 3.520    NA 150 2.76 16.87  0  0    3    2
AMC Javelin         15.2   8 3.435 304.0 150 3.15 17.30  0  0    3    2
Camaro Z28          13.3   8 3.840 350.0 245 3.73 15.41  0  0    3    4
Pontiac Firebird    19.2   8 3.845 400.0 175 3.08 17.05  0  0    3    2
Fiat X1-9           27.3   4 1.935  79.0  66 4.08 18.90  1  1    4    1
Porsche 914-2       26.0   4 2.140 120.3  91 4.43 16.70  0  1    5    2
Lotus Europa        30.4   4 1.513  95.1 113 3.77 16.90  1  1    5    2
Ford Pantera L      15.8   8 3.170 351.0 264 4.22 14.50  0  1    5    4
Ferrari Dino        19.7   6 2.770 145.0 175 3.62 15.50  0  1    5    6
Maserati Bora       15.0   8 3.570 301.0 335 3.54 14.60  0  1    5    8
Volvo 142E          21.4   4 2.780 121.0 109 4.11 18.60  1  1    4    2

## Let's move wt to after qsec

df %>%
  relocate(wt, .after = qsec)

                     mpg cyl  disp  hp drat  qsec    wt vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 16.46 2.620  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 17.02 2.875  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 18.61 2.320  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 19.44 3.215  1  0    3    1
Hornet Sportabout   18.7   8    NA 175 3.15 17.02 3.440  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 20.22 3.460  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 15.84 3.570  0  0    3    4
Merc 240D           24.4   4    NA  62 3.69 20.00 3.190  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 22.90 3.150  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 18.30 3.440  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 18.90 3.440  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 17.40 4.070  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 17.60 3.730  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 18.00 3.780  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 17.98 5.250  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 17.82 5.424  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 17.42 5.345  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 19.47 2.200  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 18.52 1.615  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 19.90 1.835  1  1    4    1
Toyota Corona       21.5   4    NA  97 3.70 20.01 2.465  1  0    3    1
Dodge Challenger    15.5   8    NA 150 2.76 16.87 3.520  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 17.30 3.435  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 15.41 3.840  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 17.05 3.845  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 18.90 1.935  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 16.70 2.140  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 16.90 1.513  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 14.50 3.170  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 15.50 2.770  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 14.60 3.570  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 18.60 2.780  1  1    4    2
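Two further variations may be handy. As a sketch, calling relocate() without .before or .after moves the column to the front, and several columns can be moved together in one call:

```r
library(dplyr)

df <- mtcars

# With no .before or .after, the column goes to the front.
front <- df %>%
  relocate(wt)

# Move wt and hp together so that they sit right after cyl.
moved <- df %>%
  relocate(wt, hp, .after = cyl)

names(front)[1]    # "wt"
names(moved)[1:4]  # "mpg" "cyl" "wt" "hp"
```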

# Let's use the group_by() and summarise() functions to get the mean weight (wt) of the cars for each number of cylinders (cyl), rounded to two decimal places.

df %>%
  group_by(cyl) %>%
  summarise(mean_weight = mean(wt)) %>%
  round(digits = 2)

# A tibble: 3 × 2
    cyl mean_weight
  <dbl>       <dbl>
1     4        2.29
2     6        3.12
3     8        4
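summarise() can compute several statistics at once. In the sketch below the column names n_cars, mean_weight, and median_weight are my own, and I round inside summarise() rather than rounding the whole result, since rounding an entire dataframe only works when every column is numeric:

```r
library(dplyr)

weight_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n_cars = n(),                      # number of cars in each group
    mean_weight = round(mean(wt), 2),  # round only this column
    median_weight = median(wt)
  )

weight_summary
```

The n_cars column shows 11, 7, and 14 cars for four, six, and eight cylinders, which is useful context when comparing the group means.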

Thus, we can see that the dplyr package offers several useful functions to manage our data. Remember that if you want any changes, such as renaming a variable, to be reflected in your dataframe, you need to assign the result of your dplyr code back to the dataframe. For example, if I want the name change from wt to weight to stick, I would assign it to the dataframe like this:

df <- df %>%
  rename(weight = wt)
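To see the difference for yourself, here is a small sketch that checks the column names before and after assigning. The objects before_assignment and after_assignment exist only for this illustration.

```r
library(dplyr)

df <- mtcars

# Without assignment, the renamed dataframe is printed and then discarded.
df %>%
  rename(weight = wt) %>%
  head(3)

before_assignment <- "wt" %in% names(df)      # TRUE: df still has wt

# With assignment, the change is stored in df.
df <- df %>%
  rename(weight = wt)

after_assignment <- "weight" %in% names(df)   # TRUE: df now has weight
```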

9.5 Exercises

As always, it’s a good idea to attempt these while the material is still fresh. You can find the answers in Appendix F.

1. Load the tidyverse package. Then assign the built-in dataframe starwars to an object named whatever you want. Then subset the dataframe to the human species only. Save the subsetted dataframe as an object called swhuman. Then calculate and report the mean and median height in swhuman. Also report any NAs (missing data) in the height variable. Note: the units for the height variable are centimeters.
2. Hopefully you noticed that there are indeed some NAs in the swhuman dataframe! Detect which rows have NAs for the height variable, and write down the names of the characters whose height is missing. Next, let’s fill in these gaps. Perform an internet search and populate those NAs with plausible values. If you need to convert from feet to centimeters, multiply the value in feet by 30.48. If you absolutely cannot find the height of a character, substitute the median height from Question 1 for their height.
3. Once you have filled in this missing data, calculate the new mean and median for the height variable. Comment on how much of a difference the additional values made to the mean and median compared with the values you calculated in Question 1. Then determine the three shortest characters and the three tallest characters.
4. Return to the larger starwars dataframe, or whatever object you assigned it to. Determine which characters have NA for height. If there are any characters with NA for height (hint: there are), enter plausible values for their heights using the approach taken in Question 2. Then report the mean and median height across everyone in this dataframe.
5. Still working with the starwars dataframe, convert the species variable to a factor. Then group by species, summarise the mean height, and print the result in descending order. Report which species is the tallest, on average. Then rearrange and report which species is the shortest, on average.