Formatting + Summary Statistics
Solutions
Below, we want to accomplish the following:
Make “Packages” a section header; Make “Data” a section header
Suppress the messages from the tidyverse package output
Practice with echo + eval code chunk arguments
Bold and italicize parts of the NOTE below
Packages
We put ## in front of the text we want to be a section header. The more # symbols we add, the smaller the header text becomes!
#| marks a code chunk argument. We need to make sure that there are no spaces between # and |, and that these lines sit at the very top of the code chunk.
We can turn messages off by using message: false in the chunk that loads the tidyverse.
We can use echo: false to hide code (but it still runs in the background!)
We can use eval: false to stop the code chunk from running (but it will still show in the document)
We bold text by putting two stars (**) on each side of what we want to bold! We italicize using one star (*) on each side.
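As a rough sketch of the chunk arguments described above, here is what the body of the packages chunk might look like. This is an assumption about the layout, not the exact chunk from the assignment; in the .qmd file these lines would sit inside an {r} code chunk. The echo and eval lines just restate the default values so we can practice writing them.

```r
#| message: false
#| echo: true
#| eval: true

# The three #| lines above are chunk arguments (no space between # and |).
# message: false hides the startup messages that library(tidyverse) prints.
# echo: true shows the code; eval: true runs it (both are the defaults).
library(tidyverse)
```

Switching echo to false would hide the library() call in the rendered document, and switching eval to false would show it without running it.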
Data
glimpse(mtcars)
Rows: 32
Columns: 11
$ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…
Summary Statistics
Packages
mtcars
Pull up the help file for summarise using ?summarise in the Console. Read the description and the useful functions, and then scroll down to the examples. Copy the first example into the code chunk below and run it. What is it doing? Practice reading this code as a sentence!
Note: There are two different pipes in R: |> and %>%. They have identical functionality for the scope of this course. I will be using the |> pipe, as it has some computational benefits beyond the scope of 511.
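To see that the two pipes behave the same way here, we can run the same summary with each one and compare the results. This is only a minimal sketch: the column names mean_disp and n_rows and the object names are made up for illustration, and the |> pipe assumes R 4.1 or later.

```r
library(dplyr)

# Native pipe: take mtcars, and then summarise it into a single row
native_result <- mtcars |>
  summarise(mean_disp = mean(disp), n_rows = n())

# magrittr pipe (made available by dplyr): same sentence, different pipe
magrittr_result <- mtcars %>%
  summarise(mean_disp = mean(disp), n_rows = n())

# Check whether the two results match exactly
identical(native_result, magrittr_result)
```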
Note: we should avoid giving columns the same names as common functions (such as mean or n).
Now, let’s go through the second example together!
mtcars |> # and then
  group_by(cyl) |> # group by cyl (4, 6, 8)
  summarise(mean_displacement = mean(disp), n_count = n()) # calculate summary statistics
# A tibble: 3 × 3
    cyl mean_displacement n_count
  <dbl>             <dbl>   <int>
1     4              105.      11
2     6              183.       7
3     8              353.      14
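One way to practice reading the pipeline as a sentence is to compare it with the same call written without pipes, where we have to read from the inside out. The sketch below assumes dplyr is installed; the object names piped and nested are just for illustration.

```r
library(dplyr)

# With pipes: start with mtcars, and then group by cyl,
# and then calculate summary statistics for each group.
piped <- mtcars |>
  group_by(cyl) |>
  summarise(mean_displacement = mean(disp), n_count = n())

# Without pipes: the same steps, but nested inside each other.
nested <- summarise(
  group_by(mtcars, cyl),
  mean_displacement = mean(disp),
  n_count = n()
)

# Check whether both versions give the same tibble
identical(piped, nested)
```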
We can also group by more than one variable:
mtcars |> # and then
  group_by(cyl, vs) |> # group by cyl (4, 6, 8) and vs (0, 1)
  summarise(mean_displacement = mean(disp), n_count = n()) # calculate summary statistics
`summarise()` has grouped output by 'cyl'. You can override using the `.groups`
argument.
# A tibble: 5 × 4
# Groups:   cyl [3]
    cyl    vs mean_displacement n_count
  <dbl> <dbl>             <dbl>   <int>
1     4     0              120.       1
2     4     1              104.      10
3     6     0              155        3
4     6     1              205.       4
5     8     0              353.      14
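The message above mentions the .groups argument. As a small sketch, assuming dplyr 1.0.0 or later, setting .groups = "drop" gives back an ungrouped tibble, and the message is no longer printed. The object name by_cyl_vs is made up for illustration.

```r
library(dplyr)

by_cyl_vs <- mtcars |> # and then
  group_by(cyl, vs) |> # group by cyl and vs
  summarise(
    mean_displacement = mean(disp),
    n_count = n(),
    .groups = "drop" # drop all grouping from the result
  )

by_cyl_vs
```

Without .groups, the result stays grouped by cyl, which is what the "# Groups: cyl [3]" line in the earlier output is telling us.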