Testing mtcars (simulation)

This code can be used to simulate the sampling distribution under the assumption of the null hypothesis. The takeaway here is that we need to shift our original data by the difference in our sample mean to our null value, so that when we resample with replacement, we get a sampling distribution centered at the null value. See slides for written out steps!

Packages

Warning: package 'ggplot2' was built under R version 4.3.3
Warning: package 'tidyr' was built under R version 4.3.3
Warning: package 'readr' was built under R version 4.3.3
Warning: package 'dplyr' was built under R version 4.3.3
Warning: package 'stringr' was built under R version 4.3.3
Warning: package 'lubridate' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4          ✔ readr     2.1.5     
✔ forcats   1.0.0          ✔ stringr   1.5.1     
✔ ggplot2   3.5.1          ✔ tibble    3.2.1     
✔ lubridate 1.9.3          ✔ tidyr     1.3.1     
✔ purrr     1.0.2.9000     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Warning: package 'tidymodels' was built under R version 4.3.3
── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──
✔ broom        1.0.5      ✔ rsample      1.2.1 
✔ dials        1.2.1      ✔ tune         1.2.0 
✔ infer        1.0.7      ✔ workflows    1.1.4 
✔ modeldata    1.3.0      ✔ workflowsets 1.1.0 
✔ parsnip      1.2.1      ✔ yardstick    1.3.1 
✔ recipes      1.0.10     
Warning: package 'dials' was built under R version 4.3.3
Warning: package 'scales' was built under R version 4.3.3
Warning: package 'infer' was built under R version 4.3.3
Warning: package 'modeldata' was built under R version 4.3.3
Warning: package 'parsnip' was built under R version 4.3.3
Warning: package 'recipes' was built under R version 4.3.3
Warning: package 'rsample' was built under R version 4.3.3
Warning: package 'tune' was built under R version 4.3.3
Warning: package 'workflows' was built under R version 4.3.3
Warning: package 'workflowsets' was built under R version 4.3.3
Warning: package 'yardstick' was built under R version 4.3.3
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks stats::filter()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step()   masks stats::step()
• Use suppressPackageStartupMessages() to eliminate package startup messages

The Data

glimpse(mtcars)
Rows: 32
Columns: 11
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

Null and Alternative

From class context:

\(\H_o: \mu = 12\)

\(\H_a: \mu > 12\)

Sample Mean

xbar <- mtcars |>
  summarize(stat = mean(mpg))

xbar
      stat
1 20.09062

The Difference

xbar - 12
      stat
1 8.090625

The difference is 8.09

Shift MPG by diff

mtcars <- mtcars |>
  mutate(mpg_shift = mpg - 8.09)

The Process

set.seed(12345)

null_df <- mtcars |>
  specify(response = mpg_shift) |>
  generate(reps = 10000, type = "bootstrap") |>
  calculate(stat = "mean")

The Graph

null_df |>
  ggplot(
    aes(x = stat)
  ) + 
  geom_density(fill = "gray") +
  labs(title = "Bootstrap distribution",
       y = "")

The Calculation

null_df |>
  get_p_value(xbar, "right")
Warning: Please be cautious in reporting a p-value of 0. This result is an approximation
based on the number of `reps` chosen in the `generate()` step.
ℹ See `get_p_value()` (`?infer::get_p_value()`) for more information.
# A tibble: 1 × 1
  p_value
    <dbl>
1       0

Our p-value is < 0.001