Connecting Packaged R Code with Concepts + Intro to regression

Lecture: Not Sure

Dr. Elijah Meyer

NC State University
ST 511 - Fall 2024

2024-10-30

Download today’s AE oct-30 from Moodle. We are going to learn a new distribution!

…not really, but you still need to download the AE for today

Checklist

– Keep up with Slack

– Upload today’s AE. We will be using this to start class

– Homework 3 (including exam corrections) due Sunday (11:59pm)

– Quiz Wednesday (due Sunday)

– Statistics experience (released; due end of semester on Gradescope)

– Optional assignment (released; due November 1st on Gradescope)

Learning objectives

– Review some “pre-packaged” R functions that are commonly used to analyze data

– Understand what the output means / describe the output

– What is regression?

– Understand how to summarize two quantitative variables

The AE

Before we get into the AE, I want to be clear that this AE is designed for us to explore common data analysis functions in R and connect them to what we’ve learned in class.

A rigorous analysis would include:

– Exploratory data analysis

– Checking assumptions

– Writing decisions + conclusions

– etc.

I want you all to be familiar with these functions in R in case you come across or need them in your future work.

AE 10-30

t-test

syntax for t.test

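t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, …)

Formula version for comparing two groups: t.test(y ~ x, data = df), where df is your data frame and x has exactly two levels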

For a one-sample test of a single mean, supply only one variable (plus the null value mu)!
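A minimal sketch of both forms of the call. It assumes the built-in mtcars data set (not part of the AE): mpg is the quantitative response, and am (0 = automatic, 1 = manual) is a two-level grouping variable. The null value of 20 mpg is arbitrary and only for illustration.

```r
# One-sample test: is the mean mpg different from a (hypothetical) null value of 20?
one_sample <- t.test(mtcars$mpg, mu = 20)

# Two-sample test with the formula interface y ~ x;
# var.equal = FALSE by default, so this is the Welch test
two_sample <- t.test(mpg ~ am, data = mtcars)

one_sample
two_sample
```

The formula interface converts am to a factor internally, so a 0/1 numeric column works as long as it has exactly two distinct values.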

t-test

Assumptions

– Independence

– Normality

– Note: We can also check for constant variance. If the two groups appear to have equal variances, we can set var.equal = TRUE and use a pooled SE instead of the default (Welch) SE.
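To connect var.equal = TRUE to the pooled standard error, here is a sketch that reproduces the pooled t statistic by hand. It assumes the built-in mtcars data (automatic vs. manual mpg) purely as an example; comparing the two sample SDs is only an informal look at the constant-variance condition.

```r
# Two independent groups: mpg for automatic (am = 0) vs manual (am = 1)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Informal look at constant variance: compare the two sample SDs
sd(auto)
sd(manual)

# Default (Welch): unpooled SE = sqrt(s1^2/n1 + s2^2/n2)
welch <- t.test(auto, manual)

# var.equal = TRUE: pooled variance s_p^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)
pooled <- t.test(auto, manual, var.equal = TRUE)

# Reproduce the pooled t statistic by hand
n1 <- length(auto)
n2 <- length(manual)
sp2 <- ((n1 - 1) * var(auto) + (n2 - 1) * var(manual)) / (n1 + n2 - 2)
se_pooled <- sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled_by_hand <- (mean(auto) - mean(manual)) / se_pooled
```

With var.equal = TRUE the degrees of freedom are n1 + n2 - 2; the Welch test instead uses the Welch-Satterthwaite approximation.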

t-test

One-sided confidence interval

You do not need to know this for this course.

“In a one-sided confidence interval, we’re trying to find a single number a such that we’re 95% confident that the true mean is greater than a (or less than a if you set one.sided = "less").” In t.test, the corresponding argument is alternative = "greater" or alternative = "less".
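As an illustration, a one-sample test with alternative = "greater" returns an interval of the form (a, Inf), and the lower bound a can be reproduced by hand. The data (mtcars$mpg) and the null value 20 are assumptions chosen only to have something runnable.

```r
x <- mtcars$mpg

# alternative = "greater" gives a one-sided 95% interval (a, Inf)
res <- t.test(x, mu = 20, alternative = "greater", conf.level = 0.95)
res$conf.int

# Lower bound by hand: a = xbar - t*(0.95, df = n - 1) * s / sqrt(n)
n <- length(x)
a_by_hand <- mean(x) - qt(0.95, df = n - 1) * sd(x) / sqrt(n)
a_by_hand
```

Note that the one-sided bound uses the 0.95 quantile of t, not the 0.975 quantile used for a two-sided 95% interval.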

t-test

\[ t = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

t-test

\[ t = \frac{5.78 - 9.22 - 0}{\sqrt{\frac{2.88^2}{10} + \frac{3.48^2}{10}}} = -2.41 \]
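The same arithmetic can be written as a small R function that plugs in the summary statistics from this slide. The helper names welch_t and welch_df are hypothetical, not part of any package. The degrees-of-freedom line uses the Welch-Satterthwaite approximation, which is what t.test uses by default when var.equal = FALSE.

```r
# Hypothetical helper: two-sample t statistic from summary statistics
welch_t <- function(xbar1, xbar2, s1, s2, n1, n2, null_diff = 0) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)  # unpooled standard error
  (xbar1 - xbar2 - null_diff) / se
}

# Hypothetical helper: Welch-Satterthwaite degrees of freedom
welch_df <- function(s1, s2, n1, n2) {
  a <- s1^2 / n1
  b <- s2^2 / n2
  (a + b)^2 / (a^2 / (n1 - 1) + b^2 / (n2 - 1))
}

t_stat   <- welch_t(5.78, 9.22, 2.88, 3.48, 10, 10)
df_welch <- welch_df(2.88, 3.48, 10, 10)

round(t_stat, 2)  # -2.41, matching the slide

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
```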

Anova

Assumptions

– Independence

– Normality

– Constant Variance
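One informal way to look at the constant-variance condition is to compare the group standard deviations. This sketch assumes the built-in PlantGrowth data (weight by group). The “largest SD / smallest SD below about 2” cutoff is a common rule of thumb, not a formal test. Independence, by contrast, comes from how the data were collected and cannot be checked this way.

```r
data(PlantGrowth)

# Standard deviation of the response within each group
group_sds <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
group_sds

# Rule-of-thumb check (an assumption): ratio of largest to smallest SD
sd_ratio <- max(group_sds) / min(group_sds)
sd_ratio
```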

Anova

syntax: y ~ x, data =

# Compute the analysis of variance
# (y = quantitative response, x = categorical grouping variable, data = your data frame)
res.aov <- aov(y ~ x, data = )
# Summary of the analysis
summary(res.aov)
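A runnable version of the template above, assuming the built-in PlantGrowth data set as a stand-in: weight plays the role of y and group (three levels) plays the role of x.

```r
data(PlantGrowth)

# y = weight (quantitative), x = group (categorical, 3 levels)
res.aov <- aov(weight ~ group, data = PlantGrowth)

# ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
summary(res.aov)
```

With k = 3 groups and n = 30 observations, the table should show 2 degrees of freedom for group and 27 for the residuals.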

Anova

syntax: y ~ x, data =

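# Load the cholesterol data (response ~ trt) from the multcomp package
data("cholesterol", package = "multcomp")
# Compute the analysis of variance
res.aov <- aov(response ~ trt, data = cholesterol)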
# Summary of the analysis
summary(res.aov)
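To connect the summary(res.aov) table to the concepts from class, this sketch computes SSG, SSE, the mean squares, and the F statistic by hand and compares them with aov. It assumes the built-in InsectSprays data (count by spray) so that it runs without extra packages; the same steps apply to the cholesterol data.

```r
data(InsectSprays)
y <- InsectSprays$count
g <- InsectSprays$spray
k <- nlevels(g)  # number of groups
n <- length(y)   # total sample size

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_n     <- tapply(y, g, length)

# Between-group and within-group sums of squares
ssg <- sum(group_n * (group_means - grand_mean)^2)
sse <- sum((y - group_means[as.character(g)])^2)

# Mean squares, F statistic, and p-value
msg <- ssg / (k - 1)
mse <- sse / (n - k)
f_stat  <- msg / mse
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Compare with the table from aov()
aov_table <- summary(aov(count ~ spray, data = InsectSprays))[[1]]
aov_table
```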

Chi-sq

Assumptions

– Independence

– Expected counts of at least 5 in every cell
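The expected-count condition can be checked directly from the expected counts, E = (row total × column total) / grand total. This sketch assumes the survey data are MASS::survey (MASS ships with R) and uses base R's chisq.test only to obtain the expected counts.

```r
# Use MASS::survey without attaching MASS (avoids masking dplyr::select)
survey <- MASS::survey

# Two-way table of writing hand by clapping hand (NAs are dropped)
tab <- table(survey$W.Hnd, survey$Clap)

# chisq.test warns when some expected counts are small, so the warning is suppressed here
expected <- suppressWarnings(chisq.test(tab)$expected)
expected

# Is every expected count at least 5?
all(expected >= 5)
```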

Chi-sq

syntax (infer package): chisq_test(data, y ~ x)

Chi-sq

syntax: chisq_test(survey, W.Hnd ~ Clap), where survey is the data set from the MASS package
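For comparison, here is a base R sketch of the same test that also computes the chi-square statistic by hand as the sum of (O - E)^2 / E over all cells. It assumes survey is MASS::survey. Because this is a 2×3 table, chisq.test applies no continuity correction, so its statistic should match the Pearson statistic reported by chisq_test.

```r
survey <- MASS::survey
tab <- table(survey$W.Hnd, survey$Clap)

# Base R test (warns when some expected counts are below 5)
res <- suppressWarnings(chisq.test(tab))
res

# By hand: X^2 = sum over cells of (observed - expected)^2 / expected
observed   <- tab
expected   <- res$expected
x2_by_hand <- sum((observed - expected)^2 / expected)

# Degrees of freedom = (rows - 1) * (columns - 1)
df_chi    <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_by_hand <- pchisq(x2_by_hand, df = df_chi, lower.tail = FALSE)
```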