– HW 6 has been released (due Tue 26th at 11:59; our last HW!)
– Quiz 11 (released Wednesday; due Sunday Nov 24; our last Quiz!)
– Don’t forget about the statistics experience (due Dec 6th)
Announcements
– You are allowed one front+back note sheet on the final exam
> It must be hand-written
> I will also provide a formula sheet and post it to our website about 1 week (if not sooner) before the final exam
– I suggest writing things such as:
> General interpretations (slope coefficient; p-value; etc)
> "Conversational" questions to help discover answers (what is my explanatory variable; is it categorical or quantitative; if categorical, how many levels?)
Announcements
Exam-1 in-class corrections should be in by tonight
> On Gradescope you will only see correct vs. incorrect, with no points assigned
> I didn't want those points to be assigned to HW-3, so I am adding them to your in-class exam score
At the end of last class, we predicted the probability of a spam email when the number of exclamation points is equal to 10. How does that change our model below?
We can also use logistic regression for classification! That is, we can set a threshold on the predicted probability and classify each new observation as a success (spam) or a failure (not spam).
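To make the thresholding idea concrete, here is a minimal sketch in Python. The intercept `B0`, the slope `B1`, and the 0.5 default threshold are hypothetical placeholders chosen for illustration; they are not the estimates from the model fitted in class.

```python
import math

# Hypothetical coefficients (NOT the class model's estimates):
# log-odds of spam = B0 + B1 * (number of exclamation points)
B0 = -2.0
B1 = 0.1

def predicted_probability(exclaim_count):
    """Convert the model's log-odds into a probability between 0 and 1."""
    log_odds = B0 + B1 * exclaim_count
    return 1 / (1 + math.exp(-log_odds))

def classify(exclaim_count, threshold=0.5):
    """Label an email spam when its predicted probability meets the threshold."""
    if predicted_probability(exclaim_count) >= threshold:
        return "spam"
    return "not spam"

p10 = predicted_probability(10)
print(f"P(spam | 10 exclamation points) = {p10:.3f}")
print(classify(10), "|", classify(10, threshold=0.25))
```

With these made-up coefficients, the same email is labeled differently depending on the threshold, which is why the choice of threshold matters.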
Classification
Suppose you are a data scientist working on a spam filter. You must decide how high the predicted probability needs to be before it is reasonable to call an email spam and put it in the junk folder (which the user is unlikely to check).
What are some tradeoffs you would consider as you set the decision-making threshold? Discuss with your neighbor.
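One way to explore the tradeoff is to count both kinds of mistakes at several thresholds. The sketch below uses a small, made-up list of (predicted probability, actually spam?) pairs; the data and the helper name `error_counts` are assumptions for illustration only.

```python
# Hypothetical (predicted probability, actually spam?) pairs, made up for illustration.
emails = [
    (0.95, True), (0.80, True), (0.65, False), (0.55, True),
    (0.40, False), (0.30, True), (0.15, False), (0.05, False),
]

def error_counts(threshold):
    """Count both kinds of mistakes made when classifying at a given threshold."""
    # False positive: a real email sent to the junk folder.
    false_positives = sum(1 for p, is_spam in emails if p >= threshold and not is_spam)
    # False negative: a spam email left in the inbox.
    false_negatives = sum(1 for p, is_spam in emails if p < threshold and is_spam)
    return false_positives, false_negatives

for threshold in (0.2, 0.5, 0.9):
    fp, fn = error_counts(threshold)
    print(f"threshold {threshold}: {fp} real emails in junk, {fn} spam emails in inbox")
```

On this toy data, raising the threshold sends fewer real emails to junk but lets more spam through, and lowering it does the opposite.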
Takeaways
– We use logistic regression with a categorical (binary) response variable
– The logit link function keeps predicted probabilities on the appropriate scale [0,1]
– We can use logistic regression to estimate probabilities, calculate odds ratios, and set up a classification model!
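As a reminder of how these pieces connect, here is the logit link written out for a model with one explanatory variable. The symbols p (probability of success), x, β0, and β1 are notation assumed here for illustration.

```latex
\[
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x
\quad\Longleftrightarrow\quad
p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
\]
\[
\text{odds ratio for a one-unit increase in } x:\quad
\frac{\text{odds}(x+1)}{\text{odds}(x)} = e^{\beta_1}
\]
```

Because the exponential is always positive, the right-hand expression for p always lands between 0 and 1, no matter what value the linear part takes.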
Regression discussion
Regression
– Simple linear regression (SLR)
– Multiple linear regression (MLR)
– Logistic regression
When does it make sense to use linear regression?
We always check for independence (in both logistic and linear regression).
For linear regression, we can ask ourselves, is there evidence of a linear relationship between x and y?
Linear Model
Are we justified to fit a linear relationship?
Extension: (will not be tested on)
We can make residuals vs. fitted plots in linear regression and look for a trend. If we see random scatter, there is no evidence that linearity is violated.
If we see a trend, that is evidence that linearity is violated.
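The residuals behind such a plot can be computed by hand. Below is a minimal sketch using made-up data that follows a curve (y = x squared) rather than a line; the data and the helper name `fit_line` are assumptions for illustration.

```python
# Hypothetical data with a curved (quadratic) pattern, made up for illustration.
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]

def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Print the pairs that a residuals vs. fitted plot would display.
for fi, ri in zip(fitted, residuals):
    print(f"fitted {fi:7.2f}   residual {ri:6.2f}")
```

Here the residuals are positive at both ends and negative in the middle. That U shape is the kind of trend that suggests a straight line is not appropriate.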
Outliers
We also need to check for outliers. Outliers can influence the coefficients of regression models (both linear and logistic).
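To see how much a single outlier can move a coefficient, here is a minimal sketch for the linear case only. The data and the helper name `ls_slope` are made up for illustration.

```python
def ls_slope(x, y):
    """Least-squares slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data lying exactly on the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
slope_clean = ls_slope(x, y)

# Add one made-up outlier far above the line.
slope_with_outlier = ls_slope(x + [6], y + [40])

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with outlier:    {slope_with_outlier:.2f}")
```

In this toy example a single extra point triples the estimated slope, so it is worth checking for outliers before interpreting coefficients.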