Course overview

Instructor: Ethan Levien

Prerequisites: Some exposure to probability/statistics (e.g., Math 10, 20) and comfort with coding (e.g., CS 1, internships). Also, see Unit 0.

Not registered? Fill out this form.

Course objectives:

This is an introductory/intermediate statistics class focusing on regression modeling, especially linear regression. These models are the foundation of many widely used data analysis techniques, including machine learning algorithms. You will learn what is going on “under the hood” in regression models, not just how to implement them. Compared to more mathematical statistics courses, the emphasis will be on computational experiments with real and simulated data, rather than theorems and proofs.

Topics include: basic probability theory and statistical inference, single- and multiple-predictor linear regression models, model evaluation, overfitting, nonlinear models, and connections between classical statistics and machine learning. The course will also emphasize the responsible and safe use of AI tools to supplement traditional coding and mathematical calculations. Applications to the social and natural sciences will be discussed.

See weekly schedule for details.

Attendance and class rules: Come to class. Please do not use phones or computers in class except during in-class problem-solving sessions. If you have an issue with this policy, please come talk to me. I may ask you to leave the room if your use of technology is distracting.

My availability:

Office hours (in person): Tuesday 8:00–10:00am and 1:30–2:30pm
Slack: I am generally available to answer questions on Slack throughout the week (please use Slack rather than email for course-related matters). I will occasionally answer questions on the weekend, but there is no guarantee.

Textbooks: My notes are mostly self-contained, although I will reference material from a few textbooks:

James, Gareth, Witten, Daniela, Hastie, Trevor, Tibshirani, Robert, et al. (2023). An introduction to statistical learning (Python version) (ISLP). Springer.
- This book straddles the boundary between machine learning and statistics and I will reference it heavily in Units 5-7. I found the discussion of regularization very approachable.
Gelman, Andrew, Hill, Jennifer, Vehtari, Aki. (2020). Regression and other stories (ROS). Cambridge University Press.
- This is a fantastic, non-technical introduction to regression modeling through coding and examples. Many aspects of the course are inspired by this book. The code is in R, but you can find versions in Python. It has a nice introduction to causal inference and I really like how they talk about interpreting regression coefficients. I also like that they take a Bayesian view, but are not too strict about it and focus on the practical goals of regression modeling, rather than the underlying philosophy.
Evans, Michael J., Rosenthal, Jeffrey S. (2004). Probability and statistics: The science of uncertainty. Macmillan.
- This is a more mathematically technical book on statistical inference and probability theory. If you want to see more of the theory, I recommend taking a look. I will reference it heavily in Units 1-4, but only for those who are interested in going deeper.

You should be able to find PDFs of all these books online.

Software:

All coding will be done using Python in Colab Notebooks. Within Python, we will use several packages throughout the course, including:

numpy for arrays, linear algebra, and generating random numbers
pandas for working with tabular data sets
statsmodels for classical statistics
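
To give a sense of how these three packages fit together, here is a minimal sketch of the kind of computational experiment we will run. It is only an illustration, not an excerpt from the course notes: the true coefficients (1 and 2), the noise level, the sample size, and the column names x and y are all made up for this example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# numpy: simulate data from a known linear model, y = 1 + 2x + noise.
# The coefficients, noise scale, and sample size are arbitrary choices
# made for this illustration.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# pandas: store the simulated data as a table with named columns.
df = pd.DataFrame({"x": x, "y": y})

# statsmodels: fit a linear regression of y on x by ordinary least squares.
fit = smf.ols("y ~ x", data=df).fit()

# The estimated intercept and slope should be close to 1 and 2.
print(fit.params)
```

Because the data are simulated, we know the true coefficients and can check how close the estimates come to them. Changing n or the noise scale and rerunning is a quick way to see how the estimates vary.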

Assignments

Your grade will be based on the following. See Canvas for the specific grading scheme and the linked pages for assignment details.

Exams: There will be two exams:

Midterm Quiz: In class on October 8th, covering Units 1–3.
Final: During finals week, November 25th at 3pm (room TBD) covering Units 1–6.

Project: You will complete a project as described in the project page. You may work in groups of up to 3, and all students in a group will receive the same grade. The project should adhere to the guidelines on the Canvas assignment.

Contributing to course material: You can earn extra credit by contributing to course material. Contributions can include improving the class notes, adding exercises and suggesting exam problems. All contributions must be made in the form of pull requests on GitHub, as described on the contribution page.

Exercises: At the end of each section there are a number of exercises. You should complete all of them and ask questions if you have any. Exercises marked with a ❐ are specifically recommended for the exams.

How to be successful in this course

Come to class and ask questions.
Do every exercise in the notes and every question on the practice exams.
Use the project as an opportunity to connect the course material to topics that are meaningful to you.
Spend time reviewing the material and working through problems without LLMs/AI. These tools can create the illusion of productivity because of the rapid feedback. Ask yourself: Are the interactions I’m having with AI really serving my learning goals?
That said, you should absolutely embrace AI as a tool to help you learn, as long as you use it mindfully and not to avoid doing the work yourself. I recommend making a GitHub account and interacting with the course material through Copilot on GitHub. Here is an example of what this looks like. The problems it produced are not perfect (for example, I find the last one a bit trivial because the flips are independent), but they are a great starting point.
Connect with other students in the class and work with them. The class is typically very diverse: some students have taken other probability courses, like Math 20 and 40, while others have experience doing data analysis in a research or industry setting. It’s great if you can connect with people whose skills complement yours.

Accessibility Needs

Students with disabilities who may need disability-related academic adjustments and services for this course are encouraged to see me privately as early in the term as possible. Students requiring disability-related academic adjustments and services must consult the Student Accessibility Services office (Carson Hall, Suite 125, 646-9900). Once SAS has authorized services, students must show the originally signed SAS Services and Consent Form and/or a letter on SAS letterhead to me.

As a first step, if you have questions about whether you qualify to receive academic adjustments and services, please contact the SAS office. All inquiries and discussions will remain confidential.

Math 50 Linear regression modeling
