Math 50 Linear regression modeling

Exam study guide

Midterm

The midterm will cover Units 1, 2 and 3. Here are some things you absolutely need to know.

Overview

  • The only distributions you need to know about are Bernoulli, Binomial (you don’t need to know the binomial formula), Uniform and Normal.
  • How to calculate conditional probabilities, marginal probabilities, expectations and conditional expectations given a probability distribution. Problems where you have to calculate these things by hand will involve discrete random variables.
  • Understand what the LLN and CLT are saying and their significance for statistical inference.
  • Be able to work with Normal random variables (in the sense of adding them and estimating probabilities given a table or the bell curve).
  • Basic concepts of statistical inference: Sample distribution, bias, consistency. Be able to tell if a (simple) estimator is biased.
  • Basic ideas of regression modeling: What are the assumptions? What is the relationship between covariance, correlation and R-squared? Be able to derive the formulas for these.
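
As a reminder of how the quantities in the last item fit together, here is a short derivation for the single-predictor model. The notation ($\beta_0$, $\beta_1$, $\epsilon$, $\sigma_X$, $\sigma_Y$, $\rho$) is an assumption and may differ from the symbols used in class. Suppose $Y = \beta_0 + \beta_1 X + \epsilon$, where $\epsilon$ has mean zero and is uncorrelated with $X$. Then

\begin{align}
{\rm Cov}(X, Y) &= \beta_1 {\rm Var}(X) \quad \Rightarrow \quad \beta_1 = \frac{{\rm Cov}(X, Y)}{{\rm Var}(X)} \\
\rho &= \frac{{\rm Cov}(X, Y)}{\sigma_X \sigma_Y} = \beta_1 \frac{\sigma_X}{\sigma_Y} \\
{\rm Var}(Y) &= \beta_1^2 \sigma_X^2 + {\rm Var}(\epsilon) \\
R^2 &= 1 - \frac{{\rm Var}(\epsilon)}{{\rm Var}(Y)} = \frac{\beta_1^2 \sigma_X^2}{\sigma_Y^2} = \rho^2
\end{align}

So with one predictor, R-squared is the squared correlation between $X$ and $Y$, and the slope is the covariance rescaled by the variance of the predictor.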

Python

In terms of Python, you need to know enough syntax to identify what a line or two of code does. I will NOT ask about plotting syntax or constructing dataframes (though you may be asked to compute something given a dataframe). More specifically:

  • Basic logic: You should know the syntax for if statements and for loops; however, the questions on the midterm will mostly focus on things directly related to statistics/probability.
  • How to generate random numbers using np.random.choice and np.random.normal. If I give you a probability model in a form such as \begin{align} X \sim {\rm SomeDistribution}\quad Y | X &\sim {\rm SomeDistribution} \end{align} you should be able to write a function which returns samples from this distribution (as numpy arrays). It’s a good idea to look at every probability model in the exercises and try to implement it in Python.
  • How to index 1D arrays and compute their lengths (I will not ask about multidimensional arrays). This will mostly be relevant in the context of estimating conditional probabilities: For example, if I give you a dataframe df and tell you the rows are samples from some probability model, I might ask you to estimate a conditional and/or marginal probability.
  • Describe how to do something in Python without writing out the exact code. For example, I might ask you to describe how you would check whether a given formula (say the variance of some sample distribution) is correct, and the answer can be in words: “I would first write a function which returns the samples as a dataframe, then run this function for different $N$, etc…”
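
To make the sampling and indexing items above concrete, here is a minimal sketch. The model is hypothetical and not taken from the course exercises: $X \sim {\rm Bernoulli}(0.3)$ and $Y | X \sim {\rm Normal}(2X, 1)$. The function name `sample_model` and the event $Y > 1$ are also made up for illustration.

```python
import numpy as np


def sample_model(n, p=0.3):
    """Return n samples (x, y) as numpy arrays from the hypothetical model
    X ~ Bernoulli(p),  Y | X ~ Normal(2 * X, 1)."""
    # X equals 1 with probability p and 0 with probability 1 - p
    x = np.random.choice([0, 1], size=n, p=[1 - p, p])
    # Each Y is Normal with mean 2 * X (so 0 or 2) and standard deviation 1
    y = np.random.normal(loc=2 * x, scale=1.0)
    return x, y


np.random.seed(0)  # fixed seed so the run is reproducible
x, y = sample_model(10_000)

# Marginal probability P(X = 1): fraction of samples with x equal to 1
p_x1 = len(x[x == 1]) / len(x)

# Conditional probability P(Y > 1 | X = 1):
# keep only the samples where x equals 1, then count how many have y > 1
y_given_x1 = y[x == 1]
p_y_given_x1 = len(y_given_x1[y_given_x1 > 1]) / len(y_given_x1)

print(p_x1, p_y_given_x1)
```

The pattern is the same in both estimates: restrict to the samples that satisfy the condition, then divide the count of interest by the number of samples that remain. With this many samples the first estimate should land close to 0.3.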

Practice problems

Final

The final is cumulative and includes everything on the midterm, plus Units 4-7 (ending with whatever material we cover on Friday Nov 7).

You will need to:

  • Given a linear regression model, state the interpretation of the regression coefficients.
  • Be able to calculate the marginal variance (if the predictor distribution is given)
  • Know the effects of adding a predictor to the model. For example, be able to predict how it will change another regression coefficient given the correlation with existing predictors.
  • Know what Simpson’s paradox is and when it occurs
  • Understand what happens when we flip the axes in a regression model
  • Interpret the output of a fitted model: Which effects are significant? What does $R^2$ mean?
  • Know how to include categorical variables in a linear regression model and how to interpret them
  • Calculate regularized estimators (for example for the sample mean where this can be done by hand)
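
As an illustration of the last item, here is a minimal sketch of a regularized sample mean. It assumes an L2 (ridge) penalty of the form $\sum_i (y_i - \mu)^2 + \lambda \mu^2$, which may be normalized differently from the version used in class. Setting the derivative with respect to $\mu$ to zero gives $\hat{\mu} = \sum_i y_i / (n + \lambda)$. The function name `ridge_mean` and the data are made up for illustration.

```python
import numpy as np


def ridge_mean(y, lam):
    """Minimizer over mu of sum((y_i - mu)^2) + lam * mu^2 (assumed penalty form)."""
    y = np.asarray(y, dtype=float)
    return y.sum() / (len(y) + lam)


y = np.array([1.0, 2.0, 3.0, 6.0])  # made-up data with sample mean 3.0
lam = 2.0

mu_hat = ridge_mean(y, lam)  # 12 / (4 + 2) = 2.0

# Numerical check: evaluate the penalized objective on a grid of candidate
# values for mu and confirm the smallest value is near the closed form
grid = np.linspace(-10, 10, 20001)
objective = np.array([np.sum((y - m) ** 2) + lam * m ** 2 for m in grid])
mu_grid = grid[np.argmin(objective)]

print(y.mean(), mu_hat, mu_grid)
```

Under this penalty the estimate is pulled from the sample mean (3.0) toward zero (2.0), and a larger $\lambda$ pulls it further. The grid search is the same check-by-simulation idea used elsewhere in the course: compute the answer a second way and compare.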

There may be some extra credit questions from Unit 7, depending on how much time we have at the end of the term.

Practice exam

  • Final from 2024 (ignore the red text in the instructions and do all problems).