The project will be due on November 22nd before midnight.
The goal of the project is to apply what you’ve learned to a real-world setting; that is, to use statistics to say something new about the real world! It will include at least one Python notebook and a 2–3 page report. You can work in a group of up to 3, but all group members will receive the same grade on the project portion. Your report should be single-spaced, in 12pt font, with 1-inch margins. Your report will be graded out of 100 points, and your grade is roughly based on the following:
Your update, to be submitted after the first midterm, should consist of 1 or 2 paragraphs describing your plan and what you have achieved so far. At this point you should have reached Step 3 described below. I expect that everyone who submits an update will receive full credit; the point is mainly for me to intervene if I am worried about your direction or rate of progress.
This is only a rough rubric. Because approaches and topics vary so much from project to project, it is difficult to adhere to a single rubric.
Choose a concrete topic or question that genuinely interests you. The more specific your question, the easier the next steps will be.
Your project should fall within the regression framework: there should be predictors (X) and a response variable (Y). Within this framework, there are two broad types of goals:
Examples of possible questions include:
Search for academic papers related to your topic. Aim to identify one or two papers that closely align with your interests. At this stage, don’t worry too much about technical details or whether the methods directly connect to our course. Focus on the scientific question first.
Note: Many students want to start with a dataset. I strongly encourage you to begin with literature instead. One major goal of the project is to practice reading and engaging with published research. Doing so will push you beyond the specific examples we cover in class. In contrast, students who begin with only a dataset often end up producing something that looks more like a homework exercise than a research project.
When you read the paper, one of two outcomes is likely:
It’s normal to go through the cycle (Step 1 → Step 2 → Step 3) and decide you’re not satisfied with your direction. If this happens, repeat the cycle once more. By the second round, I recommend committing to what you’ve found.
When you start your project, we won’t yet have covered all the core ideas I expect you to connect with. Ideally, your understanding of the paper should develop alongside the course. As we study topics like the bias–variance tradeoff, think about how they appear in the context of your chosen application.
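As one way to start thinking about the bias–variance tradeoff before we reach it in class, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the "true" function `true_f`, the noise level, the sample size, and the choice of k-nearest-neighbor regression as the model. It repeatedly draws training sets, predicts at fixed test points, and estimates the squared bias and variance of the predictions for several values of `k`.

```python
import math
import random


def true_f(x):
    # Hypothetical "true" regression function, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def simulate(n, noise_sd, rng):
    # Draw predictors X uniformly on [0, 1] and responses Y = f(X) + noise.
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def knn_predict(xs, ys, x0, k):
    # Average the responses of the k training points closest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k


def bias_variance(k, n=50, noise_sd=0.3, reps=200, seed=0):
    # Estimate average squared bias and variance of k-NN predictions
    # over many independently simulated training sets.
    rng = random.Random(seed)
    grid = [i / 20 for i in range(1, 20)]  # fixed test points in (0, 1)
    preds = [[] for _ in grid]
    for _ in range(reps):
        xs, ys = simulate(n, noise_sd, rng)
        for j, x0 in enumerate(grid):
            preds[j].append(knn_predict(xs, ys, x0, k))
    bias2 = 0.0
    var = 0.0
    for j, x0 in enumerate(grid):
        mean_pred = sum(preds[j]) / reps
        bias2 += (mean_pred - true_f(x0)) ** 2
        var += sum((p - mean_pred) ** 2 for p in preds[j]) / reps
    return bias2 / len(grid), var / len(grid)


# k = 1 is very flexible; k = 50 uses every training point (a constant fit).
results = {k: bias_variance(k) for k in (1, 5, 50)}
for k, (b2, v) in results.items():
    print(f"k={k:2d}  bias^2={b2:.3f}  variance={v:.3f}")
```

In this toy setup, the most flexible fit (`k=1`) should show low bias but high variance, while the least flexible fit (`k=50`, which averages all training responses) should show the reverse. For your own project, the analogous question is which modeling choices in your paper play the role of `k`.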