Course Introduction

Pre-Class Tasks

Please watch the Course Introduction Video below. If all you see below is an empty box, you will need to give your browser permission to load the videos. In Google Chrome, click the gray and white shield with a red X at the far right of the URL bar, then click the “Load unsafe scripts” button. These scripts are safe. In Firefox, click the green lock with a yellow exclamation-mark triangle to the left of the URL bar, then click the right arrow and the “Disable protection for now” button.


This course is organized around a research project of your choosing and offers an intensive, hands-on experience in the research process. You will develop skills in 1) generating testable hypotheses; 2) conducting a literature review; 3) understanding large data sets; 4) formatting and managing data; 5) conducting descriptive and inferential statistical analyses; and 6) presenting results to expert and novice audiences. It is designed for students who want to build practical skills for working with data and analyzing it with statistical tools. No prior experience with data or statistics is required.

Our approach is “statistics in the service of questions.” As such, the research question that you choose (from the data sets made available to you) is of paramount importance to your learning experience. It must interest you enough that you will be willing to spend many hours reading about it, thinking about it, and analyzing data related to it.

Your work in this course will build toward an individual project that you will present at the end of the semester as a research poster and an oral presentation. Several previous students have expanded their research projects into full-length articles that were subsequently accepted for publication in a Wesleyan student journal.

During-Class Task

Since we will not be collecting our own data in this course, the first step of your project is to choose a data set (from those made available) that lets you conduct research on a general topic of significant interest to you.

Codebooks

Before accessing any data, you will review the available codebooks (sometimes called “data dictionaries”). Codebooks commonly offer complete information about a data set (e.g., the general topics addressed, the questions and/or measurements used, and in some cases the frequency of each response or value). Reviewing the codebook is always the first step in research based on existing data, since 1) codebooks can be used to generate research questions; and 2) data are generally uninterpretable without one.

The codebook describes how the data are arranged in the computer file or files, what the various numbers and letters mean, and any special instructions on how to use the data properly. As with any other kind of book, some codebooks are better than others.
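To make the idea of “what the various numbers mean” concrete, here is a minimal sketch of using a codebook entry to translate raw numeric codes into readable answers. Python is used purely for illustration, and the variable name (Q12), the question wording, and the codes (including 9 as a missing-data code) are all hypothetical; they do not come from any of the course data sets.

```python
# Hypothetical codebook entry: the variable name, question, and codes are
# invented for illustration and do not come from any course data set.
CODEBOOK = {
    "Q12": {
        "question": "How often do you feel nervous?",
        "codes": {1: "Never", 2: "Sometimes", 3: "Often", 9: "Unknown"},
        "missing": {9},  # codes the (hypothetical) codebook marks as missing data
    }
}


def decode(variable, raw_values):
    """Translate raw numeric codes into labels using the codebook entry."""
    entry = CODEBOOK[variable]
    labels = []
    for value in raw_values:
        if value in entry["missing"]:
            # A missing-data code is not a real answer, so record it as None.
            labels.append(None)
        elif value in entry["codes"]:
            labels.append(entry["codes"][value])
        else:
            # A value the codebook does not describe cannot be interpreted.
            raise ValueError(f"{variable}: code {value} is not in the codebook")
    return labels


raw = [1, 3, 9, 2]
print(decode("Q12", raw))  # ['Never', 'Often', None, 'Sometimes']
```

Without the codebook, the 9 in this made-up example would look like a very high answer; averaging the raw numbers would give 3.75, which is larger than the code for the most extreme real answer.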

Your task today is to find a data set that interests you, or perhaps to rule out ones that do not. Some students may gravitate toward political sentiment, mental health, or science; the data sets provided in the course aim to cover a diverse range of topics. It is up to you to decide which data set (among the ones we have provided) sparks your interest the most. Looking through codebooks for the first time can be a little overwhelming: some of them are thousands of pages long! You are not expected to read every page, but you are expected to spend time making sense of the measurements that were taken on subjects.

The codebooks are located in Resources under Data Sources and Codebooks.