This course is about learning from data in order to gain useful predictions and insights. It introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries.
We will be using Python for all programming assignments and projects. All lectures will be posted here and should be available 24 hours after meeting time.
The course is also listed as AC209, STAT121, and E-109.
Important Links
- Lecture slides available on Schedule
- Staff mailing list: staff@cs109.org
- Office hours can be found on Piazza or this Google Calendar.
- Links to the GitHub repositories: CS109 2014 course material and CS 109 2014 data
Lectures and Labs
- Lectures are 2:30-4pm on Tuesdays & Thursdays in Northwest B103
- Labs are 10am-12pm on Fridays in Geological Museum 100
Instructors
- Rafael Irizarry, Biostatistics
- Verena Kaynig-Fittkau, Computer Science
Guest Lecturer
- Marc Streit
Staff
- Stephanie Hicks, Head TF
- Mingxiang Teng
- Michael Packer
- Marcus Way
- Michael Lackner
- Amy Mir
- Tarik Adnan Moon
- Olivia Angiuli
- Yang Li
- Huihui Fan
- Antonia Oprescu
- Claudio Rosenberg
- Tudor Giurgica-Tiron
- Zhijie Zhou
- Nural Zaman
- Brian Feeny
- Joy Ming
- Rick Lee
- Felix Gonda
- Korey Tucker
- Lane Erickson
- Diana Miao
- Logan Kerr
- Stephen Klosterman
- Jacob Dorabialski