CS109 Data Science

Data Science

This course is about learning from data in order to gain useful predictions and insights. It introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries.

We will be using Python for all programming assignments and projects. All lectures will be posted here and should be available 24 hours after meeting time.

The course is also listed as AC209, STAT121, and E-109.

Important Links

Lecture videos
Blackboard
Lecture slides available on Schedule
Staff mailing list: staff@cs109.org
Office hours can be found on Piazza or this Google Calendar.
Links to the GitHub repositories: CS109 2014 course material and CS109 2014 data

Lectures and Labs

Lectures are 2:30-4pm on Tuesdays & Thursdays in Northwest B103
Labs are 10am-12pm on Fridays in Geological Museum 100

Instructors

Rafael Irizarry, Biostatistics
Verena Kaynig-Fittkau, Computer Science

Guest Lecturer

Marc Streit

Staff

Stephanie Hicks, Head TF
Mingxiang Teng
Michael Packer
Marcus Way
Michael Lackner
Amy Mir
Tarik Adnan Moon
Olivia Angiuli
Yang Li
Huihui Fan
Antonia Oprescu
Claudio Rosenberg
Tudor Giurgica-Tiron
Zhijie Zhou
Nural Zaman
Brian Feeny
Joy Ming
Rick Lee
Felix Gonda
Korey Tucker
Lane Erickson
Diana Miao
Logan Kerr
Stephen Klosterman
Jacob Dorabialski

Material from CS 109 taught in Fall 2013