Welcome to CS109a/STAT121a/AC209a, also offered by the DCE as CSCI E-109A, Introduction to Data Science. This course is the first half of a one-year introduction to data science. The course focuses on analyzing messy, real-life data and making predictions with statistical and machine learning methods.

The material of the course is divided into 3 modules. Each module will integrate the five key facets of an investigation using data:

  1. data collection ‐ data wrangling, cleaning, and sampling to get a suitable data set;
  2. data management ‐ accessing data quickly and reliably;
  3. exploratory data analysis – generating hypotheses and building intuition;
  4. prediction or statistical learning; and
  5. communication – summarizing results through visualization, stories, and interpretable summaries.

Only one of CS 109a, AC 209a, or Stat 121a can be taken for credit. Students who have previously taken CS 109, AC 209, or Stat 121 cannot take CS 109a, AC 209a, or Stat 121a for credit.

Course Logistics

Prerequisites

You are expected to have programming experience at the level of CS 50 or above, and statistics knowledge at the level of Stat 100 or above (Stat 110 recommended). HW0 is designed to test your knowledge of the prerequisites. Successful completion of this assignment will show that this course is suitable for you. HW0 will not be graded, but you are required to submit it.

Course Components

Lectures

The class consists of two weekly lectures and one lab, which is designed as a class activity. Attendance at lectures is mandatory. Lectures are held Mon and Wed 1:00-2:30 pm in Northwest Building (NW), Lecture Hall B-103. We will have in-class quizzes to assess your understanding of the material and to help us identify gaps.

Labs

Attendance at labs is optional but strongly encouraged. Labs are designed as hands-on in-class activities. The instructor will go over practice problems similar to the homework problems and review difficult material. Two lab sessions with identical content are held Thur 4:00-5:30 pm and Fri 10:00-11:30 am in NW Basement Lobby. You should plan to attend one of the two.

Sections

Lectures and labs are supplemented by 1-hour sections led by teaching fellows. There are two types of sections:

a) Standard Sections, which will be a mix of material review and practice problems similar to the HW.

b) Advanced Sections, which will cover advanced topics such as the mathematical underpinnings of the methods seen in lecture and lab, and extensions of those methods. The material covered in the Advanced Sections is required for all AC 209a students.

Instructor Office Hours

Pavlos: Tuesday 3:00-5:00 pm, IACS student lobby MD ground floor
Kevin: Tuesday 1:00-3:00 pm, IACS student lobby MD ground floor
Rahul: Wednesday 2:30-4:00 pm, IACS student lobby MD ground floor
Margo: Monday 2:30-4:00 pm, IACS student lobby MD ground floor

Assignments

There will be an initial self-assessment homework called HW0 and 8 more graded homework assignments. Some of them will be due in a week and some of them in two weeks. You will be working in Jupyter Notebooks which you can run in your own environment or in the SEAS JupyterHub cloud (accessed from Canvas).

Quizzes

Quizzes will be taken at the end of class and the material will be based on what was discussed in lecture. 40% of the quizzes will be dropped from your grade.

Midterm

There will be one take-home midterm, to be done individually (see Calendar for dates).

Final

There will be a final group exam (2-4 students) due during Exam period. More details to come in November. See Calendar for specific dates.

Recording

Lectures will be recorded and made available via Canvas, in real time for DCE students and 24 hours later for on-campus students. Labs will also be recorded, but only for distance students.

An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani.

The book is available here:

Free electronic version: http://www-bcf.usc.edu/~gareth/ISL/

HOLLIS: http://link.springer.com.ezp-prod1.hul.harvard.edu/book/10.1007%2F978-1-4614-7138-7

Amazon: https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370

Course Policies

Getting Help

For questions about homework, course content, package installation, or JupyterHub, first try to troubleshoot the problem yourself. If that does not work, the process to get help is:

(1). Post the question in Piazza, where your peers may be able to answer it. Note that in Piazza questions are visible to everyone. The TFs monitor the posts but will respond no earlier than 24 hours from the posting time.

(2). Go to Office Hours. This is the best way to get help.

For private issues

(3). If none of the above works, send an email to the Helpline: cs109a2017@gmail.com. The Helpline is monitored by all the TFs.

(4). For private matters, send an email to the instructors.

Questions on Graded Homework and Regrading Policy

We take great care to make sure all homework is graded properly. However, if you feel that your assignment was not fairly graded, you may:

  1. Contact the grader by emailing the helpline with subject line “Regrade HW1: Grader=johnSeitz” within 3 days.
  2. If still unhappy with the initial response, then submit a reason via email to the Helpline with subject line “Regrade HW1: Second request” within 3 days of receiving the initial response. Note: once regrading is done, you may receive a grade that is higher or lower than the initial grade.

Late Day Policy

You are allowed up to 6 late days for homework submissions in total, with a maximum of 2 days on any single assignment, no questions asked. No homework will be accepted more than 48 hours late. Solutions will be posted two days after the due date. Any other late homework submissions will not be accepted without a written note from UHS or your resident dean’s office. If you exceed your 6 late days, 1 point will be deducted for late days after that.

Late minutes count as a whole day, e.g. if you submit one hour late, this will count as 1 day.
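The late-day counting rule above can be sketched in a few lines of Python. This is only an illustration, not the staff's actual tooling: the function names and the example due date are hypothetical, and the sketch assumes a "day" means a 24-hour block measured from the deadline.

```python
import math
from datetime import datetime

MAX_PER_ASSIGNMENT = 2  # at most 2 late days on any single assignment
TOTAL_ALLOWANCE = 6     # late days available across the whole course


def late_days_used(due, submitted):
    """Late days charged for one submission; any part of a day counts as a full day."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / 86400)  # 86400 seconds per 24-hour day


def within_single_assignment_limit(due, submitted):
    """True if the submission is no more than 2 late days (48 hours) past the deadline."""
    return late_days_used(due, submitted) <= MAX_PER_ASSIGNMENT


# Hypothetical deadline: a Wednesday at 11:59 pm.
due = datetime(2017, 9, 13, 23, 59)
one_hour_late = datetime(2017, 9, 14, 0, 59)
print(late_days_used(due, one_hour_late))  # one hour late is charged as 1 full day
```

Summing `late_days_used` over all assignments and comparing against `TOTAL_ALLOWANCE` would show how many of the 6 late days remain.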

Communication from Staff to Students

Class announcements and official communication from staff will be through Canvas. All homework, quizzes, and feedback forms will be posted and submitted in Canvas.

MAKE SURE you have your settings set so you can receive emails from Canvas. No official communication or announcements will be done via Piazza.

Submitting an assignment

You are to complete all homework in the Jupyter Notebook. When you are done, convert your notebook to a PDF (by using the browser print function) and submit both the .ipynb file and the .pdf file. You can submit multiple times up to the deadline.

You are encouraged but not required to submit in pairs. We will be using the Groups function in Canvas to do this; details will be announced later.

All assignments will be posted on Wed. at 6:00 pm and will be due the following Wed. at 11:59 pm in Canvas.

Grading Score

Your final score for the course will be computed using the following weights:

  • Homework 40%
  • Quizzes/Readings 10%
  • Midterm 30%
  • Project 20%

Total 100%
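As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component score has already been converted to a percentage (0-100); the syllabus does not specify that conversion, and the dictionary keys, function name, and example scores are hypothetical.

```python
# Weights taken from the list above.
WEIGHTS = {
    "homework": 0.40,
    "quizzes_readings": 0.10,
    "midterm": 0.30,
    "project": 0.20,
}


def final_score(scores):
    """Return the weighted course score for a dict of component percentages (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores for illustration only.
example = {"homework": 90, "quizzes_readings": 80, "midterm": 85, "project": 95}
print(round(final_score(example), 2))  # 0.4*90 + 0.1*80 + 0.3*85 + 0.2*95 = 88.5
```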

Grading Guidelines

Homework will be graded based on

(1) how correct your code is (the Notebook cells should run; we will not troubleshoot your code), (2) how well you have interpreted the results (we want text, not just code; it should read as a report), and (3) how well you present the results. The scale is 1-5, with up to ½ extra point possible, for a maximum of 5.5.

Software

We will be using Jupyter Notebooks, Python 3, and various Python modules. You can run the notebooks either on your own machine by installing the Anaconda platform, which includes IPython as well as all packages that will be required for the course, or by using the SEAS JupyterHub from Canvas. Details will be given in class.

Accommodations for students with disabilities

Students needing academic adjustments or accommodations because of a documented disability must present their Faculty Letter from the Accessible Education Office (AEO) and speak with Kevin by the end of the third week of the term: Friday, September 15. Failure to do so may result in us being unable to respond in a timely manner. All discussions will remain confidential.