CS109 Data Science

This Harvard course in data science introduces methodologies for building and using databases with Python and a number of other tools. It is a full 13-week college course, and students who complete it will be primed to pursue greater knowledge and expertise in the field of data science.

Created by: Joseph K. Blitzstein

Produced in 2015

Course Description

Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries.

Instructor Details

Joseph K. Blitzstein

Joseph K. Blitzstein is an Assistant Professor of Statistics at Harvard University. He finished his Ph.D. work at Stanford University in 2006, advised by the inimitable Persi Diaconis. Joseph's research is a mixture of statistics, probability, and combinatorics. He is especially interested in graphical models, complex networks, and Monte Carlo algorithms (including both sequential importance sampling and Markov chains).

Reviews

4.9

7 total reviews


I would recommend this course to everyone who is looking for a course that covers a wide range of very ‘hot’ topics: collection and preparation of data, Machine learning (Regression, SVM, Trees, Bayesian approach, Clustering, Random forests, PCA etc.), analysis of networks and visualisation. It does not go deep into details of all the algorithms but provides a lot of practical knowledge to get started.

By Brendan Martin on 3/26/2019

With a great mix of theory and application, this course from Harvard is one of the best for getting started as a beginner. It’s not on an interactive platform, like Coursera or edX, and doesn’t offer any sort of certification, but it’s definitely worth your time and it’s totally free.

By kptech333 on 7/30/2017

I'd recommend a brilliant, free online Python/Data Science course in the form of Harvard's CS109. Check out the syllabus; it's pretty darn thorough in terms of application & theory of the entire data science process (import, wrangling, exploration, prediction, and communication). Big plus: it uses Python and relevant statistical/machine learning concepts.

By sonaut on 10/17/2015

Harvard's CS109 course is quite good and uses Python. It covers quite a bit of data cleaning and the data science part of things.

By Abhishek Kedia on 12/30/2014

I have some experience as a Business Analyst, so it did refresh some basic concepts. But I did not learn anything new. I have been doing the MOOC for the past 2 weeks now, and kudos to the Prof for designing such a wonderful resource and, more importantly, making it available free of cost. I would really urge you to try the assignments. They are amazing. I have completed 2/3 of the first one. I am already feeling good with functional and basic Python. Also, just watching the lectures will never help, especially graduate-level lectures. It never does. These are just guidelines to help you along the way. Google and Stack Overflow will be your best friends if you are to truly appreciate these lectures. Go ahead and Google even the jokes or other references you don't get. Getting inside the lecturer's head will pay off.

By Sooraj Raveendran on 7/28/2017

This is the flagship course of the program that covers almost the entire breadth of the practice of data science in a very pragmatic way. It is hard to say which part of this course was the best - the lectures, the labs, the homework assignments - everything was just awesome. The workload was very high, though.

By William Chen on 7/12/2018

I think the true highlight of the course is the various Labs (aka homework assignments), which walk through exercises in statistical learning using Jupyter notebooks.