Exploratory Data Analysis

Created by Johns Hopkins University, this Coursera course gets into the nuts and bolts of summarizing data. It covers various techniques that help with modeling and statistical exploration. It's part of a program aimed at fostering expertise in data science.

Created by: Roger D. Peng

Produced in 2016

Quality Score

Content Quality
Video Quality
Qualified Instructor
Course Pace
Course Depth & Coverage

Overall Score: 98 / 100


Course Description

redux Awards Best Free Course

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.



Pros

    • Emphasis on graphical analysis is a strong point that many other courses overlook or minimize.
    • Covers a wide range of analytic techniques.
    • Covers R in depth, which is vital to analytic presentation.

Cons

    • The course rarely moves beyond graphics systems, which severely limits how much exploratory data analysis actually happens.
    • Theory is lacking in this course.
    • The course outlines what data analysis consists of more than how to perform exploratory analysis.

Instructor Details

Roger D. Peng

Roger D. Peng is a Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health and a Co-Editor of the Simply Statistics blog. He received his Ph.D. in Statistics from the University of California, Los Angeles and is a prominent researcher in the areas of air pollution and health risk assessment and statistical methods for environmental data. He is the recipient of the 2016 Mortimer Spiegelman Award from the American Public Health Association, which honors a statistician who has made outstanding contributions to health statistics. He created the course Statistical Programming at Johns Hopkins as a way to introduce students to the computational tools for data analysis. Dr. Peng is also a national leader in the area of methods and standards for reproducible research and is the Reproducible Research editor for the journal Biostatistics. His research is highly interdisciplinary and his work has been published in major substantive and statistical journals, including the Journal of the American Medical Association and the Journal of the Royal Statistical Society. Dr. Peng is the author of more than a dozen software packages implementing statistical methods for environmental studies, methods for reproducible research, and data distribution tools. He has also given workshops, tutorials, and short courses in statistical computing and data analysis.



340 total reviews


By Rafael L G on 16-Oct-18

This has been a challenging course for me, for whatever reasons. I have devoted a great deal of time in reading Dr. Peng's books as well as reviewing work product of other students to get a better grasp of the logic and methodology. I have enjoyed this course more than any of the preceding courses. And, the struggle I believe will be worth the effort and facilitate my completion of the data science specialization program.

By Fernando R A on 4-Feb-19

This lesson could have been significantly improved if there were at least one assignment on clustering/dimension reduction. Those are probably the hardest concepts taught thus far, so it would have been extremely useful to have at least one challenge to work through.

By Nils N on 11-Jul-18

Once it got to the clustering section the lessons were inscrutable. Extremely difficult to understand and not explained well.

By Abhinav K on 12-Mar-19

This course covers plotting (base, lattice, ggplot) then takes a confusing tour into heavy topics of clustering and dimension reduction, then flips back to coloring in charts. The order of the lectures is confusing and PCA/SVD needs more background, clearer explanation and treatment (gets covered a bit more later under regression). Assignments are good and swirl courses helped solidify the lectures.

By Pratik P on 15-May-19

This course is basically plotting with R and clustering/dimensionality reduction. There is not enough emphasis on the latter in my opinion. The final assignment focuses only on plotting, which is a shame.

By Guillermo S R P on 12-Feb-18

This is the worst of the Data Science courses so far (they've all been pretty good up to this point). It's called Exploratory Data Analysis, but is actually all about the graphics systems in R. And it does a botched job on those as well. All quizzes and assignments are about the graphics systems. The only portion of the course that deviates from that is Week 3 (for which there is no quiz or project) where we "learn" about clustering and dimension reduction. However, that material is presented really poorly: not enough depth for someone who is already familiar with the subject matter; and not nearly well enough explained for newbies. On the graphics side, none of the systems is explored in great depth. The lattice system is essentially just mentioned in passing. To cap it all off, the brief for the last assignment is really ambiguous, which often causes perfectly valid work to be graded poorly by peers. (Just look at the forums, if you need proof.)

By Sachin R on 30-Aug-18

Cons:
    • Too much focus on hopelessly outdated R functions.
    • Lectures are mostly powerpoint karaoke along the lines of "You can do that thing. And you can also do that other thing. And also you do this third thing" without much real-world application.
    • ggplot2 is the only modern viz package that gets mentioned.
Pros:
    • The swirl exercises are great (but very buggy on Mac).

By swetha on 13-May-19

Provides a solid overview of the base plotting system and a discussion (better elsewhere) of others. Introduces some higher level exploratory methods, without much information on either the theory or application (simply walks through the recipe). Assessments do not match the lecture material, so the credential is essentially meaningless. Read the associated book, watch the video lectures if you'd like. Don't bother with paying for the certificate.

By Silvia Y B V on 10-May-16

This course is mostly about how to use plotting libraries in R.

By Aboobacker on 10-Jun-17

The videos merely repeated the content from swirl, with absolutely no added value.

By Rahul P on 20-Sep-16

When it comes down to it, there's simply not the support to assist a student who has a really hard problem; "hacker mentality" seems to equate to "figure it out on your own cuz nobody's going to help you". If things do not work perfectly for you, then you are likely never to be able to finish, because your "peers" don't know any better either. The way this class is set up makes me angry every time I have to deal with it. I would probably be just as well served doing just the swirl() exercises. I would quit if I hadn't paid all the way through in advance. I can't believe Johns Hopkins is the type of school to produce a course of this quality, but I guess I have to.

By Eliomar F on 16-Jan-19

really interesting