PH125.2x: Data Science: Visualization - Course Syllabus
Course Instructor
Rafael Irizarry
Course Description
In this second course of nine in the HarvardX Data Science Professional Certificate, we learn the basics of data visualization and exploratory data analysis.
The growing availability of informative datasets and software tools has led to increased reliance on data visualizations across many industries, academia, and government. Data visualization provides a powerful way to communicate data-driven findings, motivate analyses, or detect flaws.
In this course, you will learn the basics of data visualization and exploratory data analysis. We will work through three motivating examples and write code with ggplot2, a data visualization package for the statistical programming language R. To learn the very basics, we will start with a somewhat artificial example: heights reported by students. Then we will use two case studies: one related to world health and economics and another on infectious disease trends in the United States.
It is also important to note that mistakes, biases, systematic errors, and other unexpected problems often lead to data that should be handled with care. The fact that it can be difficult or impossible to notice an error just from the reported results makes data visualization particularly important. This course will explore how failure to discover these problems often leads to flawed analyses and false discoveries.
HarvardX has partnered with DataCamp for all assignments in R, which allows students to program directly in a browser-based interface. You will not need to download any special software, but an up-to-date browser is recommended.
What you'll learn:
- data visualization principles to better communicate data-driven findings
- how to use ggplot2 to create custom plots
- the weaknesses of several widely used plots and why you should avoid them
New to edX?
Are you new to edX? Check out edX's Demo Course!
Need help? Visit edX Support via the Support tab or visit the Help Center.
Course Structure
When you join the course, we encourage you to meet your peers, learn the DataCamp platform, and tell us about yourselves and what you hope to get out of the course! You can progress through the material at your own pace.
Grading
All graded components of the course are DataCamp assignments: the eleven programming exercises are worth 100% of your grade.
All other components of the course, such as the discussion boards, are not for credit.
Certification
In order to receive a Verified Certificate, you must sign up and pay for a Verified Certificate by the deadline on the course page and earn a passing grade of at least 70%.
COURSE OUTLINE
Section 1: Introduction to Data Visualization and Distributions
You will get started with data visualization and distributions in R.
Section 2: Introduction to ggplot2
You will learn how to use ggplot2 to create plots.
Section 3: Summarizing with dplyr
You will learn how to summarize data using dplyr.
Section 4: Gapminder
You will see examples of ggplot2 and dplyr in action with the Gapminder dataset.
Section 5: Data Visualization Principles
You will learn general principles to guide you in developing effective data visualizations.
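As a rough taste of what Sections 2 and 3 cover, the sketch below summarizes a small table with dplyr and draws a basic plot with ggplot2. It is only an illustration, not course material: the data frame and its values are made up, and the names heights, height_summary, and p are hypothetical. It assumes the dplyr and ggplot2 packages are installed.

```r
# Illustrative sketch only: the values below are made up, not course data.
library(dplyr)
library(ggplot2)

# Hypothetical self-reported heights, in inches
heights <- data.frame(
  sex = c("Female", "Female", "Female", "Male", "Male", "Male"),
  height = c(62, 65, 64, 69, 71, 68)
)

# dplyr: group the rows by sex, then compute one summary row per group
height_summary <- heights %>%
  group_by(sex) %>%
  summarize(average = mean(height), standard_deviation = sd(height))

# ggplot2: map variables to aesthetics, then add a geometry layer
p <- ggplot(heights, aes(x = sex, y = height)) +
  geom_point()

print(height_summary)
print(p)
```

The same pattern (data, aesthetic mapping, then layers) carries over to other plot types by swapping the geometry layer.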
FAQS
What is the deadline to sign up for a Verified Certificate?
The deadline is listed on the right side of the course landing page.
How do I earn a certificate?
To earn a certificate, you must sign up for a Verified Certificate by the deadline and earn a grade of at least 70%. When you achieve this score, a "View Your Certificate" button will appear on your dashboard. For more information, click on this link.
How do I upgrade to a verified certificate?
Go to your edX Dashboard (by clicking the edX icon at the top left of this page). Under this course, click the "Challenge Yourself!" link.
How long does the course take?
That is up to you! It is 4 weeks of content. Just be aware that you must complete the course by the deadline listed on your course homepage.
I am doing well on the assessments, but when I look under "Progress" I have a very low grade...why?
The grade is calculated based on all of the assessments you have completed and the assessments that you have not completed (edX says you have a "zero" on those assessments until you have attempted them). You will see your overall grade move up as you progress through the course.
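The arithmetic behind this answer can be sketched in a few lines of R. This is an illustration under assumptions, not the platform's actual grading code: the scores are made up, the function name overall_grade is hypothetical, and the eleven exercises are assumed to carry equal weight.

```r
# Hypothetical scores (fraction correct) on the eleven exercises;
# NA marks an exercise that has not been attempted yet.
scores <- c(0.9, 1.0, 0.8, 0.95, NA, NA, NA, NA, NA, NA, NA)

# Unattempted exercises count as zero toward the overall grade.
# Assumption: all eleven exercises are weighted equally.
overall_grade <- function(scores) {
  counted <- ifelse(is.na(scores), 0, scores)
  mean(counted)
}

current <- overall_grade(scores)                 # grade shown under "Progress"
average_on_attempted <- mean(scores, na.rm = TRUE)  # how you are doing so far
passed <- current >= 0.70                        # passing threshold of 70%

print(round(current, 3))
print(average_on_attempted)
print(passed)
```

With four of eleven exercises done, the average on attempted work is high while the overall grade is still low, which is the situation the question describes.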
Installing R
Once you decide to install R, you can download it freely from the Comprehensive R Archive Network (CRAN). It is relatively straightforward, but if you need further help you can try the following resources:
Research
HarvardX pursues the science of learning. When you participate in this course, you will also participate in research about learning. Read our research statement to learn more.