PH125.3x: Data Science: Probability - Course Syllabus

Course Instructor

Rafael Irizarry

Course Description

In this third course of nine in the HarvardX Data Science Professional Certificate, we learn the basics of probability theory.

Probability theory is the mathematical foundation of statistical inference, which is indispensable for analyzing data affected by chance and is therefore essential for data scientists.

In this course, you will learn important concepts in probability theory. The motivation for this course is the circumstances surrounding the financial crisis of 2007-2008. Part of what caused this financial crisis was that the risk of certain securities sold by financial institutions was underestimated. To begin to understand this very complicated event, we need to understand the basics of probability. 

We will introduce important concepts such as random variables, independence, Monte Carlo simulations, expected values, standard errors, and the Central Limit Theorem. These statistical concepts are fundamental to conducting statistical tests on data and to understanding whether a pattern in the data you are analyzing is more likely due to an experimental effect or to chance.

Note that statistical inference, covered in the next course in this series, builds upon probability theory.

HarvardX has partnered with DataCamp for assignments in R that allow students to program directly in a browser-based interface. You will not need to download any special software, but an up-to-date browser is recommended.

What you'll learn:

    • Important concepts in probability theory including random variables and independence
    • How to perform a Monte Carlo simulation
    • The meaning of expected values and standard errors and how to compute them in R
    • The importance of the Central Limit Theorem

New to edX?

Are you new to edX? Check out edX's Demo Course!

Need help? Visit edX Support via the Support tab or visit the Help Center.

Course Structure

When you join the course, we encourage you to meet your peers, learn the DataCamp platform, and tell us about yourselves and what you hope to get out of the course! You can progress through the material at your own pace.

Grading

All graded components of the course are DataCamp assignments: The seven programming exercises are worth 100% of your grade.

All other components of the course, such as the discussion boards, are not for credit.

Certification

In order to receive a Verified Certificate, you must sign up and pay for a Verified Certificate by the deadline on the course page and earn a passing grade of at least 70%.

COURSE OUTLINE

Important note: The first two parts of the first section of content (Section 1: Introduction to Discrete Probability and Section 1: Combinations and Permutations) are currently available. The rest of the course content will be released on 4/5/2018. Thank you for your patience!

Section 1: Discrete Probability

You will learn about basic principles of probability related to categorical data using card games as examples.
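
As an illustration of the kind of card-game question this section deals with, here is a minimal sketch that computes the probability of drawing two kings in a row without replacement, both exactly and by Monte Carlo simulation. The course assignments are in R; Python is used here only for illustration. The choice of example and the function names are assumptions and are not taken from the course materials.

```python
import random
from fractions import Fraction

# A standard 52-card deck as (rank, suit) pairs.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]


def exact_two_kings():
    """Exact probability via the multiplication rule: P(K first) * P(K second | K first)."""
    return Fraction(4, 52) * Fraction(3, 51)


def simulate_two_kings(n_trials, seed=0):
    """Monte Carlo estimate: repeat the two-card draw many times and count successes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_trials):
        first, second = rng.sample(DECK, 2)  # two cards, without replacement
        if first[0] == "K" and second[0] == "K":
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print("exact:    ", float(exact_two_kings()))
    print("simulated:", simulate_two_kings(100_000))
```

With enough trials, the simulated proportion should land close to the exact value of 1/221; the gap shrinks as the number of trials grows.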

Section 2: Continuous Probability

You will learn about basic principles of probability related to numeric and continuous data.

Section 3: Random Variables, Sampling Models, and the Central Limit Theorem 

You will learn about random variables (numeric outcomes resulting from random processes), how to model data generation procedures as draws from an urn, and the Central Limit Theorem, which applies to large sample sizes.
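
The urn idea can be sketched as follows: a random variable is modeled as the sum of independent draws (with replacement) from an urn, its expected value and standard error follow from the urn's contents, and the Central Limit Theorem gives a normal approximation for the sum when the number of draws is large. The course assignments are in R; Python is used here only for illustration. The urn below (a +1/-1 bet that pays off with probability 20/38) and all function names are hypothetical choices for this sketch.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

# Hypothetical urn: each draw is +1 with probability 20/38 and -1 otherwise.
URN_VALUES = [1, -1]
URN_PROBS = [20 / 38, 18 / 38]


def urn_mean_sd(values, probs):
    """Mean and standard deviation of a single draw from the urn."""
    mu = sum(v * p for v, p in zip(values, probs))
    var = sum(p * (v - mu) ** 2 for v, p in zip(values, probs))
    return mu, math.sqrt(var)


def theory_sum(n_draws):
    """Expected value and standard error of the sum of n independent draws."""
    mu, sigma = urn_mean_sd(URN_VALUES, URN_PROBS)
    return n_draws * mu, math.sqrt(n_draws) * sigma


def simulate_sums(n_draws, n_reps, seed=0):
    """Monte Carlo: repeat the n-draw experiment n_reps times and record each sum."""
    rng = random.Random(seed)
    return [
        sum(rng.choices(URN_VALUES, weights=URN_PROBS, k=n_draws))
        for _ in range(n_reps)
    ]


def clt_prob_below(threshold, n_draws):
    """Normal (CLT) approximation to P(sum < threshold)."""
    ev, se = theory_sum(n_draws)
    return NormalDist(mu=ev, sigma=se).cdf(threshold)


if __name__ == "__main__":
    n = 1000
    sums = simulate_sums(n, 2000)
    ev, se = theory_sum(n)
    print("expected value:", ev, "simulated mean:", mean(sums))
    print("standard error:", se, "simulated SD:  ", pstdev(sums))
    print("CLT  P(sum < 0):", clt_prob_below(0, n))
    print("sim. P(sum < 0):", sum(s < 0 for s in sums) / len(sums))
```

The simulated mean and standard deviation should sit near the theoretical expected value and standard error, and the simulated proportion of negative sums should be close to the normal approximation.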

Section 4: The Big Short 

You will learn how interest rates are determined and how some bad assumptions led to the financial crisis of 2007-2008.

FAQS

What is the deadline to sign up for a Verified Certificate?

The deadline is listed on the right side of the course landing page.

How do I earn a certificate?

To earn a certificate, you must sign up for a Verified Certificate by the deadline and earn a grade of at least 70%. When you achieve this score, a "view your certificate" button will appear on your dashboard. For more information, click on this link.

How do I upgrade to a verified certificate?

Go to your edX Dashboard (by clicking the edX icon at the top left of this page). Under this course, click the "Challenge Yourself!" link.

How long does the course take?

That is up to you! It is 4 weeks of content. Just be aware that you must complete the course by the deadline listed on your course homepage.

I am doing well on the assessments, but when I look under "Progress" I have a very low grade...why?

The grade is calculated based on all of the assessments you have completed and the assessments that you have not completed (edX says you have a "zero" on those assessments until you have attempted them). You will see your overall grade move up as you progress through the course.
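
To see why the number starts low, here is a rough sketch of that calculation. It assumes the seven programming exercises are weighted equally, which this syllabus does not state; the function name is hypothetical. Python is used only for illustration.

```python
def overall_grade(completed_scores, n_assessments=7):
    """Overall grade when every unattempted assessment counts as zero.

    completed_scores: fractions between 0 and 1 for the assessments done so far.
    n_assessments: total graded assessments (equal weights are an assumption).
    """
    if len(completed_scores) > n_assessments:
        raise ValueError("more scores than assessments")
    return sum(completed_scores) / n_assessments


if __name__ == "__main__":
    # Three exercises completed at 90% each; the other four still count as zero.
    print(overall_grade([0.9, 0.9, 0.9]))
```

Under this assumption, scoring 90% on the first three exercises shows an overall grade of roughly 39%, even though every attempted exercise went well.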

Installing R

For this course, you do not need to install R because we're using DataCamp for all of the assignments.

If you decide to install R, you can download it for free from the Comprehensive R Archive Network (CRAN). Installation is relatively straightforward, but if you need further help you can try the following resources:

Research

HarvardX pursues the science of learning. When you participate in this course, you will also participate in research about learning. Read our research statement to learn more.