PH125.8x: Data Science: Machine Learning - Course Syllabus
Course Instructor
Rafael Irizarry
Course Description
In this eighth course of nine in the HarvardX Data Science Professional Certificate, we learn how to use R to build a movie recommendation system using the basics of machine learning, the science behind the most popular and successful data science techniques.
Perhaps the most popular data science methodologies come from machine learning. What distinguishes machine learning from other computer-guided decision processes is that it builds prediction algorithms using data. Some of the most popular products that use machine learning include the handwriting readers implemented by the postal service, speech recognition, movie recommendation systems, and spam detectors.
In this course, part of our Professional Certificate Program in Data Science, you will learn popular machine learning algorithms, principal component analysis, and regularization by building a movie recommendation system.
You will learn about training data, a set of data used to discover potentially predictive relationships, and how it comes in the form of the outcome we want to predict and the features we will use to predict that outcome. As you build the movie recommendation system, you will learn how to train algorithms on training data so you can predict the outcome for future datasets. You will also learn about overtraining and techniques to avoid it, such as cross-validation. All of these skills are fundamental to machine learning.
What You'll Learn
- The basics of machine learning
- How to perform cross-validation to avoid overtraining
- Several popular machine learning algorithms
- How to build a recommendation system
- What regularization is and why it is useful
New to EdX?
Are you new to edX? Check out edX's Demo Course!
Need help? Visit edX Support via the Support tab or visit the Help Center.
Course Structure
This is a self-paced course; you can progress through the material at your own pace. Note that you may find the material in this course more challenging than the material in the previous courses, so you may want to give yourself more time.
Grading
This course contains comprehension checks that are worth 80% of your grade. There are also two comprehensive assessments, one open to all learners and one at the end of the course open to Verified learners only, that are worth 20% of your grade.
Certification
In order to receive a Verified Certificate, you must sign up and pay for a Verified Certificate by the deadline on the course page and earn a passing grade of at least 70%.
Research
HarvardX pursues the science of learning. When you participate in this course, you will also participate in research about learning. Read our research statement to learn more.
COURSE OUTLINE
Section 1: Introduction to Machine Learning
In this section, you'll be introduced to some of the terminology and concepts you'll need going forward.
Section 2: Machine Learning Basics
In this section, you'll learn how to start building a machine learning algorithm using training and test data sets and the importance of conditional probabilities for machine learning.
Section 3: Smoothing and Linear Regression for Prediction
In this section, you'll learn why linear regression is a useful baseline approach but is often insufficiently flexible for more complex analyses, and how to smooth noisy data.
Section 4: Cross-validation and kNN
In this section, you'll learn about the k-nearest neighbors algorithm and how to perform cross-validation.
Section 5: The Caret Package
In this section, you'll learn how to use the caret package to implement many different machine learning algorithms.
Section 6: Model Fitting and Recommendation Systems
In this section, you'll learn how to apply the machine learning algorithms you have learned.
FAQs - ABOUT THIS COURSE
How long does the course take?
That is up to you! The course contains six weeks of content. Just be aware of the course close date on the right side of the course landing page.
Do I need to have a background in statistics or R to take the course?
This is the eighth course in the series. We assume you have either taken the first seven courses or are familiar with the content taught there. If you are new to R and/or statistics, we strongly recommend starting with the first course, PH125.1x: R Basics.
I do not have a background in programming. Can I still take the course?
While we don't assume any background in programming at the beginning of the course series, by this eighth course, we do expect you to have learned the programming skills taught in the first seven courses. If you have no background in programming at all, we highly recommend starting with the first course, PH125.1x: R Basics.
Is this class challenging?
These courses are taught at the college level. Depending on your prior exposure, some of the material may be fairly challenging. Machine learning in particular can be a challenging topic, but substantial help is available from the community, including a lively discussion board.
Is there a textbook?
Yes, there is a free PDF textbook available here in English and here in Spanish. (Note: The book is "free" in that you can slide the "YOU PAY" scale to $0. You are welcome to pay what you can afford, and there is no advantage in the course to anyone who "purchases" the book for more money.)
There is also an HTML version of the textbook here.
How long do I have to complete the course?
In principle, if you spend 2-4 hours per week, you should be able to complete the course in about six to eight weeks. That said, this course is self-paced, so you can take as long as you want, provided you complete the course before the deadline listed on your course homepage.
Do I have to take the courses in sequence?
The courses in the HarvardX Data Science Professional Certificate are designed to be taken in numerical order, from PH125.1x through PH125.9x.
Each subsequent course assumes familiarity with the content in the preceding courses. Depending on your experience with data science generally and R specifically, you may be able to take the courses out of sequence if you choose.
PROFESSIONAL CERTIFICATE FAQs
How often will the courses be offered?
Courses in the program are offered frequently and with overlapping sessions, so if now isn't a good time for you to start one of the courses you need as a prerequisite, or if you missed a deadline, another offering of the course you need will be coming soon!
Does the order of courses in the Professional Certificate Program matter?
Yes, order does matter, particularly for the first four courses in the sequence. For the later courses, depending on your previous experience, you may be able to swap the order of some of the courses. The courses are designed to be taken in numerical order, from PH125.1x through PH125.9x.
Do I need to register for all of the courses at once in order to be eligible for the Professional Certificate?
No! You can take courses individually; once you have obtained an ID Verified Certificate in each course, you will be eligible for the Professional Certificate. If you choose to pre-pay for the entire program, you receive a discount on the total registration cost.
OTHER COMMONLY ASKED QUESTIONS
A list of HarvardX's most commonly asked questions is available on the HarvardX FAQ page.