Introduction to Data Science with Python

Course Overview

Using Python, learners will study regression and classification models with standard libraries such as scikit-learn (sklearn), pandas, Matplotlib, and NumPy. The course covers key machine learning concepts, including picking the right complexity, preventing overfitting, regularization, assessing uncertainty, weighing trade-offs, and model evaluation. Participation in this course will build your confidence in using Python, preparing you for more advanced study in Machine Learning (ML) and Artificial Intelligence (AI) and for advancement in your career.

Course Outline

  1. Linear Regression
  2. Multiple and Polynomial Regression
  3. Model Selection and Cross-Validation
  4. Bias, Variance, and Hyperparameters
  5. Classification and Logistic Regression
  6. Multi-logistic Regression and Missingness
  7. Bootstrap, Confidence Intervals, and Hypothesis Testing
  8. Capstone Project

Textbook

This course uses the textbook An Introduction to Statistical Learning, by Gareth James, Daniela Witten, Trevor Hastie, and Rob Tibshirani. You can find a free PDF copy of the second edition at statlearning.com. The file is about 21 MB. We recommend you download it now and keep it on your computer for easy reference.

Prerequisites

This course is an introduction to data science, but that doesn't mean it's a beginner-level course. To do well, you will need to know some calculus and have completed a semester's worth of college statistics and a semester's worth of programming (preferably in Python).

Grading & Certification

This course primarily uses code-based assignments. They are hosted on the Ed platform (which is a separate company from edX), but you will see them right in this course. Different students may see different versions of certain questions, or see them in different locations. Code-based assignments are worth 90% of your grade.

Other problem types (such as multiple-choice questions within the edX platform) are used to see whether you've read the week's materials before starting on your assignments. You will be able to see the answers after you have used up your attempts, or when you get the answer correct. These are worth 10% of your grade.

Each subsection of the course is worth the same number of points. For instance, each pre-reading assignment is worth the same amount, regardless of whether it's three questions or just one.

Passing the course, and certification

The passing grade for this course is 60%.
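
As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes that the course grade is a simple weighted average, with every subsection in a category counting equally. The function name `course_grade` and the sample scores are hypothetical, and the grade shown on your Progress page is the authoritative one.

```python
# Hypothetical sketch: assumes a plain weighted average of the two categories.
CODE_WEIGHT = 0.90      # code-based assignments
READING_WEIGHT = 0.10   # pre-reading questions
PASSING_GRADE = 0.60


def course_grade(code_scores, reading_scores):
    """Combine per-subsection scores (each between 0.0 and 1.0) into one grade.

    Every subsection in a category counts equally, so each category
    contributes the plain average of its subsection scores.
    """
    code_avg = sum(code_scores) / len(code_scores)
    reading_avg = sum(reading_scores) / len(reading_scores)
    return CODE_WEIGHT * code_avg + READING_WEIGHT * reading_avg


# Made-up scores for three code assignments and two pre-reading sets.
grade = course_grade([0.70, 0.55, 0.65], [1.0, 0.5])
passed = grade >= PASSING_GRADE
print(f"grade = {grade:.3f}, passed = {passed}")
```

Under these assumptions, the pre-reading questions can move the final grade by at most ten percentage points, so the code-based assignments largely determine whether you pass.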

If you register for a Verified Certificate and your score is 60% or above, you will receive a certificate in electronic form. Certificates are not mailed to you. Instead, you can generate yours on your Progress page. The final day to sign up for a Verified Certificate is ten days before the close of the course.

Guidelines for Collaboration

We encourage class participants to collaborate on assignments! But be sure you learn how to do the assignments yourself, and please do not post full solutions to discussion forums. Staff will proactively remove any full solutions that are posted.

  • It is OK to discuss or work jointly to develop a general approach to an assignment.
  • It is OK to get a hint from peers or course staff if you get stuck on an assignment.
  • You should work out the details of assignments yourself.
  • It is not OK to copy someone else's solution, no matter where you get it.
  • It is not OK to take someone else's formula and plug in your own numbers to get the answer.
  • It is not OK to post answers to a problem.
  • It is not OK to look at a full step-by-step solution for the purpose of submitting an answer.

Discussion Forums

We encourage you to use the course Discussion Forum! It has many uses, and we'll prompt you to participate throughout the course.

Some good uses of the Discussion Forum:

  • Asking questions about course content and assignments.
  • Collaborating appropriately on assignments.
  • Contacting course staff.
  • Starting discussions related to course content.
  • Commenting on course content, including giving the instructors feedback, disagreeing with us, or suggesting improvements.

Our discussion forum guidelines

  • Be polite and encouraging.
  • Work together and work independently.
  • Post hints rather than answers. If you're not sure where to draw the line, follow the collaboration guidelines.
  • You can and should discuss questions, consider possibilities, and ask for hints.
  • You should not request or give out answers, even answers that you know are wrong.
  • Use your vote. If you agree with what someone says, don't write a post. Just click the plus button!
  • Tag your posts. If there is an issue that absolutely needs staff attention, put the word [STAFF] in brackets in your subject line. Course staff will be in the forums every day, but a response may take up to two days, especially around holidays.

Honor code statement

HarvardX requires individuals who enroll in its courses on edX to abide by the terms of the edX honor code. HarvardX will take appropriate corrective action in response to violations of the edX honor code, which may include dismissal from the HarvardX course; revocation of any certificates received for the HarvardX course; or other remedies as circumstances warrant. No refunds will be issued in the case of corrective action for such violations. Enrollees who are taking HarvardX courses as part of another program will also be governed by the academic policies of those programs.

Nondiscrimination/anti-harassment statement

Harvard University and HarvardX are committed to maintaining a safe and healthy educational and work environment in which no member of the community is excluded from participation in, denied the benefits of, or subjected to discrimination or harassment in our program. All members of the HarvardX community are expected to abide by Harvard policies on nondiscrimination, including sexual harassment, and the edX Terms of Service. If you have any questions or concerns, please contact harvardx@harvard.edu and/or report your experience through the edX contact form.

Research Statement

HarvardX pursues the science of learning. By registering as an online learner in a HarvardX course, you will also participate in research about learning. Read our research statement to learn more.