https://mitxpro.mit.edu/
Syllabus for Data Science and Big Data Analytics
OUTLINE
Course Description
Access the full course description on the Course About Page.
Time Requirement / Commitment
This course is accessible online 24/7; most of the course is self-paced. Lectures are pre-taped, and you can follow along at your convenience as long as you submit the two compulsory peer-reviewed case studies and graded activities before the due dates. You may complete all assignments at your own pace. However, you may find it more beneficial to adhere to the suggested weekly schedule so you can stay up-to-date with the discussion forums.
There are approximately 2 hours of video every week. Most participants will spend between 3 and 4 hours a week on course-related activities. However, when you do the optional case study activities, the time required varies depending on your experience and programming background. We suggest planning somewhere between 1 and 3 hours per case study. For Modules 4 and 6's compulsory case studies, we suggest you plan between 1 and 3 hours to solve the case study and 30 minutes to review 2 activities from your peers.
Please note that for assessment due-dates, the edX platform uses Coordinated Universal Time (UTC). To convert times to your local time zone, please use this time converter tool.
« Back to Top
Who Should Participate?
Prerequisite(s): This course is designed for data scientists and data analysts, as well as professionals who wish to turn large volumes of data into actionable insights. Because of the broad nature of the information, the course is well suited for both early career professionals and senior managers, including:
- Technical managers
- Business intelligence analysts
- Management consultants
- IT practitioners
- Business managers
- Data science managers
- Data science enthusiasts
« Back to Top
Pedagogy
Learning Objectives
After taking this course, participants will:
- Accelerate learning from research to industry dissemination and expose participants to the latest techniques and how to use them;
- Understand common pitfalls in big data analytics and how to avoid them;
- Develop a better understanding of machine learning and how it works in practice;
- Learn how to interpret model results and what questions you should be asking before you use the results to make business decisions;
- Identify the challenges and constraints associated with scaling big data algorithms.
Methodology
Course materials blend the following pedagogical strategies to best achieve the learning objectives of the course and individual modules.
- Instructivism: Teacher-centered learning where the instructors present relevant content (tutorial videos enhanced with animation and graphics). Participants test their knowledge through graded tests.
- Constructivism: Learning-by-doing approach. We encourage participants to construct their own understanding through solving the mandatory and optional case studies and learning through the practice activities.
- Social constructivism: Learning through social interactions and communication. You will discuss and interact with your peers in the discussion forum and evaluate and get reviews from your peers through two compulsory case studies.
- Connectivism: Connecting with others and extending your knowledge through communication. You will be able to expand on and share your knowledge with others through the discussion forum, Facebook, and an exclusive LinkedIn group.
Learning Activities Planned for the Program
- Optional participation in threaded discussions on designated forums
- Graded activities assessments
- 2 graded case studies
- Non-graded practice activities and case studies
- Video learning sequences
- Resources tab
- Optional knowledge sharing and networking with participants through LinkedIn, Facebook, and/or or direct contact.
« Back to Top
Course Staff
« Back to Top
Course Requirements
Participants must complete a mandatory entrance survey to gain access to the videos and other course materials. You will be able to access the survey on the course start date: Monday, Oct 23, 2017 - 04:00 UTC.
To get the most out of this course, you are encouraged to watch all course videos, complete all weekly assessments, and actively participate in the discussion forums.
Grading:
Grades are not awarded for this program; the course is pass/fail. To earn an MIT Professional Certificate, you must achieve an overall completion of 70% of the required activities. This information will be the "Total" column on the course progress screen. MIT xPRO will not track your video progress, but please note that your understanding of all course content is necessary to complete the course's graded activities and case studies.
Participants who successfully complete all course requirements earn an MIT Certificate and receive 1.8 Continuing Education Units (1.8 CEUs).
« Back to Top
Course Schedule
Download the Course Schedule
This course is structured into an 8-week program (7 days of content and 1 holiday week). Although most of the course is self-paced and asynchronous, the calendar below suggests a weekly schedule for the purpose of staying up-to-date with the discussion forums and submitting timely the two peer-review case studies and graded activities on time.
Please note that no extensions will be granted, and all required assessments and assignments must be completed and submitted on or before December 17, 2017, at 23:59 UTC.
Entrance Survey: Participants are required to provide some information via a short course entrance survey. Your answers will help the course team and faculty better understand your goals for taking this course and how familiar you are with Data Science concepts, and they will ultimately be a guide to improving your experience and that of future courses. You will be able to access the survey on the course start date, October 23, 2017, 04.00 UTC. As soon as you complete the survey, you will be granted access to the videos and may start the course.
Module MenuModule 1 | Module 2 | Module 3.1 | Module 3.2 | Module 4 | Module 5 | Module 6
MODULE |
CONTENT |
|
WEEK 1 Module 1: Making Sense of Unstructured Data
Dates October 23 - October 29
Faculty Leads Stefanie Jegelka & Tamara Broderick
|
Compulsory non-graded Entrance Survey (complete this survey in order to view the course content).
Introduction
- What is unsupervised learning, and why is it challenging?
- Examples of unsupervised learning
Clustering (Tamara Broderick)
- What is clustering?
- When to use clustering
- K-means preliminaries
- The K-means algorithm
- How to evaluate clustering
- Beyond K-means: What really makes a cluster?
- Beyond K-means: Other notions of distance
- Beyond K-means: Data and pre-processing
- Beyond K-means: Big data and nonparametric Bayes
- Beyond clustering
Case Studies:
- Case Study 1: Genetic Codes
- Case Study 2: Finding themes in Project Description
Spectral Clustering, Components, and Embeddings (Stefanie Jegelka)
- What if we do not have features to describe the data or not all are meaningful?
- Finding the principal components in data and applications
- The magic of eigenvectors I
- Clustering in graphs and networks
- Features from graphs: The magic of eigenvectors II
- Spectral clustering
- Modularity Clustering
- Embeddings: New features and their meaning
Case Studies:
- Case Study 3: PCA: Identifying Faces
- Case Study 4: Spectral Clustering: Grouping News Stories
- ★ Complete Module 1.1 graded activity assessment (DUE December 17 - 23.59 UTC)
Recommended Weekly Activities
- Watch the course videos for this week
- Solve practice activities
- Try out optional case study activities
- Review and contribute to the Discussion Forum
Graded activities
- ★ Complete Module 1.2 graded activity assessment (DUE December 17 - 23.59 UTC)
← Back to Module Menu |
WEEK 2 Module 2: Regression and Prediction
Dates October 30 - November 5
Faculty Leads Victor Chernuzkov
|
Classical Linear and Nonlinear Regression and Extensions
- Linear regression with one and several variables
- Linear regression for prediction
- Linear regression for causal inference
- Logistic and other types of nonlinear regression
Case Studies:
- Case Study 1: Predicting Wages 1
- Case Study 2: Gender Wage Gap
Modern Regression with High-Dimensional Data
- Making good predictions with high-dimensional data; avoiding overfitting by validation and cross-validation
- Regularization by Lasso, Ridge, and their modifications
- Regression Trees, Random Forest, Boosted Trees
Case Study
- Case Study 3: Do poor countries grow faster than rich countries?
The Use of Modern Regression for Causal Inference
- Randomized Control Trials
- Observational Studies with Confounding
Case Studies
- Case Study 4: Predicting Wages 2
- Case Study 5: The Effect of Gun Ownership on Homicide Rates
Recommended Weekly Activities
- Watch the course videos for this week
- Solve practice activities
- Try out optional case study activities
- Review and contribute to the Discussion Forum
Graded activities
- ★ Complete Module 2 graded activity assessment (DUE December 17 - 23.59 UTC)
← Back to Module Menu |
WEEK 3 Module 3.1 Classification and Hypothesis Testing
Dates November 6 November 12
Faculty Leads David Gamarnik & Johnathan Kelner
|
Hypothesis Testing and Classification
- What are anomalies? What is fraud? Spams?
- Binary Classification: False Positive/Negative, Precision / Recall, F1-Score
- Logistic and Probit regression: Statistical binary classification
- Hypothesis testing: Ratio Test and Neyman-Pearson
- p-values: Confidence
- Support vector machine: Non-statistical classifier
- Perceptron: Simple classifier with elegant interpretation
Case Study
- Case-study 1: Logistic Regression: The Challenger Disaster
Recommended Weekly Activities
- Watch the course videos for this week
- Solve practice activities
- Try out optional case study activities
- Review and contribute to the Discussion Forum
Graded activities
- ★ Complete Module 3.1 graded activity assessment (DUE December 17 - 23.59 UTC)
← Back to Module Menu |
WEEK 4 Module 3.2 Deep Learning
Dates November 13 November 19
Faculty Leads Ankur Moitra
|
Deep Learning
- What is image classification? Introduce ImageNet and show examples
- Classification using a single linear threshold (perceptron)
- Hierarchical representations
- Fitting parameters using back-propagation
- Non-convex functions
- How interpret-able are its features?
- Manipulating deep nets (ostrich example)
- Transfer learning
- Other applications I: Speech recognition
- Other applications II: Natural language processing
Case Study
- Case Study 2: Decision boundary of a deep neural network
Recommended Weekly Activities
- Watch the course videos for this week
- Solve practice activities
- Try out optional case study activities
- Review and contribute to the Discussion Forum
Graded activities
- ★ Complete Module 3.2 graded activity assessment (DUE December 17 - 23.59 UTC)
← Back to Module Menu |
November 20 - 26 |
HOLIDAY WEEK |
|
WEEK 5 Module 4 Recommendation Systems
Dates November 27 December 3
Faculty Lead Devavrat Shadh & Phillipe Rigollet
|
Recommendations and Ranking
- What does a recommendation system do?
- So what is the recommendation prediction problem? And what data do we have?
- Using population averages
- Using population comparisons and ranking
Collaborative Filtering
- Personalization using collaborative filtering using similar users
- Personalization using collaborative filtering using similar items
- Personalization using collaborative filtering using similar users and items
Personalized Recommendations
- Personalization using comparisons, rankings, and users-items
- Hidden Markov Model / Neural Nets, Bipartite graph, and graphical model
- Using side-information
- 20 questions and active learning
- Building a system: Algorithmic and system challenges
Case Studies
- Case Study 1: Recommending Movies
- Case Study 2: Recommend New Songs to the Users based on their listening habits.
- Case Study 3: Make New Product Recommendations
Wrap-up
- Guidelines on building system
- Parting remarks and challenges
Recommended Weekly Activities
- Watch the course videos for this week
- Solve practice activities
- Try out optional case study activities
- Review and contribute to the Discussion Forum
Graded activities
- ★ Complete Module 4 graded activity assessment (DUE December 17 - 23.59 UTC)
- ★ Graded Case Study
- Solve and Submit your Case Study (DUE December 1 - 23.59 UTC)
- Review and submit the work of your peer (DUE December 3 - 23.59 UTC)
← Back to Module Menu |
WEEK 6 Module 5: Networking and Graphical Models
Dates December 4 December 10
Faculty Lead Caroline Uhler & Guy Bresler
|
Introduction
- Introduction to networks
- Examples of networks
- Representation of networks
Networks
- Centrality measures: degree, eigenvector, and page-rank
- Closeness and betweenness centrality
- Degree distribution, clustering, and small world
- Network models: Erdos-Renyi, configuration model, preferential attachment
- Stochastic models on networks for spread of viruses or ideas
- Influence maximization
Graphical Models
- Undirected graphical models
- Ising and Gaussian models
- Learning graphical models from data
- Directed graphical models
- V-structures, “explaining away,” and learning directed graphical models
- Inference in graphical models: Marginals and message passing
- Hidden Markov Model (HMM)
- Kalman filter
Case Studies
- Case study 1: Navigation / GPS
- 1.1: Kalman Filtering: Tracking the 2D Position of an Object when moving with Constant Velocity
- 1.2: Kalman Filtering: Tracking the 3D Position of an Object falling due to gravity.
- Case study 2: Identifying New Genes that cause Autism
Recommended Weekly Ativities
- Watch the course videos for this week
- Solve practice activities
- Try out optional case study activities
- Review and contribute to the Discussion Forum
Graded activities
- ★ Complete Module 5 graded activity assessment (DUE December 17 - 23.59 UTC)
← Back to Module Menu
|
WEEK 7 MODULE 6: Predictive Analytics
Dates December 11 December 17
Faculty Lead Kalyan Veeramachaneni
|
Predictive Modeling for Temporal Data
Feature Engineering
- Introduction
- Feature Types
- Deep Feature Synthesis: Primitives and Algorithms
- Deep Feature Synthesis: Stacking
Case Studies
- Case Study 6.1: NYC Taxi
- Case Study Module 6.2: UK Retail Dataset
Recommended Weekly Activities
- Watch the course videos for this week
- Solve practice activities
- Try out optional case study activities
- Review and contribute to the Discussion Forum
Graded activities
- ★ Complete ALL graded activity assessment activity by December 17, 2017 - 23.59 UTC
- ★ Graded Case Study
- Solve and Submit your Case Study (DUE December 15 - 23.59 UTC)
- Review and submit the work of your peer (DUE December 17 - 23.59 UTC)
Required Activities
- Complete Exit Survey (DUE December 17 - 23.59 UTC)
- Retrieve your MIT Course Certificate and 1.8 CEUs (Download it from your Dashboard — released on December 18)
Course Access
- +90 days of course access (December 18, 2017 - March 18, 2018)
- Course content is archived (no longer access to the course after March 18, 2018)
← Back to Module Menu
|
« Back to Top
Thank you for your participation in
Data Science and Big Data Analytics: Making Data-Driven Decisions
