Big Data in Education: Student Guide
Description
Course Structure
Grading Policy
Schedule
Discussion Etiquette
FAQ
Description
The emerging research communities in educational data mining and learning analytics are developing methods for mining and modeling the increasing amounts of fine-grained data becoming available about learners. In this class, you will learn about these methods, and their strengths and weaknesses for different applications. You will learn how to use each method to answer education research questions and to drive intervention and improvement in educational software and systems. Methods will be covered both at a theoretical level and in terms of how to apply and execute them using standard software tools. Issues of validity and generalizability will also be covered, with the goal of learning to establish how trustworthy and applicable the results of an analysis are.
Course Structure
The course consists of the following:
- Videos. The lectures are delivered via videos, available on edX, which are between four and fifteen minutes long. There will be approximately one hour of video content per week but this will vary. You can watch these videos whenever you choose.
- Readings. Suggestions for additional readings will be included in the syllabus. These are optional. Unfortunately, no assistance can be given to students in obtaining readings; I recommend trying Google Scholar, googling the web page of the article authors, and contacting your library.
- In-Video Quizzes. Most of the videos will contain in-video quizzes, which test your understanding. These quizzes do not count towards your grade.
- Discussion Forums. I encourage you to use the discussion forums to ask questions and learn from your professor, TA, and fellow students. Forum participation does not count towards your grade. (Do not post answers to assignment questions or RapidMiner scripts for the assignments on the forums; you may, however, discuss the concepts and how to approach the assignments.)
- Assignments. There will be an assignment each week.
- Emailing Your Professor. I encourage you to ask your questions on the discussion forum prior to emailing your professor. The professor and TA will be reading the discussion forums, as will other members of the Baker EDM Lab. With a class of over 30,000 students, the professor may not be able to directly respond to all emails. Your chance of getting a useful response will be much higher on the discussion forums.
Grading Policy
- Assignments – There will be an assignment each week. In each assignment, you will conduct an analysis on a data set provided to you and answer questions about it. You should complete each assignment within two weeks of its release. You will have three attempts at each assignment quiz, and the best score of the three attempts will be counted. Your final grade will then be the average of your 6 best weekly scores. The assignments account for 100% of your course grade and are weighted equally. Quiz answers will be released after the quiz closes.
- To receive a certificate of completion, students will need to earn an overall grade average of 70% or above. To receive a certificate of completion with distinction, students will need to earn an overall grade average of 85% or above.
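As a concrete illustration, the grading policy above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not course software: the function names are hypothetical and the eight example scores are made up.

```python
def assignment_score(attempts):
    """Best score of up to three quiz attempts counts."""
    return max(attempts)

def final_grade(weekly_scores):
    """Average of the 6 best weekly assignment scores, all weighted equally."""
    best_six = sorted(weekly_scores, reverse=True)[:6]
    return sum(best_six) / len(best_six)

def certificate(grade):
    """70% or above earns a certificate; 85% or above earns one with distinction."""
    if grade >= 85:
        return "certificate with distinction"
    if grade >= 70:
        return "certificate"
    return "no certificate"

# Example: 8 weekly scores (in percent), each already the best of that week's attempts
scores = [90, 80, 70, 95, 60, 85, 75, 88]
grade = final_grade(scores)       # mean of the 6 best: 95, 90, 88, 85, 80, 75
print(grade, certificate(grade))  # 85.5 certificate with distinction
```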
Schedule
Schedule and Topics Covered
Week 1: Prediction
- Introduction to EDM
- Regressors
- Classifiers
- Classifying in RapidMiner 5.3
- Case study in Prediction Modeling
Readings
- Baker, R., Siemens, G. (in press) Educational data mining and learning analytics. To appear in Sawyer, K. (Ed.) Cambridge Handbook of the Learning Sciences: 2nd Edition.
- Siemens, G. (2013). Learning analytics: The emergence of a discipline. American Behavioral Scientist, 57 (10), 1380-1400.
- Ferguson, R. (2012). Learning analytics: drivers, developments and challenges. International Journal of Technology Enhanced Learning (IJTEL), 4 (5/6), 304-317.
- Witten, I.H., Frank, E. (2011) Data Mining: Practical Machine Learning Tools and Techniques. Sections 4.6, 6.1, 6.2, 6.4, 6.5.
Week 2: Diagnostic Metrics and Cross-Validation
- Detector Confidence
- Diagnostic Metrics: Kappa, Accuracy, ROC, A', Correlation, RMSE
- Cross-Validation
- Over-Fitting
- Model/Detector Validity
Readings
- Russell, S., Norvig, P. (2010) Artificial Intelligence: A Modern Approach. Ch. 20: Learning Probabilistic Models.
- http://en.wikipedia.org/wiki/Receiver_operating_characteristic
- http://en.wikipedia.org/wiki/Precision_and_recall
- http://en.wikipedia.org/wiki/Cohen's_kappa
- http://en.wikipedia.org/wiki/Cross-validation_(statistics)
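Several of the Week 2 metrics can be computed directly from a detector's predictions and the ground-truth labels. A minimal from-scratch sketch of accuracy, Cohen's kappa, and RMSE (the example data is made up; real analyses in this course will use standard tools such as RapidMiner):

```python
import math

def accuracy(pred, actual):
    """Fraction of binary predictions that match the ground truth."""
    return sum(p == a for p, a in zip(pred, actual)) / len(actual)

def cohens_kappa(pred, actual):
    """Agreement between predictions and labels, corrected for chance agreement."""
    p_agree = accuracy(pred, actual)
    # Chance agreement: both say 1 by chance, plus both say 0 by chance
    p1_pred = sum(pred) / len(pred)
    p1_act = sum(actual) / len(actual)
    p_chance = p1_pred * p1_act + (1 - p1_pred) * (1 - p1_act)
    return (p_agree - p_chance) / (1 - p_chance)

def rmse(conf, actual):
    """Root mean squared error between detector confidences and 0/1 labels."""
    return math.sqrt(sum((c - a) ** 2 for c, a in zip(conf, actual)) / len(actual))

pred   = [1, 1, 0, 0, 1, 0]   # detector's binary predictions
actual = [1, 0, 0, 0, 1, 1]   # ground-truth labels
acc = accuracy(pred, actual)       # 4 of 6 predictions match
kap = cohens_kappa(pred, actual)   # agreement above chance level
```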
Week 3: Feature Engineering and Behavior Detection
- Ground Truth for Behavior Detection
- Data Synchronization and Grain Sizes
- Automated Feature Generation
- Automated Feature Selection
- Knowledge Engineering and Data Mining
- Case study in Behavior Detection
Readings
- D'Mello, S. K., Picard, R. W., and Graesser, A. C. (2007) Towards an Affect-Sensitive AutoTutor. Special issue on Intelligent Educational Systems – IEEE Intelligent Systems, 22(4), 53-61.
- Sao Pedro, M., Baker, R.S.J.d., Gobert, J. (2012) Improving Construct Validity Yields Better Models of Systematic Inquiry, Even with Less Information. Proceedings of the 20th International Conference on User Modeling, Adaptation and Personalization (UMAP 2012), 249-260.
- Liu, H., Yu, L. (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17 (4), 491-502.
Week 4: Knowledge Inference and Knowledge Structures
- Knowledge Inference
- Bayesian Knowledge Tracing
- Performance Factors Analysis
- Item Response Theory
- Q-Matrices
- Knowledge Spaces
- Advanced Knowledge Structure Inference
Readings
- Corbett, A.T., Anderson, J.R. (1995) Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge. User Modeling and User-Adapted Interaction, 4, 253-278.
- Pavlik, P.I., Cen, H., Koedinger, K.R. (2009) Performance Factors Analysis -- A New Alternative to Knowledge Tracing. Proceedings of the International Conference on Artificial Intelligence and Education.
- Baker, Frank B. (2001) The Basics of Item Response Theory. Chapters 1,2.
- Barnes, T. (2005) The Q-matrix Method: Mining Student Response Data for Knowledge. Proceedings of the Workshop on Educational Data Mining at the Annual Meeting of the American Association for Artificial Intelligence.
- Desmarais, M.C., Meshkinfam, P., Gagnon, M. (2006) Learned Student Models with Item to Item Knowledge Structures. User Modeling and User-Adapted Interaction, 16, 5, 403-434.
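One knowledge-inference model covered this week, Bayesian Knowledge Tracing (from the Corbett & Anderson reading above), can be sketched compactly: given the four standard parameters (initial knowledge, learn rate, guess, slip), the estimate that a student knows a skill is updated after each observed response. This is a minimal illustration with made-up parameter values, not the fitted models used in the lectures:

```python
def bkt_update(p_know, correct, learn=0.1, guess=0.2, slip=0.1):
    """One Bayesian Knowledge Tracing step: Bayes update on the observed
    response, then apply the probability of learning at this opportunity."""
    if correct:
        # P(student knew the skill | correct response)
        posterior = p_know * (1 - slip) / (p_know * (1 - slip) + (1 - p_know) * guess)
    else:
        # P(student knew the skill | incorrect response)
        posterior = p_know * slip / (p_know * slip + (1 - p_know) * (1 - guess))
    return posterior + (1 - posterior) * learn

# Trace a student's knowledge estimate across a sequence of responses
p = 0.3  # P(L0): initial probability the skill is known
for correct in [False, True, True, True]:
    p = bkt_update(p, correct)
# The estimate rises toward 1 over a run of correct responses
```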
Week 5: Relationship Mining
- Relationship Mining
- Correlation Mining
- Causal Mining
- Association Rule Mining
- Sequential Pattern Mining
- Network Analysis
Readings
- Arroyo, I., Woolf, B. (2005) Inferring learning and attitudes from a Bayesian Network of log file data. Proceedings of the 12th International Conference on Artificial Intelligence in Education, 33-40.
- Rau, M. A., & Scheines, R. (2012) Searching for Variables and Models to Investigate Mediators of Learning from Multiple Representations. Proceedings of the 5th International Conference on Educational Data Mining, 110-117.
- Witten, I.H., Frank, E. (2011) Data Mining: Practical Machine Learning Tools and Techniques. Ch. 4.5
- Merceron, A., Yacef, K. (2008) Interestingness Measures for Association Rules in Educational Data. Proceedings of the 1st International Conference on Educational Data Mining, 57-66.
- Srikant, R., Agrawal, R. (1996) Mining Sequential Patterns: Generalizations and Performance Improvements. Research Report: IBM Research Division. San Jose, CA: IBM.
- Perera, D., Kay, J., Koprinska, I., Yacef, K., Zaiane, O. (2009) Clustering and Sequential Pattern Mining of Online Collaborative Learning Data. IEEE Transactions on Knowledge and Data Engineering, 21, 759-772.
- Haythornthwaite, C. (2001) Exploring Multiplexity: Social Network Structures in a Computer-Supported Distance Learning Class. The Information Society: An International Journal, 17 (3), 211-226.
Week 6: Visualization
- Visualization of Educational Data
- Learning Curves
- Moment-by-Moment Learning Graphs
- Learnograms
- Heat Maps
- Scatterplots
- Parameter Space Maps
- Flow Diagrams
Readings
- Corbett, A.T., Anderson, J.R. (1995) Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge. User Modeling and User-Adapted Interaction, 4, 253-278.
- Baker, R.S.J.d., Hershkovitz, A., Rossi, L.M., Goldstein, A.B., Gowda, S.M. (in press) Predicting Robust Learning With the Visual Form of the Moment-by-Moment Learning Curve. To appear in the Journal of the Learning Sciences.
- Hershkovitz, A., Nachmias, R. (2008) Developing a Log-Based Motivation Measuring Tool. Proceedings of the First International Conference on Educational Data Mining, 226-233.
- Pardos, Z.A., Heffernan, N.T. (2010) Navigating the parameter space of Bayesian Knowledge Tracing models: Visualizations of the convergence of the Expectation Maximization algorithm. Proceedings of the 3rd International Conference on Educational Data Mining.
Week 7: Clustering and Factor Analysis
- Basic Clustering Algorithms
- Advanced Clustering Algorithms
- Validation of Clustering
- Case Study in Clustering
- Factor Analysis
Readings
- Witten, I.H., Frank, E. (2011) Data Mining: Practical Machine Learning Tools and Techniques. Ch. 4.8, 6.6
- Amershi, S., Conati, C. (2009) Combining Unsupervised and Supervised Classification to Build User Models for Exploratory Learning Environments. Journal of Educational Data Mining, 1 (1), 18-71.
- Alpaydin, E. (2004) Introduction to Machine Learning. pp. 116-120.
Week 8: Discovery with Models
- Discovery with Models
- Uses of Discovery with Models
- Risks of Discovery with Models
- Data Mining Data Mining
- Future Directions in EDM
Readings
- Hershkovitz, A., Baker, R.S.J.d., Gobert, J., Wixon, M., Sao Pedro, M. (2013) Discovery with Models: A Case Study on Carelessness in Computer-based Science Inquiry. American Behavioral Scientist, 57 (10), 1479-1498.
- Aleven, V., Mclaren, B., Roll, I., & Koedinger, K. (2006). Toward meta-cognitive tutoring: A model of help seeking with a Cognitive Tutor. International Journal of Artificial Intelligence in Education, 16(2), 101-128.
- Kinnebrew, J.S., Biswas, G., Sulcer, B. (2010). Modeling and measuring self-regulated learning in teachable agent environments. Journal of e-Learning and Knowledge Society, 7 (2), 19-35.
Discussion Etiquette
One of the most powerful and dynamic components of this edX course will be your contributions! It’s your course, so openly discuss the material. Help each other!
Here are some things to consider to help make the discussion forums as engaging and productive as possible:
- Tone - Tone is a very important part of online communication. Before posting, read your message out loud. Ask yourself if you would say this to a fellow student in your class in a face-to-face discussion.
- Peer support - Make an effort to understand and support your peers. People have different perspectives - but everyone is here to learn! And the more we learn from each other, the better!
- Disagree vs. attack - Disagreeing with peers in debate and discussion is fine and welcome, but make sure to avoid challenges that may be interpreted as personal.
- Check previous postings - Take a minute to read previous posts to ensure that the conversation you want to have is not already happening elsewhere on the board.
- Delete the extraneous - When replying to another's post, be specific about the sentence, phrase, or comment that you are addressing. This will help to keep the thread focused, and it will make it easier for all of us to understand how the conversation is progressing.
- Be open to challenges and confrontations
- Encourage others to share their ideas
Here are four approaches to consider when engaging in our weekly discussions:
- Agree/Disagree - It is perfectly fine to agree or disagree with others in the discussions, but explain the "level" of your agreement or disagreement. Avoid posting short responses such as "Yes! I agree!" or "No! That is wrong!" Explain WHY you agree or disagree.
- Critique - Thoughtful and constructive criticism of each other's posts will help to keep the discussions positive, academic, and interesting.
- Expand - If you find a post interesting or thought-provoking, use your reply to expand upon it.
- Exemplify - Bring in examples to support your ideas and comments.
FAQ
Here's a great "Student FAQ" from edX