Syllabus

Statistical Analysis in Bioinformatics 

Course Description

Advances in modern biology have rapidly increased the sensitivity of experiments and the range of what they can measure, to the point where it is often impossible for a single scientist to sort through the volume of data collected from just one experiment.

For example, individual data points collected from one gene expression study can easily number in the hundreds of thousands. These types of data sets are often referred to as ‘biological big data’ and require bioinformaticians to use statistical tools to gain meaningful information from them.

In this course, part of the Bioinformatics MicroMasters program, you will learn about the R language and environment and how to use it to perform statistical analyses on biological big datasets.

Course Learning Outcomes

In this course, you will learn to:

  • Apply packages in the R environment to determine changes in gene expression
  • Apply packages in the R environment to locate genes in a full genomic sequence
  • Demonstrate the installation of R and RStudio
  • Discuss the functions of reading and writing data
  • Discuss FASTA, retrieving sequences from NCBI, word content determination and plotting
  • Illustrate pairwise sequence alignments and multiple sequence alignments
  • Demonstrate how to create dotplots from two sequences
  • Describe genomic and predictive analysis, with emphasis on mass spec analysis and LDA analysis
  • Discuss linkage analysis and Genome-Wide Association Studies (GWAS)
  • Illustrate microarray analysis and differentially expressed genes
  • Demonstrate methods of sequencing RNA and how to generate different quality control plots

Course Information

This is a self-paced online course. All course materials are presented in English.

Learners new to edX are encouraged to take the DemoX course, which shows new students how to take a course on edx.org.

Course Materials

All materials are freely available within the course. Additional references, resources, and optional readings can be easily accessed and downloaded from the References, Resources, and Optional Readings sections of this course. 

Course Schedule

This is a self-paced course. All content (e.g. weekly knowledge checks, discussions, and assessments) will be available starting in Week 1 and will remain available through the entire eight weeks of the course.

Students should plan to spend 4 to 6 hours each week to fully complete each module.

Week | Topics | Optional Reading | Activities
1 | Installing R and RStudio; R and R Working Directories; Basic R: Reading and Writing Data, Simple Plotting | | Week 1 Discussion; Week 1 Knowledge Check
2 | Working with DNA Sequences; Sequence Statistics; Protein Sequence Statistics | | Week 2 Discussion; Week 2 Knowledge Check
3 | Pairwise Sequence Alignment; Multiple Sequence Alignment; Phylogenetic Reconstruction | | Week 3 Discussion; Week 3 Knowledge Check
4 | Computational Gene Discovery; Plotting the Results; Comparative Genomics | | Week 4 Discussion; Week 4 Knowledge Check; Mid-Term Assessment
5 | Mass Spec Analysis; LDA Analysis; Protein-protein Interactions | | Week 5 Discussion; Week 5 Knowledge Check
6 | Linkage Analysis - GWAS; eQTL Analysis | | Week 6 Discussion; Week 6 Knowledge Check
7 | Microarray Analysis; Differentially Expressed Genes; Visualization | | Week 7 Discussion; Week 7 Knowledge Check
8 | RNA-seq Data Processing | | Week 8 Discussion; Week 8 Knowledge Check; Final Assessment

Course Grading and Policy

You will be able to view all material, take any knowledge checks, and participate in discussions at any time during the course. However, to be awarded a certificate, you will need to have completed all the assessments, discussions, and knowledge checks by the last scheduled date for the course.

The course score for BIF003x is determined from two components: a Mid-Term Assessment (50%) and a Final Assessment (50%).

Mid-Term Assessment

At the end of the material for Week 4, there is a Mid-Term Assessment, which accounts for 50% of your overall score in the course. The Mid-Term Assessment contains 25 randomly generated questions based on the topics covered in Weeks 1 to 4. These questions are multiple choice and are worth two points each. Note: you will have only one attempt at this assessment.

Final Assessment

At the end of the material for Week 8, there is a Final Assessment, which accounts for 50% of your overall score in the course. The Final Assessment contains 25 randomly generated questions based on the topics covered in Weeks 5 to 8. These questions are multiple choice and are worth two points each. Note: you will have only one attempt at this assessment.

Certification

Students working to obtain the MicroMasters certificate must register for the verified track before the deadline and obtain a total score of 80% or greater in the course. For verified students who receive an edX certificate, it will appear on their edX dashboard after the course ends. As of December 7, 2015, edX no longer offers certificates for students who audit a course.
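
To make the grading arithmetic concrete, the following is a minimal sketch of how the overall score could be worked out from the figures stated in this syllabus: two assessments of 25 two-point questions each, weighted 50% each, with an 80% threshold for the certificate. The function names are hypothetical, and the sketch assumes each question is scored all-or-nothing with no rounding; it is not edX's actual grading code.

```python
# Figures taken from the syllabus; everything else is an illustrative assumption.
QUESTIONS_PER_ASSESSMENT = 25
POINTS_PER_QUESTION = 2
MIDTERM_WEIGHT = 50   # percent of the course score
FINAL_WEIGHT = 50     # percent of the course score
PASSING_SCORE = 80    # percent required for the certificate


def course_score(midterm_correct, final_correct):
    """Return the overall course score as a percentage (hypothetical helper)."""
    for correct in (midterm_correct, final_correct):
        if not 0 <= correct <= QUESTIONS_PER_ASSESSMENT:
            raise ValueError("correct answers must be between 0 and 25")
    max_points = QUESTIONS_PER_ASSESSMENT * POINTS_PER_QUESTION  # 50 points
    midterm_points = midterm_correct * POINTS_PER_QUESTION
    final_points = final_correct * POINTS_PER_QUESTION
    # Weight each assessment's share of its maximum, using integer
    # arithmetic until the final division to avoid rounding surprises.
    weighted = midterm_points * MIDTERM_WEIGHT + final_points * FINAL_WEIGHT
    return weighted / max_points


def meets_certificate_threshold(midterm_correct, final_correct):
    """True if the overall score is 80% or greater (hypothetical helper)."""
    return course_score(midterm_correct, final_correct) >= PASSING_SCORE


if __name__ == "__main__":
    # 19 correct on the Mid-Term and 21 on the Final
    print(course_score(19, 21))
    print(meets_certificate_threshold(19, 21))
```

Because both assessments carry the same weight and the same number of points, reaching 80% overall means answering at least 40 of the 50 questions correctly across the two assessments, in any combination.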