FAQ
About the Course:
Q : What is this course and series about?
A : PH525.X is a series of courses that teaches statistical inference by analyzing data using the R language. There are seven courses in total. The first, PH525.1x, is an introduction to basic statistical concepts and R programming skills.
Q : What is this R language we will be using?
A : R is "a programming language and environment" that is used in many fields for statistical analysis. R is also completely free and open source.
Q : Do I need to have a background in statistics or R to take the course?
A : No, we do not assume any knowledge of either R or statistics for the first course. The statistics and programming aspects of the class increase in difficulty across the first three courses. By the third course we will be teaching advanced statistical concepts.
Q : I do not have a background in programming. Can I still take the course?
A : Yes, we do not assume a background in programming. You will learn programming skills by completing the exercises.
Q : Is this class challenging?
A : These courses are taught at the college level. Some of the material, depending on your exposure, may be fairly challenging. However, there is also substantial help from the community including a lively discussion board.
Q : Is there a textbook?
A : Yes, there is a free textbook available here:
https://leanpub.com/dataanalysisforthelifesciences
Note: The book is “free” in that you can slide the “You pay” scale to $0.
A paperback version of the textbook, Data Analysis for the Life Sciences, is available from CRC Press and on Amazon.
Q : How long do I have to complete the course?
A : The course is self-paced, so you can take as long as you want.
Q : Do I have to take the courses in sequence?
A : No, you can skip any course you want and take courses out of sequence as you see fit. However, please note that the later courses get more sophisticated and assume an understanding of material that was discussed in previous courses.
Certificates:
Q : Do I get a certificate at the end of the course?
A : This course offers “verified certificates” as proof that you have successfully completed the course. Verified certificates require you to verify your identity using a webcam and a government-issued ID. Further information can be found here: https://www.edx.org/verified-certificate
Q : What score do I need to get in order to get a certificate?
A : For these courses you need a score of 70% or greater.
Q : Is there a fee for the certificate?
A : Yes, there is a fee for verified certificates. It is handled by edX. You can contact them directly with questions at
https://www.edx.org/contact-us
Q : Do I need to verify my identity for each course I take?
A : No. Apparently, once you verify your identity with edX, the verification is good for one year. See:
http://edx-guide-for-students.readthedocs.org/en/latest/SFD_enrolling.html#verify-your-identity
Software Etc.
Q : How do I get started?
A : First, you will need to install R onto your machine.
https://cran.r-project.org/index.html
Ideally, install the latest version (3.2.5). If you are using an older version of R, you should upgrade. We will use packages such as "dplyr" that only work with version 3.1.2 or more recent, so to save headaches, install version 3.2.0 or more recent. Depending on your machine, you may have to resolve various dependencies, but in most cases installing R is straightforward.
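If you are not sure which version of R you have, the following minimal sketch checks it from inside R. The 3.1.2 threshold is the one quoted above; the variable names are only illustrative.

```r
# Report the version of R that is currently running
current <- getRversion()
print(current)

# Compare against the minimum version mentioned above (3.1.2)
minimum <- "3.1.2"
if (current >= minimum) {
  message("R ", as.character(current), " is recent enough for dplyr")
} else {
  message("R ", as.character(current), " is older than ", minimum, "; please upgrade")
}
```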
Q : What do I do after installing R?
A : You can begin the course or optionally install RStudio.
https://www.rstudio.com/
RStudio is a graphical user interface for R. RStudio is NOT part of the R language nor is it required in order to complete the course. However, it does provide a nice interface and we recommend you download it (Professor Irizarry uses RStudio in the videos). Similar to R, it is free and open-source.
Q : I'm ready to start the course, but how do I start swirl?
A : First, you need to install swirl. In R type:
install.packages("swirl")
After that, and each time you want to start swirl in a new R session, load the package using the library() function:
library("swirl")
Assuming everything is installed properly, swirl will work and greet you with something like:
| Hi! Type swirl() when you are ready to begin.
Q : Can I install swirl using RStudio?
A : Yes, you can install most if not all packages via RStudio. In the menu bar at the top of the screen, click Tools > Install Packages. Then you can enter the name of the package you want. If the package is found, you should be able to install it directly.
Q : Will we be using other packages in this course?
A : Yes, we will be using and downloading various packages throughout this course and the subsequent courses. However, installing packages is straightforward. For example, if you want to install rafalib you just enter
install.packages("rafalib")
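If you would rather not reinstall a package you already have, one possible approach is sketched below. The helper name install_if_missing is hypothetical (it is not part of R or of the course materials); it only wraps requireNamespace() and install.packages().

```r
# Hypothetical helper: install a package only if it is not already installed
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))   # already installed, nothing to do
  }
  install.packages(pkg)
  invisible(TRUE)              # an installation was attempted
}

# "stats" ships with R, so this call returns without installing anything
install_if_missing("stats")
```

For a course package you would call it as install_if_missing("rafalib"), which needs an internet connection the first time.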
Q : I'm trying to install other packages such as rafalib and I get an error (similar to):
'lib = "C:/Program Files/R/R-3.2.2/library"' is not writable
A : This problem occurs when you do not have administrative privileges to write to the folder where R installs packages. We strongly recommend that you fix this by running RStudio as administrator. There are other possible fixes, but you should have administrator privileges on your machine.
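To see which folder R is trying to write to, and whether your current session is allowed to write there, you can try the diagnostic sketch below. It only inspects permissions and changes nothing; the variable names are illustrative.

```r
# Show the library folders R installs packages into; the first is the default
lib_dirs <- .libPaths()
print(lib_dirs)

# file.access() with mode = 2 tests write permission: 0 means writable, -1 means not
writable <- unname(file.access(lib_dirs, mode = 2) == 0)
print(data.frame(library = lib_dirs, writable = writable))
```

If the first folder is reported as not writable, that matches the error shown in the question.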
Q : I'm having problems installing devtools. Can you help?
A : devtools can be hard to install, which is why we are not using it in any of the exercises. You do not need it for this course.
Q : R cannot find the downloader package when I run library(downloader). What do I do?
A : Install downloader with
install.packages("downloader")
Q : I'm having problems installing a package, any ideas?
A : The first thing to check is your spelling; many errors are caused by typos. Second, make sure you have administrative privileges. If you still get errors, let us know. For any technical question, make sure you tell us what operating system you are using.
Q : My R/RStudio session does not recognize the "%>%" function, why?
A : The %>% operator is made available by dplyr. After installing dplyr, you will need to type
library(dplyr)
each time you start a new R session.
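As a small illustration, the sketch below loads dplyr only if it is installed and then uses the pipe once. The helper pipe_defined and the numbers are made up for the example.

```r
# Hypothetical helper: is a function called %>% currently visible?
pipe_defined <- function() exists("%>%", mode = "function")

if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)   # attaching dplyr makes %>% available
  # x %>% f() passes x as the first argument of f(),
  # so this line is the same as sum(sqrt(c(1, 4, 9)))
  result <- c(1, 4, 9) %>% sqrt() %>% sum()
  print(result)
} else {
  message("dplyr is not installed; run install.packages(\"dplyr\") first")
}
```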
Q : I am getting an error such as
cannot open file 'femaleMiceWeights.csv': No such file or directory
A : This is most likely because the file is not in your working directory. Either move the file into the working directory, or change the working directory to the folder that contains the file.
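The sketch below shows the relevant base R functions: getwd(), file.exists() and setwd(). To stay self-contained it works in a temporary folder with a small made-up file called example.csv; in practice you would use the folder where you saved femaleMiceWeights.csv.

```r
# Print the folder R is currently working in
print(getwd())

# Check whether the course file is visible from that folder
print(file.exists("femaleMiceWeights.csv"))

# Demonstrate the fix in a temporary folder with a made-up file
old_dir <- getwd()
demo_dir <- tempfile("demo")
dir.create(demo_dir)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(demo_dir, "example.csv"), row.names = FALSE)

setwd(demo_dir)                      # change the working directory
found <- file.exists("example.csv")  # the file is now visible by name alone
dat <- read.csv("example.csv")
setwd(old_dir)                       # restore the original working directory
```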
Q : I am trying to open a file I downloaded but am getting errors such as:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed
A : You may have downloaded the HTML version of the page from GitHub. Remember that your data needs to be downloaded as “raw”.
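One way to check whether a downloaded file is really a web page is to look at its first few lines. The helper looks_like_html below is hypothetical, and the two temporary files contain made-up content purely for the demonstration.

```r
# Hypothetical helper: TRUE if the start of a file looks like a web page
looks_like_html <- function(path, n = 5) {
  first_lines <- readLines(path, n = n, warn = FALSE)
  any(grepl("<!DOCTYPE|<html", first_lines, ignore.case = TRUE))
}

# Small self-contained demonstration using temporary files
html_file <- tempfile(fileext = ".csv")
writeLines(c("<!DOCTYPE html>", "<html>", "<head>"), html_file)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("name,value", "a,1"), csv_file)

print(looks_like_html(html_file))  # the web page saved under a .csv name
print(looks_like_html(csv_file))   # a genuine csv file
```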
Q : Can you help me download data such as femaleMiceWeights.csv directly please?
A : Yes, yes I can. Go here:
https://github.com/genomicsclass/dagdata/blob/master/inst/extdata/femaleMiceWeights.csv
Now click on "Raw" (you'll see "Raw" followed by "Blame" and "History" on the right of the screen). After you click on "Raw", you can save the page as a csv file by going to "File > Save Page As" in your browser.
Note that there is not much of a difference between a csv file and a text file. You can always try renaming the text file to extension csv if it's not saving correctly.
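You can also fetch the raw file from within R. The sketch below assumes GitHub's usual address layout, in which the raw file lives under raw.githubusercontent.com without the "blob" part of the path. Both helper names are hypothetical, and the actual download is left commented out because it needs an internet connection.

```r
# Hypothetical helper: turn a GitHub "blob" page address into the raw-file address
to_raw_url <- function(url) {
  url <- sub("https://github.com/", "https://raw.githubusercontent.com/",
             url, fixed = TRUE)
  sub("/blob/", "/", url, fixed = TRUE)
}

page_url <- "https://github.com/genomicsclass/dagdata/blob/master/inst/extdata/femaleMiceWeights.csv"
raw_url <- to_raw_url(page_url)
print(raw_url)

# Hypothetical wrapper around base R's download.file() and read.csv()
fetch_csv <- function(url, destfile = basename(url)) {
  download.file(url, destfile = destfile)
  read.csv(destfile)
}
# dat <- fetch_csv(raw_url)   # uncomment to download into the working directory
```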
Q : I am getting this error:
Error in select(., time) : unused argument (time)
A : If you load UsingR after loading dplyr, then dplyr's 'select' gets overwritten (masked), which can result in the aforementioned error. This happens with other packages as well, so you should know that you can always use dplyr::select to ensure R uses the desired 'select'. You can also load the data in UsingR without loading the package itself. Example:
data(nym.2002, package="UsingR")
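To see masking in action without installing anything, the sketch below uses sd from the stats package as a stand-in for select. Defining our own sd masks the original, and the package::function form still reaches it. The function body and the numbers are made up for illustration.

```r
# Define our own function named sd; it now masks stats::sd in this session
sd <- function(x) "this is not the standard deviation"

values <- c(2, 4, 4, 4, 5, 5, 7, 9)
masked_result <- sd(values)           # calls the masking function defined above
explicit_result <- stats::sd(values)  # package::function picks the named package

print(masked_result)
print(explicit_result)

rm(sd)  # remove our definition so that sd() refers to stats::sd again
```

The same pattern applies to dplyr::select when another package defines its own select.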
Q : I am getting some strange error message in R. Ideas?
A : If you cannot figure out what is wrong, a good idea is to exit R and restart it, especially if your session has been open for a long time. If you still see the error after restarting, let us know.
Q : I am having some problems with loops. Can you please help?
A : When programming in any language, one of the most important concepts is the loop. Chances are many of you already know what a loop is. R has built-in loops, including "for" and "while". Here is an excellent introduction:
http://blog.datacamp.com/tutorial-on-loops-in-r/
R also has other useful built-in functions such as "lapply" and "sapply". swirl exercise 10 will give you practice with these two functions. You will need to use for loops in the week 2 exercises. Please note that everyone has a different background, so if you come across a concept you do not know or understand, you are welcome to do a bit of Internet sleuthing. We encourage you to play around with R and do as many swirl exercises as you want!
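As a quick illustration, here is the same small task (squaring the numbers 1 to 5) written with each of the constructs mentioned above. This is only a sketch with made-up variable names, not one of the course exercises.

```r
# Sum the squares of 1 to 5 with a for loop
total <- 0
for (i in 1:5) {
  total <- total + i^2
}
print(total)

# A while loop that computes the same sum
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i^2
  i <- i + 1
}

# sapply applies a function to each element and returns a vector
squares <- sapply(1:5, function(k) k^2)

# lapply does the same but returns a list
squares_list <- lapply(1:5, function(k) k^2)
```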
Q : My code is not working! Can I ask you about it?
A : This is important. Yes, you can ask us, but first let us go over how to ask questions. When you ask a question, please state the week the question comes from and a short description of the question. For example:
"Week 4 - Symmetry of Log Ratios #1"
If the question is not directly related to a problem then state the week. For example:
"Week 1: dplyr"
This not only helps us, the staff, identify your problem; it also helps other students who may have similar questions. You can also show us your code if you have questions. However, make sure it is legible. Even simple code can be difficult and annoying to read if garbled. For example, do NOT present your code like this:
sum <- 100 for(i in 1:50) sum <- sum + i sum
Instead please make it neat:
sum <- 100
for (i in 1:50) {
  sum <- sum + i
}
sum
To do this, you can insert your code, then highlight it and press Ctrl+K and it should be nice and legible (if not, please fix it). Finally, please make sure your questions actually are posted as questions. Toggle your post type to question when you are asking a question. For more information about how to use the discussion forum, go to: https://courses.edx.org/courses/course-v1:HarvardX+PH525.1x+3T-2015/courseware/dcf8031210054672a6bd2a63d6f9d9ac/4ccc08ab56ec42cbb3be0ef8d3d34d22/