Statistics for genomic data analysis
Weekly outline
To contact me (24/24):
email: darlene.goldstein@epfl.ch
tel/sms/whatsapp/signal: 079 427 2501
skype: darlenegoldstein
Course format:
Although we are currently allowed to attend the course on campus, ALL LECTURES WILL ALSO BE AVAILABLE AS PRERECORDED VIDEOS (from a different year), since some of you may be unable to attend in person from time to time. Please follow the lecture before the office hours or lab time so that you can ask any additional questions then.
Office hours: I will be available for your questions each Thursday 12.00-13.00 (after class) in my office MA B1 477, and also by appointment (in person or on Zoom).
Course language: This course is given in English, but feel free to speak in either English or French.
Organization: Your course grade will be based on an individual report (AT MOST 10 pages). You will report on an analysis of genomic (microarray) data comprising 2 tasks: identifying genes that are differentially expressed between 2 conditions, and carrying out a cluster analysis to identify (potentially novel) subgroups.
The purpose of this course is to help you to learn something without too much stress!! That is why you can do the report twice: a preliminary version, which will be commented on according to the posted criteria, then a final version, where you can incorporate the comments, due at the end of the semester. Only the final version will count towards your course grade. The deadlines will be posted on the course Moodle page.
In order to give you time to work on your reports, toward the end of the course there will be no in-person lectures, and the topics will be mainly optional. These 'extra' topics are NOT required; there are slides (and possibly videos) in case you are interested. There is no penalty for not following them.
Resources:
A useful book for both statistics and R: A Handbook of Statistical Analyses Using R, 3rd edition. Torsten Hothorn and Brian S. Everitt. CRC Press.
Some resources to get you started with R, RStudio and R Markdown:
Repository of R packages; you can download R from here. Also a good source of documentation (see 'Contributed' under the Documentation heading).
-
RStudio makes R easier to use. It includes a code editor, debugging & visualization tools. Choose the free desktop version corresponding to your computer and operating system.
-
Tutorials and examples for reproducible research using R:
-
Accessible from your EPFL account
-
-
Week 1: Molecular biology and technology background
-
Week 2: Quantifying expression for Affy chips (RMA); IDE: Identifying differentially expressed (DE) genes
-
The link for video 2a should work now; please let me know if you have any problems.
-
You do not need to read all of this!! But Chapter 1 might be helpful for a better understanding of the molecular biological background and the biotechnological aspects of Affymetrix GeneChips and experiments. For a more detailed explanation of the RMA background adjustment, see pages 16-21.
-
This corresponds to chapter 3 of the BioConductor Case Studies book.
-
Week 3: Quality assessment for Affy chips; robust regression and affyPLM
-
Co-authored by your fearless leader (me!!)
-
Week 4: Experimental design; linear modeling
-
Venables + Ripley, MASS ch. 6 (especially 6.2, 6.7) (MASS = Modern Applied Statistics with S)
-
Week 5: Hypothesis testing review; multiple testing; permutation test
-
Venables + Ripley, MASS ch. 4
-
If you are having problems setting margins in a LaTeX document, have a look at this: it shows all of the layout parameters on a page.
-
Week 6: Cluster analysis
-
To get comments on this practice exam, please deposit your draft by 30 April (any time).
-
Week 7: Classification (optional) - VIDEO TO COME
NOTE: NO LECTURE OR LAB THIS WEEK: time to work on TP 7 (practice exam)
NO OFFICE HOURS THIS WEEK - if you have questions, please email me
Optional classification activity
-
Optional classification activities
-
Easter holiday - NO LECTURE OR LAB THIS WEEK; NO OFFICE HOURS THIS WEEK
-
Week 8: Annotation; Gene set testing (optional) - VIDEO TO COME
NOTE: NO LECTURE OR LAB THIS WEEK: time to work on TP 7 (practice exam)
NO OFFICE HOURS THIS WEEK - if you have questions, please email me
Week 9: Introduction to sequencing data, RNA-seq; generalized linear models (GLMs) (optional) (ADDITIONAL RESOURCES TO COME)
NOTE: NO LECTURE OR LAB THIS WEEK: time to work on TP 7 (practice exam)
NO OFFICE HOURS THIS WEEK - if you have questions, please email me
Please deposit your practice exam here.
-
Please deposit your late TP 7 report here so that I can comment on it.
-
Week 10: Sequence data; DE for RNA-seq data (optional) (ADDITIONAL RESOURCES TO COME)
NOTE: NO LECTURE OR LAB THIS WEEK: time to work on TP 7 (practice exam)
NO OFFICE HOURS THIS WEEK - if you have questions, please email me
Week 11: Genetic association studies/GWAS (optional)
Ascension: NO LECTURE OR LAB THIS WEEK; NO OFFICE HOURS THIS WEEK
Same as Applied Biostatistics 8a and Statistical Genetics 4a
NOTE: Please IGNORE the part at the beginning saying that you need to do a GWAS project; that was for a different course. Your project IS NOT A GWAS.
Same as Applied Biostatistics 8b.
-
Same as Applied Biostatistics 8c.
-
[NOTE: the part at the end about multiple testing is REVIEW; we saw this already in Lecture 5b]
-
-
Week 12: Miscellaneous topics (optional)
NOTE: NO LECTURE OR LAB THIS WEEK: time to work on exam
NO OFFICE HOURS THIS WEEK - if you have questions, please email me
Week 13: NOTE: NO LECTURE OR LAB THIS WEEK: time to work on exam
OFFICE HOURS THIS WEEK: Thursday ~11.00 - 13.00 (at least); Friday ~11.00 - 13.00 (latest - I teach a class at 13.15)
Week 14: NOTE: NO LECTURE OR LAB THIS WEEK: time to work on exam
NO OFFICE HOURS THIS WEEK - if you have questions, please email me
-
Please (try to) upload your final exam here by 15 July 23.59. At the VERY LATEST, I can accept it until 16 July 23.59. After that, I cannot accept any exam for any reason.
Best regards,
Darlene