Applied biostatistics
Weekly outline
To contact me (24/24):
email: darlene.goldstein@epfl.ch
tel/sms/whatsapp: 079 427 2501
skype: darlenegoldstein
The PRELIMINARY deadline for all reports is 30 May (any time).
NEW: The FINAL deadline for submitting ALL of your corrected work is 10 July (any time; earlier is better!!).
A useful book for both statistics and R:
- A Handbook of Statistical Analyses Using R, 3rd edition. Torsten Hothorn and Brian S. Everitt. CRC Press.
Some resources to get you started with R, RStudio and R Markdown:
- RStudio makes R easier to use. It includes a code editor and debugging and visualization tools. Choose the free desktop version corresponding to your computer and operating system.
- Tutorials and examples using R
- Repository of R packages; you can also download R from here. It is also a good source of documentation (see 'Contributed' under the Documentation heading).
Week 1: Course organization, reproducible research, hypothesis testing review
Grading
- 1 point 2 points: short report 1, can be in a group of up to 4 people
- 1/2 point: short critique on a scientific article, can be in a group of up to 4 people
- 4 points: individual project report
- 1/2 point: individual meta-analysis or power short report
Practice with R - first download R and RStudio, then work through the exercises (you can skip the part at the end about writing a report).
Note that the hyperlink 'here' at the bottom of the page under 'Simulating microarray data' should connect to lausanne.isb-sib.ch (not isrec.isb-sib.ch).
Week 2: Linear regression modeling
Week 3: Experimental design and analysis of variance (ANOVA)
Report 1:
The purpose of this assignment is to give you practice writing a scientific report. Report writing is an extremely important skill, regardless of whether you continue in an academic career, in government or in industry.
You should analyze your data in an appropriate manner (following either the week 2 lab for regression or the week 3 lab for ANOVA) and write a short report of 3-5 pages (max). Please submit your report (as a .pdf file) by email. I will comment on and return reports in the order that I receive them.
Your report should contain a short background/intro to the problem (including the aim of the original study); a presentation of the results of your statistical analyses, including exploratory data analysis, model fitting and final model; a short discussion of any shortcomings of the final model; and your conclusions. Include relevant graphics and tables, but DO NOT include any raw R code or output (you will be penalized if you do). Your graphs should be 'pretty': if you copy/paste a graph from the screen, it will most likely appear blurry, and you will be penalized for this. It is easiest to include nice-looking graphs if you use R Markdown, but this is not the only way.
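If you are not using R Markdown, one way to avoid blurry copy/pasted graphs is to write the plot directly to a vector graphics file. The following is only a minimal sketch using the built-in cars data; the file name, plot size and colours are illustrative assumptions, not course requirements.

```r
# Minimal sketch: save a plot as a PDF (vector graphics) instead of
# copying it from the screen. File name and 6 x 4 inch size are arbitrary.
out_file <- file.path(tempdir(), "speed_vs_distance.pdf")

pdf(out_file, width = 6, height = 4)   # open the PDF graphics device
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Stopping distance versus speed",
     pch = 19, col = "steelblue")
invisible(dev.off())                   # close the device so the file is written
```

The resulting file can then be included in your report at any size without losing sharpness.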
Your report will also be graded based on language use and overall presentation. (It can be in either English or French.) Please use 12 point font and margins of 2.5 cm. Remember to number each page at the bottom (including page 1). Do not include a cover page, and do not exceed 5 pages or you will be penalized. Inside the top margin of each page, please include the surname of each group member (separated by commas).
As a reminder, this report counts for 1 2 points (out of 6) of your course grade.
The initial deadline (12.00 noon on 20 March) is for your preliminary report. (The final version is due by 12.00 noon on 20 June.)
If you turn in your report before the initial deadline, then I will be able to comment on your report.
Here are the group and data set assignments (UPDATED Saturday 7 March 15h15). For each data set there is also an explanation to go along with it, including which columns contain which variables and the outcome variable. There is also a literature reference that you should be able to access from EPFL / vpn.epfl.edu. You can use that as an aid to guide you in your analyses, but you can also do additional or different analyses if you want.
Also, you can use the literature paper as a guide to how you might write your report. You should include a short intro / background, including a clear statement of the problem of interest; a complete exploratory data analysis (EDA); a description of your model fitting and selection analysis; a description of your model assessment and the justification / results of that; your final chosen model written in mathematical terms; relevant plots (they should be 'pretty'); and any conclusions addressing the problem of interest. You will also be evaluated on the quality of language and the overall presentation of your report.
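As an illustration of what "written in mathematical terms" could look like, here is a generic sketch for a linear regression with two hypothetical predictors x1 and x2; your own model will have different variables, and you would replace the symbols by the estimated coefficients from your fit.

```latex
Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \ \text{independently}, \quad i = 1, \dots, n
```

Here Y_i is the outcome for observation i, the betas are the regression coefficients and the epsilons are the error terms.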
If you return your report before the preliminary deadline, I will be able to give you commentary on how to improve your report that you can incorporate into your final submission.
Week 4: Experimental design and analysis of variance (ANOVA)
PRELIMINARY DEADLINE 1 (Regression problems R1 and R2): NEXT week, 20 March by noon
Week 5: General linear model and Model selection
PRELIMINARY DEADLINE 1: regression group (preliminary) report, due by 12.00 noon Friday 20 March
(ANOVA group preliminary report due by 12.00 noon Friday 27 March)
Carry out this R tutorial with an example environmental dataset. (It is OK to skip the part about partial correlation analysis, section 7.1.2.)
Week 6: Generalized linear modeling, logistic regression, Poisson regression
Week 7: Survival analysis
Week 8: Discrete data analysis, contingency tables, 2x2 tables; data visualization; asymptotic and exact tests
Extra lecture this week, so that you can choose your individual project topic before the vacation week.
Genetic association studies, genome-wide association studies (GWAS); principal components analysis, multiple hypothesis testing
Choose your individual project topic from one of the following (email me and I will provide you with a dataset):
- survival analysis
- logistic regression
- generalized linear model (other than logistic)
- discrete data/contingency table analysis
- genome-wide association study (GWAS)
Your final report should be ~5-10 pages (absolute maximum; fewer pages is better if you can be concise). If your report is longer than 10 pages (not including references), you will be penalized.
I will comment on the projects in the order in which I receive them and return them to you. You should then have a few more weeks to work on your project.
(same as Statistical Genetics Lecture 4a)
(same as Statistical Genetics Lecture 4b)
Work on manipulating tables and carrying out tests (sections 2.1-2.5, 3.1-3.5 only). Before starting, you will need to load the vcd and vcdExtra packages using the R function library().
NOTE: The web address for the article by Richard Darlington (section 3.5) is:
http://node101.psych.cornell.edu/Darlington/crosstab/TABLE0.HTM
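As a warm-up before the tutorial, the sketch below shows one way to manipulate a 2x2 table and compare an asymptotic test with an exact test, using base R only. The counts and the row and column labels are made up for illustration and do not come from any course dataset.

```r
# Hypothetical 2x2 table of counts (made-up numbers, for illustration only)
tab <- as.table(matrix(c(12, 5,
                         7, 16),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(Treatment = c("Placebo", "Treated"),
                                       Outcome   = c("None", "Improved"))))

addmargins(tab)               # table with row and column totals
prop.table(tab, margin = 1)   # proportions within each row

chi <- chisq.test(tab, correct = FALSE)  # asymptotic Pearson chi-squared test
chi$expected                             # expected counts under independence
fis <- fisher.test(tab)                  # exact test, conditional on the margins

# Compare the two p-values
c(chisq = chi$p.value, fisher = fis$p.value)
```

With small expected counts the two p-values can differ noticeably, which is one reason to look at chi$expected before relying on the asymptotic test.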
Explore making mosaic plots
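## Example R code for Arthritis mosaic plot:
library(vcd)  ## provides mosaic() and the Arthritis data
data("Arthritis", package = "vcd")
## cross-tabulate treatment by improvement, females only
(art <- xtabs(~ Treatment + Improved, data = Arthritis, subset = Sex == "Female"))
set.seed(1071)  ## shading_max is based on a permutation test, so fix the seed
mosaic(art, gp = shading_max, gp_args = list(n = 5000), split_vertical = TRUE)
## OR (base R graphics): mosaicplot(art)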
For more information about mosaic plots in the vcd package, see these 2 vignettes:
Residual-Based Shadings in vcd
The Strucplot Framework: Visualizing Multi-way Contingency Tables with vcd
Easter vacation - NO CLASS OR LAB
NOTE: You are supposed to work on this analysis and report ALONE. Please don't hesitate to ask me if you have any questions.
The My.stepwise package has stepwise selection functions for linear, generalized linear and Cox models. It should be helpful for those of you with GLM (Poisson), logistic (which is also a GLM) or survival data. You can download this package from CRAN:
https://cran.r-project.org/web/packages/My.stepwise/index.html
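To give an idea of what stepwise selection for a GLM looks like, here is a minimal sketch using base R's step() function, which selects by AIC. Note that this is a substitute for illustration, NOT the My.stepwise interface, and it may select a different model than My.stepwise would. The built-in warpbreaks data are used only as a stand-in for your own dataset.

```r
# Sketch with base R's step() (AIC-based), NOT the My.stepwise functions.
# warpbreaks is a built-in data set: breaks per loom by wool type and tension.
full <- glm(breaks ~ wool * tension, family = poisson, data = warpbreaks)
null <- glm(breaks ~ 1, family = poisson, data = warpbreaks)

# Stepwise search in both directions between the null and the full model
chosen <- step(null,
               scope = list(lower = formula(null), upper = formula(full)),
               direction = "both", trace = 0)

formula(chosen)                 # terms retained by the search
summary(chosen)$coefficients    # estimates for the selected model
```

Whatever selection tool you use, you should still assess the selected model (residuals, overdispersion, etc.) rather than accept it automatically.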
Section 5 of this paper contains a short description of how the valung data were collected (you can ignore other parts of the paper).
This folder contains functions from the GenABEL package (no longer available) that will help with your GWAS analysis. For instructions, data and code, please follow the tutorial available at:
http://stat-gen.org/tut/tut_intro.html
In the genome-wide association analysis section of the tutorial, you will get either a warning or an error about the GenABEL package. Instead of loading GenABEL, ignore the error/warning and source the files ztransform.R, rntransform.R, estlambda.R and GWAA.R (assuming those .R files are in your R working directory):
# Phenotype data preparation
# library(GenABEL)
source("ztransform.R")
source("rntransform.R")
source("estlambda.R")
source("GWAA.R")
NEW: If you are using R 4.0.x on a Mac, you may get an error when you execute the GWAA function. If that happens, you should downgrade R to 3.6.0 (for example) and see if that works. If you still have problems, please contact me and we will try to work it out.
You are supposed to work on this analysis and report ALONE. Please don't hesitate to ask me if you have any questions.
UPDATE: You do NOT have to do the very last part (Regional Association)
Week 9: Clinical trials
Tutorial instructions:
You can start reading at page 5, and do exercises 9-12 but ONLY for the t-test (not the tests listed).
Next, read the section about power curves, then make a graph of power curves like the one in the tutorial, but with deltas varying from 0.1-0.9 by 0.1. Do exercise 13.
Work through the section on Cox regression and do exercise 14.
If you have time and interest, you can work through the section on Power Simulation. You can also do exercise 16 if you want (not required).
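For the power curve graph mentioned in the instructions above, the following is one possible sketch using power.t.test() from base R. The two-sample design, sd = 1, the 5% significance level and the range of sample sizes are all assumptions made here; the tutorial's own figure may use different settings.

```r
deltas <- seq(0.1, 0.9, by = 0.1)     # true differences in means
n_per_group <- seq(5, 100, by = 5)    # assumed range of sample sizes per group

# Power of a two-sample, two-sided t-test for each (n, delta) pair;
# sd = 1 and sig.level = 0.05 are assumptions
power_mat <- sapply(deltas, function(d)
  sapply(n_per_group, function(n)
    power.t.test(n = n, delta = d, sd = 1, sig.level = 0.05)$power))

# One curve per delta: rows of power_mat are sample sizes, columns are deltas
cols <- rainbow(length(deltas))
matplot(n_per_group, power_mat, type = "l", lty = 1, col = cols,
        xlab = "Sample size per group", ylab = "Power", ylim = c(0, 1))
legend("bottomright", legend = paste("delta =", deltas),
       lty = 1, col = cols, cex = 0.7)
```

Each curve shows how power grows with sample size for a fixed delta; larger deltas reach high power with far fewer observations.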
Might also be of interest
Week 10: Meta-analysis
1 May: PRELIMINARY DEADLINE 1 for group report
Introduction to mixed-effects models
NO CLASS OR LAB: time to work on final report
NO CLASS OR LAB: time to work on final report