Weekly outline

    • To contact me (24 hours a day):
      email: darlene.goldstein@epfl.ch
      tel/sms/whatsapp/signal: 079 427 2501
      skype: darlenegoldstein

      Course format: 

      Although we are currently allowed to attend the course on campus, ALL LECTURES WILL ALSO BE PRERECORDED, since some of you may be unable to attend in person from time to time. Please follow the lecture before the 'office hours' or lab time so that you can ask any additional questions then.

      Office hours: I will be available for your questions every Thursday and Friday 12.00-13.00 in my office MA B1 477, and also by appointment.

      The lab time is Tuesday 16.00-18.00 in CO3. I understand that there will be some class conflicts, so I will try to find a new room for those of you who are unable to attend at the assigned time. For the first week, in any case, please try to attend either my office hours or the lab time, and we will figure out a solution to any conflict.

      Course language: 

      This course is given in English, but feel free to speak in either English or French.

      Resources:
      A useful book for both statistics and R:

      • A Handbook of Statistical Analyses Using R, 3rd edition. Torsten Hothorn and Brian S. Everitt. CRC Press.
      Some resources to get you started with R, R Studio and R Markdown:
    • Repository of R packages; you can download R from here. It is also a good source of documentation (see 'Contributed' under the Documentation heading).
    • RStudio makes R easier to use. It includes a code editor, debugging & visualization tools. Choose the free desktop version corresponding to your computer and operating system.
    • Tutorials and examples for reproducible research using R:
    • Forum for students to find group members. Once you have formed a group, please send me 1 email containing the names of all group members. As a reminder, your group can contain 1-4 persons.
  • 20 February - 26 February

    Week 1: Course organization, reproducible research, hypothesis testing review

    Organization: you will write a short group report (~ 5 pages; a 'group' can be 1-4 persons), a short group article critique (1 page, it can be in question/answer format), and a longer individual report (up to ~ 7 pages). The 2 reports will be about data analyses you carry out. The group data set will be assigned to you. For the individual report, you can choose a topic from a list that I will provide once we have covered all the eligible topics in lecture. I will announce when you can email me your choice, so please do not send me an email earlier than that. Once you email me your choice, I will assign you a data set on that topic.

    The purpose of this course is to help you to learn something without too much stress!! That is why you can do each of the 2 reports twice: a preliminary version, on which you will receive comments according to posted criteria, then a final version, where you can incorporate the comments, due at the end of the semester. Only the final version will count towards your course grade. The deadlines will be posted on the course moodle page.

    For the article critique, you will get the 1/2 point (full credit) as long as you submit it by the deadline - you don't need to do a preliminary version.

    In order to give you time to work on your reports, there will be no in-person lectures and mainly optional topics toward the end of the course. These 'extra' topics are NOT required; there are slides (and possibly videos) in case you are interested. There is no penalty associated with not following them.

    NOTE: this first week's LECTURE is ONLINE ONLY; there will be NO in-class lecture. Please come to the lab meeting in CO3 on Tuesday afternoon for the EDA presentation and to get started with R and RStudio.


    Grading

    • 1/2 point: short report 1 (either regression or anova), can be in a group of up to 4 people
    • 1/2 point: short critique on a scientific article (will be assigned to you), can be in a group of up to 4 people
    • 5 points: individual analysis report (your choice among a number of topics)

  • 27 February - 5 March

    Week 2: Linear regression modeling

    You can already email me your groups (1 email per group); remember, each group can contain 1-4 persons. Each group will be assigned to analyze EITHER a regression data set OR an anova data set.

  • 6 March - 12 March

    Week 3: Experimental design, Analysis of variance (anova); 
    (OPTIONAL but recommended: General Linear Model, Model selection - online only)

    Report 1: (initial/preliminary deadline 18.00 on Tuesday 18 April)

    The purpose of this assignment is to give you practice writing a scientific report. Report writing is an extremely important skill, regardless of whether you continue in an academic career, in government or in industry.

    You should analyze your data in an appropriate manner (either like lab week 2 for regression or lab week 3 for anova) and write a short report, ~ 5 pages (7 pages max). Please submit your report (as a .pdf file, NOT .DOC) in the moodle assignment space, 1 per group. The spaces will be labeled R1, R2, A1, A2, for regression problems 1/2 and anova problems 1/2. Your file name should be labeled as R1-##.pdf, etc., where ## is your group number and R1 (etc.) is your assigned problem.

    Your report should contain a short background/intro to the problem (including the aim of the original study), a presentation of the results of your statistical analyses, including exploratory data analysis, model fitting and final model, along with a short discussion of any shortcomings of the final model, and your conclusions. Include relevant graphics and tables, but DO NOT include any raw R code or output (you will be penalized for this if you do). Your graphs should be 'pretty': if you copy/paste a graph from the screen, it will most likely look blurry (png file), and you will be penalized for this. It is easiest to include nice-looking graphs if you save a pdf version and use R Markdown, but this is not the only way.
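
    As a minimal sketch of saving a graph as a pdf directly from R (the simulated data, the file name 'scatter.pdf' and the plot size are assumptions for illustration only, not part of the assignment):

```r
# Simulated example data, only so that there is something to plot
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# Hypothetical output file; written to a temporary directory here
out_file <- file.path(tempdir(), "scatter.pdf")

# Open a pdf graphics device (width and height are in inches)
pdf(out_file, width = 6, height = 4)
plot(x, y,
     xlab = "x (simulated)", ylab = "y (simulated)",
     main = "Example scatterplot")
# Close the device so that the file is actually written
dev.off()
```

    The resulting pdf is a vector graphic, so it stays sharp when included in a report, unlike a screenshot.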

    Your report will also be graded based on language use and overall presentation. (It can be in either English or French.) Please use 12 point size and margins of 2.5 cm. Remember to number each page at the bottom (including page 1). Do not include a cover page, and do not exceed 5 pages or you will be penalized. Inside the top margin of each page, please include the surnames of all group members (separated by commas).

    As a reminder, this report counts for 1/2 point (out of 6) of your course grade. The initial deadline (18.00 on 18 April) is for your preliminary report. (The final version is due any time on 18 June.) If you turn in your report before the initial deadline, we will be able to comment on it and you can redo it before the final deadline.

    When you email me with the names of your group members I will send you the dataset (after Lab 3).

    UPDATE: Regression 1 (airline costs)

    Some of you have had difficulty reading in the airline costs data set. Here is what you can do to fix that:

    1. Go to the data web site
    2. Copy / paste into a (plain) text file
    3. Edit the text file to remove the space between words in 2-word airline names (i.e., so that they read AllAmerican, LakeCentral, WestCoast)
    4. Assuming that your text file is called 'air.txt', then in R, type

    air <- read.delim("air.txt", header=FALSE, sep="")

    Then you should have an R object called 'air' that you can use for the analysis. It might be helpful to rename the columns (variables), since the default names are uninformative (V1, V2, ...). I believe that there should be 13 columns in your data frame 'air'.
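
    A minimal sketch of checking the number of columns and renaming them. The two rows written to the temporary file are made-up placeholders (NOT the real airline data), and the new column names are hypothetical; replace them with the variable names given on the data web site:

```r
# Stand-in for 'air.txt': two made-up rows with 13 whitespace-separated fields
tmp <- tempfile(fileext = ".txt")
writeLines(c("AllAmerican 1 2 3 4 5 6 7 8 9 10 11 12",
             "WestCoast 2 3 4 5 6 7 8 9 10 11 12 13"),
           tmp)

# Same call as above, pointed at the stand-in file
air <- read.delim(tmp, header = FALSE, sep = "")

# Stop early if the file was not read as 13 columns
stopifnot(ncol(air) == 13)

# Hypothetical names: first column is the airline, the rest are placeholders
names(air) <- c("airline", paste0("x", 1:12))
str(air)
```

    If the column check fails on your own file, the most likely cause is an airline name that still contains a space.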

    Please let me know if you have further trouble.
  • 13 March - 19 March

    Week 4: Generalized linear modeling, logistic regression, Poisson regression
  • 20 March - 26 March

    Week 5: Survival analysis

    Second assignment: This assignment is a statistical critique of a published paper. Your report can be written either as a full review or in a question/answer format, simply responding to each question. Your report should not be more than 1 page.

    You can turn in this report any time before the final deadline, 18 June 2023. You will get full credit (i.e. 1/2 point toward your course grade) for turning in a reasonable effort.

    Groups who worked on regression problems:
    L1: http://www.jcancer.org/v09p1421.htm

    Groups who worked on anova problems:
    L2: https://www.sciencedirect.com/science/article/pii/S1743919118307337

    A guide sheet (study assessment questions) is uploaded to help you to address statistical issues.

    The file contains a longer list of questions to consider when evaluating a study. As a guide for your 2nd assignment report, please make sure that you respond particularly to the following: (numbers in parentheses represent points out of 6)

    (1) 1. Briefly give the biomedical background for the paper. What question/hypothesis is being investigated?

    (1) 2. What data are collected (include how many individuals, what variables)?

    (1) 3. What analyses were carried out? Are these analyses appropriate for the problem?

    (1) 4. What other analyses should have been done (or might have been done but not shown)? Explain.

    (1) 5. Is there any mention of power of the analyses? How would you go about trying to estimate power?

    (1) 6. What conclusions do the authors draw? Are these conclusions substantiated by the results? Explain.

  • 27 March - 2 April

    Week 6: Discrete data analysis, contingency tables, 2x2 tables; data visualization; asymptotic and exact tests
  • 3 April - 9 April

    Week 7: Genetic association studies, genome-wide association studies (GWAS); principal components analysis, multiple hypothesis testing
  • 3 April

    Choose your individual project topic from one of the following:

    • survival analysis
    • logistic regression
    • generalized linear model (other than logistic, e.g. Poisson)
    • discrete data / contingency table analysis
    • genome-wide association study (GWAS)

    and EMAIL ME your choice. I will then send you a dataset for analysis (or you can start working on the GWAS tutorial if you are doing a GWAS).

    Your final report should be ~5-7 pages (7 is the absolute maximum; fewer pages are better if you can be concise).

    The preliminary deadline is Saturday 13 May (any time), then we will give you feedback in 1-2 weeks. You should then have a few more weeks to work on it before the final deadline of 18 June (any time).


    NOTE: As a reminder, you MUST work on this individual analysis and report ALONE. Your analysis and report should represent YOUR OWN WORK. DO NOT COMMUNICATE WITH ANYONE in ANY WAY about this project. If you have ANY question or problem, please ask ONLY ME and NOT anyone else.

    I will consider ANY violation of this policy as PLAGIARISM (PLAGIAT) and will report any suspicion of plagiarism/plagiat to the Vice-présidence académique – Affaires juridiques. I have reported previous students who have been sanctioned for violating this rule, so please DO NOT TEST ME ON THIS.

    If you have ANY questions, please don't hesitate to ask ME and ONLY ME. Do not risk your course grade or your EPFL career by asking or communicating with any student.

  • 10 April - 16 April - EASTER

    NO CLASS OR LAB - EASTER

  • 17 April - 23 April

    Week 8: (OPTIONAL)

    NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.

  • 18 April 18.00

    Please deposit only 1 report per group.

    • Please deposit your first group assignment here if you did Regression problem 1, as a pdf file named R1-## , where ## is your group number. The preliminary due date is 18 April 18.00.

    • Please deposit your first group assignment here if you did Regression problem 2, as a pdf file named R2-## , where ## is your group number. The preliminary due date is 18 April 18.00.

    • Please deposit your first group assignment here if you did Anova problem 1, as a pdf file named A1-## , where ## is your group number. The preliminary due date is 18 April 18.00.

    • Please deposit your first group assignment here if you did Anova problem 2, as a pdf file named A2-## , where ## is your group number. The preliminary due date is 18 April 18.00.

    • Please deposit any late prelim reports (R1 / R2 / A1 / A2) here.
  • 24 April - 30 April

    Week 9: Clinical trials (OPTIONAL)

    NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.

    • Tutorial instructions:

      You can start reading at page 5, and do exercises 9-12 but ONLY for the t-test (not the tests listed).

      Next, read the section about power curves, then make a graph of power curves like the one in the tutorial, but with deltas varying from 0.1 to 0.9 in steps of 0.1. Do exercise 13.

      Work through the section on Cox regression and do exercise 14.

      If you have time and interest, you can work through the section on Power Simulation. You can also do exercise 16 if you want (not required).
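
      As a rough sketch of how such a set of power curves could be drawn for a two-sample t-test, using power.t.test from the stats package: the range of sample sizes, the standard deviation of 1 and the 5% significance level are assumptions here, and the tutorial's own settings may differ.

```r
# Effect sizes from 0.1 to 0.9 in steps of 0.1
deltas <- seq(0.1, 0.9, by = 0.1)

# Per-group sample sizes to evaluate (assumed range)
n <- seq(5, 100, by = 5)

# Power of a two-sided, two-sample t-test for each (n, delta) pair;
# sd = 1 and sig.level = 0.05 are assumed values
pow <- sapply(deltas, function(d) {
  sapply(n, function(k) {
    power.t.test(n = k, delta = d, sd = 1, sig.level = 0.05)$power
  })
})

# One curve per delta: rows of 'pow' are sample sizes, columns are deltas
matplot(n, pow, type = "l", lty = 1, col = seq_along(deltas),
        ylim = c(0, 1),
        xlab = "Sample size per group", ylab = "Power")
legend("bottomright", legend = paste("delta =", deltas),
       col = seq_along(deltas), lty = 1, cex = 0.7)
```

      Each column of 'pow' is one curve, so changing the grid of deltas or sample sizes only requires editing the first two assignments.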

  • 1 May - 7 May

    Week 10: Meta-analysis (OPTIONAL)

    NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.

  • 8 May - 14 May

    Week 11: Introduction to mixed-effects models (OPTIONAL)

    NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.

  • 13 May

    Please name your preliminary report as follows: lastname-topic-prelim.pdf (for example, if I were doing survival analysis, my report would be named goldstein-survival-prelim.pdf) (open from 11 April-13 May).

    THESE SLOTS ARE NOW CLOSED - if you want to submit a late preliminary report, please SCROLL DOWN to find the assignment deposit / submission slots.

    • survival-prelim (assignment)
      Available until 23 May 2023, 23:55
    • logistic-prelim (assignment)
      Available until 23 May 2023, 23:55
    • GLM-prelim (assignment)
      Available until 23 May 2023, 23:55
    • discrete-prelim (assignment)
      Available until 23 May 2023, 23:55
    • GWAS-prelim (assignment)
      Available until 23 May 2023, 23:55
  • 15 May - 21 May

  • 23 May - 29 May

    Week 13: (OPTIONAL)

    NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.

  • 29 May - 15 July (29 May is a public holiday)

    Week 14: Monday 29 May - NO CLASS (public holiday); NO LAB THIS WEEK.

    For the final individual reports:

    Please name your final report as follows: lastname-topic-final.pdf (for example, if I were doing survival analysis, my report would be named goldstein-survival-final.pdf). Many of you did not follow this file-naming convention on your preliminary reports, which made some things more difficult for me.

    Thank you very much.

    Comments on first (group) report, 2 pages per group:

  • More LATE preliminary reports (any)

    • If you have not yet turned in a preliminary report (either group or individual), please submit it here so that I can try to give you some feedback and you can try to improve it before the FINAL deadline (15 July at 23.59).

      I will email your comments directly to you as soon as possible, within 2 days I hope.
  • Final GROUP reports - Deposit slots

    Please deposit all final GROUP reports here.

    • If your group worked on a regression problem.

      Due 15 July 23.59, but I will accept your report until 16 July 23.59 at the VERY LATEST. I cannot accept any report later than this for any reason.

    • If your group worked on an anova problem.

      Due 15 July 23.59, but I will accept your report until 16 July 23.59 at the VERY LATEST. I cannot accept any report later than this for any reason.

  • Final INDIVIDUAL reports - Deposit slots

    Please deposit all final INDIVIDUAL reports here.

    NOTE: you do NOT have to make a submission if you don't need to re-do your report.