TP 4: Identifying differentially expressed genes with limma
In this TP, you will get some practice using the Bioconductor package
limma.
It implements the moderated t and B statistics,
so that you can rank genes for differential expression.
As usual, you should always make sure you read the
help
documentation for each function you do not already know.
The limma
User's Guide
is extremely useful; you will probably want to refer to it often (not just
today, but throughout the rest of the course, including the exam).
The latest version can be found at
http://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf.
You will be reading several sections of this today.
Start off by reading the brief Introduction beginning on p.5
and also Sections 8.1 and 8.2.
You will be analyzing the Affy E. coli and estrogen experiments
referred to in the Introduction.
Sections 9.1, 9.2 and 9.5 are useful for the parameterization and corresponding
design matrix for two-condition (E. coli) and factorial (estrogen) experiments.
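To make the two kinds of design matrix concrete, here is a minimal sketch in base R. The sample labels, the number of replicates and the object names (design_two, design_fac) are assumptions chosen for illustration; they are not taken from the user guide, so check them against pData() for the real data.

```r
# Two-condition experiment: 4 arrays per condition (hypothetical labels).
strain <- factor(rep(c("mutant", "wildtype"), each = 4),
                 levels = c("wildtype", "mutant"))

# Treatment-contrast parameterization:
#   column 1 = wildtype mean, column 2 = mutant minus wildtype.
design_two <- model.matrix(~ strain)
print(design_two)

# 2 x 2 factorial experiment: two factors, each combination replicated twice.
treatment <- rep(c("absent", "present"), each = 2, times = 2)
time      <- rep(c("10h", "48h"), each = 4)
group     <- factor(paste(treatment, time, sep = "."))

# Group-means parameterization: one column per treatment/time combination.
design_fac <- model.matrix(~ 0 + group)
colnames(design_fac) <- levels(group)
print(design_fac)
```

With the group-means form, every comparison of interest (for example, the treatment effect at each time) is later written as a contrast between columns.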
You might also want to skim through the chapter on Statistics for
Differential Expression (Chapter 13, p.60).
Over the next few weeks, this material should start to make more sense.
The function lmFit
fits a linear model to each gene separately.
Following that with
eBayes
gives you the moderated t and B statistics.
Make sure that you look at the structure of
the object you create with these (called
fit in the user guide).
To get all the names of components of fit, you can type
names(fit).
The B statistic is contained in the
lods
component.
Do not worry just yet about what 'fdr' (false discovery rate) means;
we will learn more about this on Friday when we cover multiple hypothesis testing.
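If you want to see what the fitted object looks like before touching the real data, the following sketch runs lmFit and eBayes on a small simulated matrix. The data, the group labels and the sizes are made up for illustration only; the calls are the same ones you will use on the RMA values.

```r
library(limma)

set.seed(1)
n_genes <- 200

# Simulated log-expression matrix: 200 genes x 6 arrays (not real data).
expr <- matrix(rnorm(n_genes * 6), nrow = n_genes,
               dimnames = list(paste0("gene", 1:n_genes),
                               paste0("array", 1:6)))
# Shift the first 10 genes in arrays 4-6 so that some genes are truly DE.
expr[1:10, 4:6] <- expr[1:10, 4:6] + 3

group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)   # column 2 = B minus A

fit <- lmFit(expr, design)        # one linear model per gene
fit <- eBayes(fit)                # adds moderated t and B statistics

names(fit)                        # all components of the fitted object
head(fit$t[, 2])                  # moderated t statistics for B vs A
head(fit$lods[, 2])               # B statistics (log-odds of DE)
```

Both fit$t and fit$lods are matrices with one row per gene and one column per coefficient of the design matrix, which is why the second column is selected here.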
E. coli data
Here you will work through Example 17.1 (p. 98 of the user guide).
We do not have the CEL files, but there is a Bioconductor package
that contains these data as an AffyBatch.
Begin by starting R, then install and load the package ecoliLeucine as well as limma.
Also compute RMA values, so that you have the data matrix that will be
analyzed for DE genes:
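# biocLite() has been retired; current Bioconductor releases install via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("ecoliLeucine", "limma", "affy"))
library(affy)               # provides rma() and the AffyBatch class
library(ecoliLeucine)
library(limma)
data(ecoliLeucine)          # loads the AffyBatch object 'ecoliLeucine'
eset <- rma(ecoliLeucine)   # background-correct, normalize and summarize
pData(eset)                 # sample annotation for the arrays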
Now you can continue the example from the top of p. 99.
Estrogen data
Follow Example 17.2 for practice in analyzing a factorial experiment.
Any necessary packages that you have not already installed can be
found on the Bioconductor website.
The analysis should follow straightforwardly from the example.
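As a rough guide to the shape of a factorial analysis, here is a sketch on simulated data using a group-means design and contrasts. The group names (absent10, present10, absent48, present48), the contrast names and the data are assumptions made for illustration; the user guide's example may use different names and a different parameterization, so follow the guide for the real analysis.

```r
library(limma)

set.seed(2)
# Simulated log-expression matrix: 100 genes x 8 arrays (not real data).
expr <- matrix(rnorm(100 * 8), nrow = 100,
               dimnames = list(paste0("gene", 1:100), NULL))

# Hypothetical 2 x 2 layout: treatment (absent/present) x time (10h/48h),
# two replicate arrays per combination.
lev   <- c("absent10", "present10", "absent48", "present48")
group <- factor(rep(lev, each = 2), levels = lev)

design <- model.matrix(~ 0 + group)   # one column per group mean
colnames(design) <- lev

# Comparisons of interest, written as differences between group means.
cont <- makeContrasts(
  Est10 = present10 - absent10,       # treatment effect at 10 hours
  Est48 = present48 - absent48,       # treatment effect at 48 hours
  Time  = absent48 - absent10,        # time effect without treatment
  levels = design)

fit  <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, cont))

topTable(fit2, coef = "Est10", number = 5)   # top genes for one contrast
```

Writing the design as group means keeps each contrast easy to read; the same questions can also be asked with a treatment-contrast parameterization.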
Well, if you have made it this far, you have done a lot of work!
Do not worry about writing a lab report this time, but
print out a table
of the top 50 most DE genes and bring it to class next week.
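One possible way to produce the table is sketched below on simulated data; replace the simulated fit with your own. The coefficient number, the object name top50 and the output file name top50_genes.txt are assumptions for illustration.

```r
library(limma)

set.seed(3)
# Simulated stand-in for your own data: 500 genes x 6 arrays (not real data).
expr <- matrix(rnorm(500 * 6), nrow = 500,
               dimnames = list(paste0("gene", 1:500), NULL))
design <- model.matrix(~ factor(rep(c("A", "B"), each = 3)))
fit <- eBayes(lmFit(expr, design))

# Top 50 genes for the second coefficient, ranked by the B statistic
# (topTable's default ordering), with BH-adjusted p-values.
top50 <- topTable(fit, coef = 2, number = 50, adjust.method = "BH")
print(top50)

# Save as a tab-separated text file that can be opened and printed.
write.table(top50, file = "top50_genes.txt", sep = "\t",
            quote = FALSE, col.names = NA)
```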