DOWNLOAD R CODE FILE FOR THIS TUTORIAL
First DOWNLOAD SIMEM R CODE zip bundle containing the R code files loaded below.
Also note that, while the code below uses relative paths, you should edit these to point to where you’ve stored the code, annotation or data files.
### To install required packages, uncomment and run this
# source("http://www.bioconductor.org/biocLite.R")
# biocLite(c("Biobase", "preprocessCore", "genefilter"))
# install.packages(c("blme", "doParallel", "ggplot2", "locfit", "MASS", "plyr", "reshape"))
suppressPackageStartupMessages(library("Biobase"))
suppressPackageStartupMessages(library("blme"))
suppressPackageStartupMessages(library("doParallel"))
suppressPackageStartupMessages(library("genefilter"))
suppressPackageStartupMessages(library("ggplot2"))
suppressPackageStartupMessages(library("locfit"))
suppressPackageStartupMessages(library("MASS"))
suppressPackageStartupMessages(library("plyr"))
suppressPackageStartupMessages(library("preprocessCore"))
suppressPackageStartupMessages(library("reshape"))
source("../../R/data_format_lib.R")
source("../../R/model_lib.R")
source("../../R/simem_lib.R")
The screen data formatted as an ExpressionSet
, including measurement weights, can be downloaded HERE.
load("../../data/shrna/breast_screens_with_weights.eset")
breast_screens
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 77156 features, 630 samples
## element names: exprs, groupMeans, signalProbWeights, varWeights
## protocolData: none
## phenoData
## sampleNames: breast_184a1_t0_a breast_184a1_t0_b ...
## breast_zr75b_t2_c (630 total)
## varLabels: sample_id tissue ... replicate_group (7 total)
## varMetadata: labelDescription
## featureData
## featureNames: TRCN0000000001 TRCN0000000002 ... TRCN0000181114
## (77156 total)
## fvarLabels: trcn_id trps_id ... gene_name (5 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
The reagent annotation table we’re about to load, including the weight
column which will be used to exclude some reagents, can be downloaded HERE.
hp = read.delim("../../data/annotations/hairpin_annotations.txt", header=T, as.is=T, check.names=F)
hpWeights = hp[,c("trcn_id", "gene_id", "weight")]
The breast cell line subtypes table detailing the classes we want to compare for differential essentiality can be downloaded HERE.
subtypes = read.delim("../../data/annotations/cell_line_subtypes.txt", header=T, as.is=T, check.names=F)
status = subtypes[,c("cell_line", "subtype_neve")]
Once the subtypes table is loaded, we want to reduce the 4 subtype Neve classification into 2 classes: “her2” and “other”.
status$erbb2 = ifelse(status$subtype_neve == "her2", "her2", "other")
status[1:10,]
## cell_line subtype_neve erbb2
## 1 184a1 basalb other
## 2 184b5 basalb other
## 3 600mpe luminal other
## 4 au565 her2 her2
## 5 bt20 basala other
## 6 bt474 her2 her2
## 7 bt483 luminal other
## 8 bt549 basalb other
## 9 cal120 basalb other
## 10 cal148 luminal other
Based on the above data frame, two parameters must be specified in the simem()
function: covariate = "erbb2"
and annotationsPerCellLine = status
.
By default, the analysis uses the first variable by alphabetical order as the baseline. In this case, we want to determine whether a gene is a context-dependent essential in her2+ cell lines compared to all other subtypes as baseline, so we’ll specify the following parameter in the simem()
function: covariateFactorOrder = c("other", "her2")
.
For the purposes of this example, to reduce computation time, we’ll perform the analysis for known positives highlighted in the manuscript
signalingPathway = c(207, #AKT1
11140, #CDC37
2064, #ERBB2
55914, #ERBB2IP
2065, #ERBB3
2475, #MTOR
5290, #PIK3CA
6009, #RHEB
25803, #SPDEF
7022) #TFAP2C
If we want to perform a genome-wide analysis, simply omit the geneIds = signalingPathway
parameter. This analysis typically takes ~18 hours, but the computation type can be dramatically reduced by parallelizing the modeling process on multiple processor cores (when available). To use 3 processor cores, for example, specify the parallelNodes = 3
parameter. This option is available on systems with multiple processor cores, whether running Windows, Linux, or Mac OS X. This is done using the doParallel
R package.
If we specify an analysis using both precision and signal-noise measurement weights (inverseVarianceWeights = TRUE
and signalProbWeights = TRUE
parameters, respectively), we must ensure that we’ve added these weights to the ExpressionSet
beforehand (DETAILED HERE).
Combining all the above, we’re ready to perform the differential essentiality analysis.
results = simem(screens = breast_screens,
geneIds = signalingPathway,
covariate = "erbb2",
reagentWeights = hpWeights,
annotationsPerCellLine = status,
inverseVarianceWeights = TRUE,
signalProbWeights = TRUE,
analyzeReagents = TRUE,
covariateFactorOrder = c("other", "her2"),
parallelNodes = 1
)
For illustration purposes, we’ll rerun the above code using multiple (3) parallel processor cores, taking advantage of the capabilities provided by the doParallel
package.
results = simem(screens = breast_screens,
geneIds = signalingPathway,
covariate = "erbb2",
reagentWeights = hpWeights,
annotationsPerCellLine = status,
inverseVarianceWeights = TRUE,
signalProbWeights = TRUE,
analyzeReagents = TRUE,
covariateFactorOrder = c("other", "her2"),
parallelNodes = 3
)
## automatically exporting the following variables from the local environment:
## analysisType, analyzeReagents, annotationsPerCellLine, annotationsPerId, calculateRelativeDropout, cellLineWeights, covariate, covariateFactorOrder, endPoint, formulas, inverseVarianceWeights, modelEngine, pvalueType, reagentWeights, screens, signalProbWeights, variableMap
## explicitly exporting variables(s): addPrecisionWeights, addSignalProbWeights, applyFDR, assignWeightsToObservations, combineResults, extractData, fitAnova, fitIsogenicCovariateGene, fitIsogenicCovariateRg, fitModel, fitModelAndFormatOutput, fitPanelCovariateGene, fitPanelCovariateRg, fitSimplerRandomEffects, formatOutput, getDefaultVariableMap, getMeanSd, getMeanVar, getMinimalSummary, getModelFormulas, getModelPValues, getRelativeDropout, getRelativeDropoutRates, getReplicateMeanVar, getSignalProb, getVarianceWeights, loadMeanVar, pvaluesNorm, pvaluesT, resetFactorLevels, simem, simemAnalysis, simemChunk, simemIsogenic, standardizeColumnNames, summaryAnova, summaryFitEffectCoefs, summaryFitLogLik, tryCatchWarningsErrors, withinBetweenDF
## numValues: 3, numResults: 0, stopped: TRUE
## got results for task 1
## numValues: 3, numResults: 1, stopped: TRUE
## returning status FALSE
## got results for task 2
## numValues: 3, numResults: 2, stopped: TRUE
## first call to combine function
## evaluating call object to combine results:
## fun(result.1, result.2)
## returning status FALSE
## got results for task 3
## numValues: 3, numResults: 3, stopped: TRUE
## calling combine function
## evaluating call object to combine results:
## fun(accum, result.3)
## returning status TRUE
Most users will be interested in the summarized per-gene context-specific essentiality predictions
options(width=100)
results$gene
## gene_id symbol difference_her2 pvalue_her2 fdr_her2 warnings errors
## 1 207 AKT1 -0.2487231 1.833014e-05 3.055023e-05
## 2 2064 ERBB2 -0.9193892 6.631059e-19 1.657765e-18
## 3 11140 CDC37 -1.3959328 1.029777e-22 1.029777e-21
## 4 55914 ERBB2IP -0.4401642 1.763505e-04 1.959450e-04
## 5 2065 ERBB3 -0.7649524 4.448219e-20 1.482740e-19
## 6 2475 MTOR -0.3807791 5.496490e-04 5.496490e-04
## 7 5290 PIK3CA -0.2946384 2.627995e-05 3.754279e-05
## 8 6009 RHEB -0.2215384 1.622149e-04 1.959450e-04
## 9 7022 TFAP2C -0.6540656 1.522445e-16 3.044890e-16
## 10 25803 SPDEF -1.3666388 5.948035e-21 2.974018e-20
If you want to examine more parameters for each gene, including model fit statistics, std. errors and such, examine the “gene_detailed” data frame.
str(results$gene_detailed)
## 'data.frame': 10 obs. of 21 variables:
## $ gene_id : int 207 2064 11140 55914 2065 2475 5290 6009 7022 25803
## $ symbol : chr "AKT1" "ERBB2" "CDC37" "ERBB2IP" ...
## $ AIC : num 9120 3226 10087 6208 10715 ...
## $ logLik : num -4550 -1603 -5033 -3094 -5347 ...
## $ intercept : num 11.17 11.49 9.78 10.76 10.87 ...
## $ se_intercept : num 0.272 0.32 0.275 0.522 0.241 ...
## $ t_intercept : num 41.1 36 35.6 20.6 45.1 ...
## $ baseline_trend : num -0.475 -0.484 -1.16 -1.05 -0.716 ...
## $ se_baseline : num 0.122 0.303 0.234 0.463 0.234 ...
## $ t_baseline : num -3.88 -1.59 -4.97 -2.27 -3.06 ...
## $ difference_her2 : num -0.249 -0.919 -1.396 -0.44 -0.765 ...
## $ se_her2 : num 0.0581 0.1035 0.1423 0.1174 0.0834 ...
## $ t_her2 : num -4.28 -8.88 -9.81 -3.75 -9.18 ...
## $ pvalue_intercept: num 0.00 3.88e-283 6.06e-278 2.61e-94 0.00 ...
## $ pvalue_baseline : num 1.03e-04 1.11e-01 6.83e-07 2.33e-02 2.22e-03 ...
## $ pvalue_her2 : num 1.83e-05 6.63e-19 1.03e-22 1.76e-04 4.45e-20 ...
## $ fdr_intercept : num 0.00 5.55e-283 7.58e-278 2.61e-94 0.00 ...
## $ fdr_baseline : num 3.42e-04 1.11e-01 5.82e-06 2.91e-02 3.71e-03 ...
## $ fdr_her2 : num 3.06e-05 1.66e-18 1.03e-21 1.96e-04 1.48e-19 ...
## $ warnings : chr "" "" "" "" ...
## $ errors : chr "" "" "" "" ...
If the analyzeReagents = TRUE
parameter was specified, per-reagent context-specific essentiality predictions are also available in both summarized
options(width=100)
reagent = results$reagent
reagent[order(reagent$symbol, reagent$trcn_id), ]
## trcn_id gene_id symbol difference_her2 pvalue_her2 fdr_her2 warnings errors
## 1 TRCN0000010162 207 AKT1 -0.419761291 5.617597e-02 9.148658e-02
## 2 TRCN0000010163 207 AKT1 -0.327417257 2.099172e-02 3.739149e-02
## 3 TRCN0000010171 207 AKT1 0.125861923 2.602783e-01 3.090805e-01
## 4 TRCN0000010174 207 AKT1 -0.951879394 7.217605e-04 1.870016e-03
## 8 TRCN0000039793 207 AKT1 -0.212658086 5.246260e-02 8.795201e-02
## 9 TRCN0000039794 207 AKT1 -0.306170521 7.817994e-03 1.485419e-02
## 10 TRCN0000039796 207 AKT1 0.052942041 4.500835e-01 4.933607e-01
## 11 TRCN0000039797 207 AKT1 -0.408935096 1.448935e-01 1.862583e-01
## 17 TRCN0000116632 11140 CDC37 -0.338810826 1.136410e-01 1.579887e-01
## 18 TRCN0000116633 11140 CDC37 -1.741620517 2.143741e-11 4.073107e-10
## 19 TRCN0000116634 11140 CDC37 -1.342011697 2.127244e-08 1.732184e-07
## 20 TRCN0000116635 11140 CDC37 -2.138160064 2.549092e-06 1.210819e-05
## 21 TRCN0000116636 11140 CDC37 -1.633741295 4.395037e-06 1.927055e-05
## 5 TRCN0000010341 2064 ERBB2 -0.791620224 1.094697e-03 2.712944e-03
## 6 TRCN0000010342 2064 ERBB2 -3.063583735 1.639043e-20 9.342543e-19
## 7 TRCN0000010343 2064 ERBB2 -0.595747496 3.416488e-08 2.434248e-07
## 12 TRCN0000039881 2064 ERBB2 -0.089827463 2.075710e-01 2.517350e-01
## 13 TRCN0000059068 55914 ERBB2IP -0.148830578 1.414368e-01 1.862583e-01
## 14 TRCN0000059070 55914 ERBB2IP -0.836410166 5.959154e-02 9.435328e-02
## 15 TRCN0000059071 55914 ERBB2IP -1.146687334 2.066879e-08 1.732184e-07
## 16 TRCN0000059072 55914 ERBB2IP -0.007772918 9.744257e-01 9.744257e-01
## 22 TRCN0000000621 2065 ERBB3 -0.310389994 3.317891e-01 3.708231e-01
## 23 TRCN0000000622 2065 ERBB3 -1.334270423 2.036181e-03 4.835929e-03
## 24 TRCN0000000623 2065 ERBB3 -0.547303371 5.176516e-04 1.405054e-03
## 25 TRCN0000009835 2065 ERBB3 -0.477514801 6.307659e-09 7.190731e-08
## 27 TRCN0000010344 2065 ERBB3 -0.493352834 7.914763e-05 2.819634e-04
## 32 TRCN0000018327 2065 ERBB3 -0.969681073 4.214089e-07 2.183665e-06
## 47 TRCN0000040108 2065 ERBB3 -0.154748272 6.824567e-03 1.440742e-02
## 48 TRCN0000040109 2065 ERBB3 -1.743516108 3.257401e-04 9.772204e-04
## 49 TRCN0000040111 2065 ERBB3 -0.804393190 7.305351e-03 1.453363e-02
## 42 TRCN0000039783 2475 MTOR -0.837461694 7.394303e-03 1.453363e-02
## 43 TRCN0000039784 2475 MTOR -0.391953744 6.887123e-02 1.033068e-01
## 44 TRCN0000039785 2475 MTOR -0.418043167 3.031553e-01 3.455971e-01
## 45 TRCN0000039786 2475 MTOR -0.232521844 1.470461e-01 1.862583e-01
## 46 TRCN0000039787 2475 MTOR -0.051301319 6.478652e-01 6.838577e-01
## 28 TRCN0000010406 5290 PIK3CA -0.791327708 2.105246e-04 6.666612e-04
## 29 TRCN0000010407 5290 PIK3CA -0.554117305 1.080427e-05 4.105623e-05
## 37 TRCN0000039603 5290 PIK3CA -0.481590653 1.230855e-01 1.670446e-01
## 38 TRCN0000039604 5290 PIK3CA 0.353946393 1.674254e-01 2.074620e-01
## 39 TRCN0000039605 5290 PIK3CA 0.009020816 9.399385e-01 9.567231e-01
## 40 TRCN0000039606 5290 PIK3CA -0.212789156 2.968663e-01 3.453343e-01
## 41 TRCN0000039607 5290 PIK3CA -0.316462178 1.169692e-07 7.408051e-07
## 26 TRCN0000009865 6009 RHEB -0.377844048 7.584068e-02 1.080730e-01
## 30 TRCN0000010424 6009 RHEB -0.630127955 2.081122e-02 3.739149e-02
## 31 TRCN0000010425 6009 RHEB -0.390649978 4.517398e-02 7.802778e-02
## 33 TRCN0000039599 6009 RHEB -0.131953209 6.613566e-02 1.018847e-01
## 34 TRCN0000039600 6009 RHEB -0.020736415 7.991316e-01 8.281909e-01
## 35 TRCN0000039601 6009 RHEB -0.321263801 7.307593e-02 1.068033e-01
## 36 TRCN0000039602 6009 RHEB 0.063758034 5.867099e-01 6.309899e-01
## 50 TRCN0000016548 25803 SPDEF -1.190427459 2.096109e-12 5.973911e-11
## 51 TRCN0000016550 25803 SPDEF -1.111681068 7.229684e-06 2.943514e-05
## 52 TRCN0000016551 25803 SPDEF -1.776043253 1.356210e-07 7.730395e-07
## 53 TRCN0000019744 7022 TFAP2C -0.232969287 3.623088e-04 1.032580e-03
## 54 TRCN0000019745 7022 TFAP2C -0.501144404 5.371469e-03 1.177591e-02
## 55 TRCN0000019746 7022 TFAP2C -1.740871491 2.487981e-09 3.545373e-08
## 56 TRCN0000019747 7022 TFAP2C -0.518505932 1.091727e-04 3.660496e-04
## 57 TRCN0000019748 7022 TFAP2C -0.429511255 2.531160e-03 5.771044e-03
and detailed versions.
str(results$reagent_detailed)
## 'data.frame': 57 obs. of 22 variables:
## $ trcn_id : chr "TRCN0000010162" "TRCN0000010163" "TRCN0000010171" "TRCN0000010174" ...
## $ gene_id : int 207 207 207 207 2064 2064 2064 207 207 207 ...
## $ symbol : chr "AKT1" "AKT1" "AKT1" "AKT1" ...
## $ AIC : num 1086 565 1094 1811 1166 ...
## $ logLik : num -536 -275 -540 -899 -576 ...
## $ intercept : num 11.7 11.8 11.4 10.1 11 ...
## $ se_intercept : num 0.0204 0.0158 0.0599 0.0857 0.0298 ...
## $ t_intercept : num 575 748 190 118 370 ...
## $ baseline_trend : num -0.84 -0.68 -0.117 -0.918 -1.12 ...
## $ se_baseline : num 0.0824 0.0542 0.0466 0.1035 0.092 ...
## $ t_baseline : num -10.2 -12.54 -2.51 -8.87 -12.17 ...
## $ difference_her2 : num -0.42 -0.327 0.126 -0.952 -0.792 ...
## $ se_her2 : num 0.22 0.142 0.112 0.282 0.242 ...
## $ t_her2 : num -1.91 -2.31 1.13 -3.38 -3.26 ...
## $ pvalue_intercept: num 0 0 0 0 0 0 0 0 0 0 ...
## $ pvalue_baseline : num 2.08e-24 4.70e-36 1.19e-02 7.45e-19 4.31e-34 ...
## $ pvalue_her2 : num 0.056176 0.020992 0.260278 0.000722 0.001095 ...
## $ fdr_intercept : num 0 0 0 0 0 0 0 0 0 0 ...
## $ fdr_baseline : num 7.42e-24 3.35e-35 1.21e-02 1.93e-18 2.46e-33 ...
## $ fdr_her2 : num 0.09149 0.03739 0.30908 0.00187 0.00271 ...
## $ warnings : chr "" "" "" "" ...
## $ errors : chr "" "" "" "" ...