Online Program

326018
Evaluation of statistical methods for differential expression analysis of RNA-seq data with paired data design


Tuesday, November 3, 2015 : 4:45 p.m. - 5:00 p.m.

Fang Qiu, Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE
Fang Yu, PhD, Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE
Jane Meza, PhD, Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE
Background: Next-generation sequencing has recently emerged as a powerful tool for gene expression profiling. Previous comparisons of statistical methods for RNA-seq data analysis focused on data from two independent conditions. However, more complex experimental designs, such as paired designs, are very common in RNA-seq experiments. Timely evaluation of innovative statistical methods for RNA-seq data with a paired design is therefore important for practical applications.

Methods: We assess the performance of six statistical methods (edgeR, DESeq, SAMseq, pairedBayes, baySeq, and TSPM) for identifying differentially expressed (DE) genes from RNA-seq data with a paired design. In simulation studies with different sample sizes, the competing methods are evaluated on three criteria: classification accuracy, via mean receiver operating characteristic (ROC) curves; predictive power, via plots of positive predictive value (PPV) against 1 minus negative predictive value (1-NPV); and the ability to control the false discovery rate (FDR).
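
As an illustration of the three evaluation criteria, the following is a minimal Python sketch of how they might be computed for one method at one cutoff in a simulation where the true DE status of each gene is known. It is not the authors' code: the function names, the use of adjusted p-values as scores, and the toy data are all assumptions made for this example.

```python
def confusion_counts(truth, called):
    """Count TP, FP, TN, FN from parallel lists of booleans."""
    tp = sum(t and c for t, c in zip(truth, called))
    fp = sum((not t) and c for t, c in zip(truth, called))
    tn = sum((not t) and (not c) for t, c in zip(truth, called))
    fn = sum(t and (not c) for t, c in zip(truth, called))
    return tp, fp, tn, fn


def evaluation_metrics(truth, scores, threshold):
    """Summarize one method at one cutoff (hypothetical helper).

    Assumption: `scores` are adjusted p-values, so a gene is called DE
    when its score is at or below `threshold`.
    """
    called = [s <= threshold for s in scores]
    tp, fp, tn, fn = confusion_counts(truth, called)

    def ratio(num, den):
        # Undefined ratios are reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "tpr": ratio(tp, tp + fn),                 # one point on the ROC curve (y)
        "fpr": ratio(fp, fp + tn),                 # one point on the ROC curve (x)
        "ppv": ratio(tp, tp + fp),                 # positive predictive value
        "one_minus_npv": 1 - ratio(tn, tn + fn),   # 1 - negative predictive value
        "fdr": fp / max(tp + fp, 1),               # observed FDR; 0 if nothing is called
    }


# Toy data: 3 truly DE genes followed by 5 non-DE genes (made up for illustration).
truth = [True, True, True, False, False, False, False, False]
scores = [0.001, 0.02, 0.30, 0.04, 0.50, 0.70, 0.90, 0.20]
metrics = evaluation_metrics(truth, scores, threshold=0.05)
print(metrics)
```

Sweeping `threshold` over a grid and averaging the resulting (FPR, TPR) or (1-NPV, PPV) points across simulated data sets would give mean curves of the kind described above; comparing the observed FDR with the nominal cutoff indicates how well a method controls the FDR.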

Results: Based on the ROC and predictive power curves, edgeR and DESeq perform well under all settings compared with the other methods. Additionally, baySeq works well when the sample size is small (n=3), and TSPM and SAMseq perform well under the other settings, with larger sample sizes. baySeq controls the FDR well under all settings, while pairedBayes selects the largest number of DE genes and has the worst FDR control. DESeq, SAMseq, and TSPM control the FDR better for studies with a large sample size (n=10).

Conclusions: Our results are useful in guiding the choice of statistical methods for detecting DE genes in RNA-seq data with a paired design.

Learning Areas:

Biostatistics, economics
Conduct evaluation related to programs, research, and other areas of practice

Learning Objectives:
Evaluate statistical methods for differential expression analysis of RNA-seq data with paired data design

Keyword(s): Biostatistics

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I participated in the simulation study design and data analysis and drafted the abstract. I am a co-author of articles published on the following topics: DNA sequence-based "bar codes" for tracking the origins of expressed sequence tags from a maize cDNA library; laser-capture microdissection, a tool for the global analysis of gene expression in specific plant cell types; Greedy Closure Genetic Algorithms; and the use of DNA microarrays for the developmental expression analysis of cDNAs.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.