Description |
RNA-sequencing (RNA-seq) is a powerful experimental technique for studying biological systems. However, the processing and analysis of RNA-seq data remain challenging. Among these challenges, between-sample normalization has been recognized and studied for more than a decade, resulting in dozens of normalizers, yet with little consensus on how their performance should be measured. In addition, most contemporary normalizers rely on assumptions that have become outdated with the expansion of RNA-seq data analysis. In this dissertation, I contributed to the improvement of RNA-seq normalization in three ways. First, I proposed a ground-truth-based metric to assess normalizer performance and provided an extensive collection of experimental data sets to serve as a benchmark. Second, using this benchmarking toolset, I explored the effects of normalization on downstream analysis in a more systematic manner. Third, I introduced a new normalization method that does not rely on the outdated assumptions of existing methods. |