FC1000: normalized gene expression changes of systematically perturbed human cells
Publication Year 2017-09, Volume 16, Issue #4, Page 217-242
- Journal Title
- Statistical Applications in Genetics and Molecular Biology
- Publication Type
- Journal Article
- The systematic study of transcriptional responses to genetic and chemical perturbations in human cells is still in its early stages. The largest available dataset to date is the newly released L1000 compendium. With its 1.3 million gene expression profiles of treated human cells, it offers many opportunities for biomedical data mining, but also data normalization challenges of new dimensions. We developed a novel and practical approach to obtain accurate estimates of fold change response profiles from L1000, based on the RUV (Remove Unwanted Variation) statistical framework. Extending RUV to a big data setting, we propose an estimation procedure in which an underlying RUV model is tuned by feedback through dataset-specific statistical measures, reflecting p-value distributions and internal gene knockdown controls. Applying these metrics, termed evaluation endpoints, to disjoint data splits and integrating the results to select an optimal normalization, the procedure reduces bias and noise in the L1000 data, which in turn broadens the potential of this resource for pharmacological and functional genomic analyses. Our pipeline and normalization results are distributed as an R package (nelanderlab.org/FC1000.html).
- gene expression
- WEHI Research Division(s)
- PubMed ID
- Publisher's Version
- Open Access at Publisher's Site
- Rights Notice
- Refer to copyright notice on published article.
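
The tuning loop summarized in the abstract (normalize with an RUV-type model, score the result with an evaluation endpoint on disjoint data splits, then integrate the scores to choose a normalization) can be illustrated with a minimal sketch. This is not the FC1000 R package or its API. The function names, the simplified RUV-2 style projection, the z-score knockdown endpoint, and the synthetic data are all assumptions made for illustration, and the p-value based endpoint mentioned in the abstract is not covered.

```python
import numpy as np

def ruv_normalize(Y, control_genes, k):
    """Simplified RUV-2 style step (assumption): estimate k unwanted factors
    from control genes and project them out of every gene."""
    Yc = Y - Y.mean(axis=0)                       # center each gene
    if k == 0:
        return Yc
    U, _, _ = np.linalg.svd(Yc[:, control_genes], full_matrices=False)
    W = U[:, :k]                                  # samples x k, orthonormal columns
    return Yc - W @ (W.T @ Yc)

def knockdown_endpoint(Z, is_control, targets):
    """Hypothetical endpoint: mean z-score of the knocked-down gene within
    its own fold change profile. Lower (more negative) is better."""
    fc = Z - Z[is_control].mean(axis=0)           # log fold change vs. control samples
    scores = []
    for i, g in targets.items():                  # sample row -> target gene column
        profile = fc[i]
        scores.append((profile[g] - profile.mean()) / profile.std())
    return float(np.mean(scores))

def select_k(Y, control_genes, is_control, targets, splits, k_grid):
    """Score each k on disjoint sample splits and pick the best average score."""
    mean_scores = {}
    for k in k_grid:
        per_split = []
        for idx in splits:
            Z = ruv_normalize(Y[idx], control_genes, k)
            # Re-index knockdown samples to their row position inside the split.
            local = {j: targets[int(i)] for j, i in enumerate(idx) if int(i) in targets}
            per_split.append(knockdown_endpoint(Z, is_control[idx], local))
        mean_scores[k] = float(np.mean(per_split))
    best_k = min(mean_scores, key=mean_scores.get)
    return best_k, mean_scores

# Synthetic demo (all values are made up): one strong batch factor plus knockdowns.
rng = np.random.default_rng(0)
n, p = 60, 50
control_genes = np.arange(40, 50)                 # genes assumed unaffected by treatment
is_control = np.arange(n) < 30                    # first 30 samples are untreated
targets = {i: i % 40 for i in range(30, n)}       # each treated sample knocks down one gene
batch = rng.normal(size=(n, 1)) * rng.normal(scale=3.0, size=(1, p))
Y = batch + rng.normal(scale=0.3, size=(n, p))
for i, g in targets.items():
    Y[i, g] -= 2.0                                # knockdown lowers the target gene
splits = [np.r_[0:15, 30:45], np.r_[15:30, 45:60]]  # two disjoint sample splits
k_grid = [0, 1, 2, 3]
best_k, scores = select_k(Y, control_genes, is_control, targets, splits, k_grid)
print(best_k, scores)
```

In this toy setting the endpoint rewards normalizations that make the knocked-down gene stand out in its own profile, so removing the simulated batch factor should score better than no correction. Real L1000 data would need per-plate handling and additional endpoints, which this sketch does not attempt.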
Creation Date: 2018-01-23 10:22:52
Last Modified: 2018-01-25 01:57:48