FC1000: normalized gene expression changes of systematically perturbed human cells
Publication Year 2017-09, Volume 16, Issue #4, Page 217-242
Journal Title
Statistical Applications in Genetics and Molecular Biology
Publication Type
Journal Article
The systematic study of transcriptional responses to genetic and chemical perturbations in human cells is still in its early stages. The largest available dataset to date is the newly released L1000 compendium. With its 1.3 million gene expression profiles of treated human cells, it offers many opportunities for biomedical data mining, but also poses data normalization challenges of new dimensions. We developed a novel and practical approach to obtain accurate estimates of fold change response profiles from L1000, based on the RUV (Remove Unwanted Variation) statistical framework. Extending RUV to a big data setting, we propose an estimation procedure in which an underlying RUV model is tuned by feedback through dataset-specific statistical measures, reflecting p-value distributions and internal gene knockdown controls. By applying these metrics (termed evaluation endpoints) to disjoint data splits and integrating the results to select an optimal normalization, the procedure reduces bias and noise in the L1000 data, which in turn broadens the potential of this resource for pharmacological and functional genomic analyses. Our pipeline and normalization results are distributed as an R package (nelanderlab.org/FC1000.html).
gene expression
WEHI Research Division(s)
PubMed ID
Rights Notice
Refer to copyright notice on published article.

Creation Date: 2018-01-23 10:22:52
Last Modified: 2018-01-25 01:57:48