Neither random nor censored: estimating intensity-dependent probabilities for missing values in label-free proteomics
Author(s)
Li, M; Smyth, GK
Details
Publication Year 2023-04-17, Volume 39, Issue 5, Page btad200
Journal Title
Bioinformatics
Abstract
MOTIVATION: Mass spectrometry proteomics is a powerful tool in biomedical research, but its usefulness is limited by the frequent occurrence of missing values for peptides that cannot be reliably quantified (detected) in particular samples. Many analysis strategies have been proposed for missing values, with the discussion often focusing on whether values are missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). RESULTS: Statistical models and algorithms are proposed for estimating the detection probabilities and for evaluating how much statistical information can or cannot be recovered from the missing value pattern. The probability that an intensity is detected is shown to be accurately modeled as a logit-linear function of the underlying intensity, showing that the missing value process is intermediate between MAR and censoring. The detection probability asymptotes to 100% for high intensities, showing that missing values unrelated to intensity are rare. The rule applies globally to each dataset and is appropriate for both highly and lowly expressed peptides. A probability model is developed that allows the distribution of unobserved intensities to be inferred from the observed values. The detection probability model is incorporated into a likelihood-based approach for assessing differential expression and successfully recovers statistical power compared to omitting the missing values from the analysis. By contrast, imputation methods are shown to perform poorly, either reducing statistical power or increasing the false discovery rate to unacceptable levels. AVAILABILITY: Data and code to reproduce the results shown in this article are available from https://mengbo-li.github.io/protDP/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
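As an illustration of the logit-linear detection rule described in the abstract, the following minimal Python sketch evaluates a detection probability of the form logit(p) = beta0 + beta1 * intensity. It is not the authors' code (see the protDP link above for that). The function name `detection_probability`, the coefficient values and the use of a log2 intensity scale are all assumptions chosen only to show the shape of the curve.

```python
import math


def detection_probability(intensity, beta0, beta1):
    """Probability that a value is detected, assuming a logit-linear model.

    logit(p) = beta0 + beta1 * intensity. Name and signature are
    hypothetical, chosen for illustration only.
    """
    x = beta0 + beta1 * intensity
    # Numerically stable logistic function for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical coefficients, NOT estimates from the paper.
BETA0, BETA1 = -8.0, 0.5

if __name__ == "__main__":
    # Intensities are assumed to be on a log2 scale for this illustration.
    for y in (8, 12, 16, 20, 28):
        p = detection_probability(y, BETA0, BETA1)
        print(f"intensity={y:>2}  P(detected)={p:.4f}")
```

With a positive slope the probability rises smoothly with intensity and approaches 1 at high intensities, which matches the abstract's description of a process that sits between MAR (a flat curve) and hard censoring (a step function).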
Publisher
Oxford Academic
Keywords
Likelihood Functions; Proteomics; Models, Statistical; Algorithms; Peptides
Research Division(s)
Bioinformatics
PubMed ID
37067487
Open Access at Publisher's Site
https://doi.org/10.1093/bioinformatics/btad200
Terms of Use/Rights Notice
Refer to copyright notice on published article.


Creation Date: 2023-05-01 02:16:42
Last Modified: 2023-06-13 01:17:47