Abstract
The design of absolute protein quantification assays remains challenging due to variations in peptide observability.
Here, we present a deep learning algorithm for peptide detectability prediction, d::pPop, which allows the informed selection of synthetic
proteotypic peptides for the design of targeted proteomics quantification assays.
Introduction
System-wide studies that identify and quantify protein components are key to understanding complex cellular dynamics in response to system perturbations.
Mass spectrometry-based proteomics approaches have therefore become an integral part of modern biological research.
In particular, absolute quantification is necessary for biochemical system simulation studies.
For this purpose, known amounts of synthetic proteotypic peptides (PTPs), which mimic the peptides produced by proteolytic cleavage of the target analyte proteins, are spiked into a cell extract.
However, selecting PTPs suited for the absolute quantification of target proteins remains challenging and cumbersome.
Difficulties arise from the large variation in observability among peptides of the same protein and from incomplete knowledge of the factors that influence peptide detectability.
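Candidate PTPs for a target protein come from an in-silico digest of its sequence. The text does not name the protease; the Python sketch below assumes trypsin and its textbook cleavage rule (cleave after K or R, but not before P). The function name `tryptic_digest` and the 7 to 25 residue length window are illustrative assumptions, not d::pPop parameters.

```python
import re


def tryptic_digest(sequence: str, min_len: int = 7, max_len: int = 25) -> list[str]:
    """Hypothetical in-silico trypsin digest (assumed protease, assumed length window).

    Cleaves after K or R unless the next residue is P, then keeps only
    peptides whose length lies in [min_len, max_len].
    """
    # Zero-width split point: preceded by K/R and not followed by P
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    # Drops too-short/too-long peptides and the empty string left after a C-terminal K/R
    return [p for p in pieces if min_len <= len(p) <= max_len]
```

For example, `tryptic_digest("AKPGRCCK", min_len=1)` keeps the K-P bond intact and returns `["AKPGR", "CCK"]`.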
Method Description
In the context of machine learning, selecting positive and negative training data is a crucial but difficult step, because peptides that have simply not yet been observed (measured) cannot be distinguished from peptides that are not observable at all.
In theory, all peptides derived from the same protein should be present in equimolar amounts within a measurement.
Deviations from equal amounts observed in experiments can therefore only be explained by differences in the observability of the ion species.
Accordingly, we rank the peptides of each protein by their measured abundance and recast the task as a “learning to rank” problem.
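One common way to turn per-protein rankings into training data is a pairwise formulation: for every two peptides of the same protein, the more intense one is preferred. The Python sketch below builds such preference pairs; the `(protein_id, peptide, intensity)` record layout and the function name are hypothetical, and d::pPop's own ranking formulation may differ. Pairs are formed only within a protein, since under the equimolarity argument intensities of peptides from different proteins are not comparable.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_preferences(records):
    """Build (preferred, other) peptide pairs from hypothetical
    (protein_id, peptide, intensity) records, comparing peptides only
    within the same protein."""
    by_protein = defaultdict(list)
    for protein, peptide, intensity in records:
        by_protein[protein].append((peptide, intensity))

    pairs = []
    for peptides in by_protein.values():
        # Most intense first; combinations() then yields (higher, lower) pairs
        ranked = sorted(peptides, key=lambda p: p[1], reverse=True)
        for (hi_pep, hi_int), (lo_pep, lo_int) in combinations(ranked, 2):
            if hi_int > lo_int:  # equal intensities express no preference
                pairs.append((hi_pep, lo_pep))
    return pairs
```

A ranking model trained on such pairs only has to learn which of two peptides from the same protein is more observable, not absolute intensities.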
d::pPop is based on a deep neural network that is trained on experimentally observed proteins.
To match the assumption of equimolarity, the intensities of each protein's peptides are normalized to the intensity of its most intense peptide.
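A minimal sketch of this per-protein normalization, assuming the intensities of one protein's peptides are held in a dictionary keyed by peptide sequence (a hypothetical layout). After normalization the most intense peptide has the value 1.0, and every other peptide expresses its observability relative to it.

```python
def normalize_to_max(intensities: dict[str, float]) -> dict[str, float]:
    """Scale one protein's peptide intensities so the most intense peptide is 1.0."""
    top = max(intensities.values())  # raises ValueError if the dict is empty
    if top <= 0:
        raise ValueError("maximum intensity must be positive")
    return {peptide: value / top for peptide, value in intensities.items()}
```

For example, `normalize_to_max({"AAK": 50.0, "CCR": 200.0})` returns `{"AAK": 0.25, "CCR": 1.0}`.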
Next, feature vectors representing the physicochemical properties of the peptide sequences are computed.
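The specific features d::pPop uses are not listed here. As an illustration only, the sketch below computes a small hypothetical feature vector per peptide: length, mean Kyte-Doolittle hydropathy (GRAVY), counts of basic (K, R, H) and acidic (D, E) residues, and the fraction of aromatic residues (F, W, Y).

```python
# Kyte-Doolittle hydropathy index per amino acid (one-letter codes)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_features(seq: str) -> list[float]:
    """Hypothetical feature vector:
    [length, GRAVY, #basic residues, #acidic residues, aromatic fraction]."""
    if not seq:
        raise ValueError("empty peptide sequence")
    n = len(seq)
    # Mean hydropathy; raises KeyError for non-standard residue letters
    gravy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    basic = sum(seq.count(aa) for aa in "KRH")
    acidic = sum(seq.count(aa) for aa in "DE")
    aromatic = sum(seq.count(aa) for aa in "FWY") / n
    return [float(n), gravy, float(basic), float(acidic), aromatic]
```

Real feature sets for detectability prediction are usually much richer; the point here is only that each sequence is mapped to a fixed-length numeric vector a network can consume.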
The deep neural network learns a regression model that relates these physicochemical peptide properties to the differences in peptide intensities within a single protein in the proteomics workflow.
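The architecture, training objective and hyperparameters of d::pPop's network are not given in this section. To show the underlying idea of fitting a neural regression model from feature vectors to normalized intensities, the sketch below swaps in a deliberately tiny one-hidden-layer network (tanh units, linear output) trained by stochastic gradient descent on squared error, in pure Python. `TinyMLP` is a hypothetical name, and this is not the d::pPop model.

```python
import math
import random


class TinyMLP:
    """Hypothetical one-hidden-layer regressor: tanh hidden units, linear output.

    Trained by plain SGD on 0.5 * squared error. A toy stand-in for
    illustration, not d::pPop's deep network.
    """

    def __init__(self, n_in: int, n_hidden: int = 8, lr: float = 0.05, seed: int = 0):
        rng = random.Random(seed)  # fixed seed for reproducible initial weights
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _hidden(self, x: list[float]) -> list[float]:
        # tanh(W1 x + b1), one activation per hidden unit
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x: list[float]) -> float:
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def fit(self, X: list[list[float]], y: list[float], epochs: int = 200) -> list[float]:
        """Run SGD and return the mean training loss of each epoch."""
        losses = []
        for _ in range(epochs):
            total = 0.0
            for x, target in zip(X, y):
                h = self._hidden(x)
                out = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
                err = out - target  # derivative of 0.5 * err**2 w.r.t. the output
                total += 0.5 * err * err
                for j, hj in enumerate(h):
                    # Backpropagate through w2[j] (before updating it) and tanh' = 1 - tanh^2
                    delta = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= self.lr * err * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= self.lr * delta * xi
                    self.b1[j] -= self.lr * delta
                self.b2 -= self.lr * err
            losses.append(total / len(X))
        return losses
```

In the setting described above, `X` would hold peptide feature vectors and `y` the max-normalized intensities of each protein's peptides; a production model would use a deep-learning framework rather than hand-written backpropagation.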
Trained on extensive proteomics datasets, d::pPop’s plant-specific and non-plant-specific models can predict the quality of PTPs for proteins that have not yet been identified experimentally, without requiring new experimental data (Figure 1).
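Once a model produces an observability score for each peptide, choosing PTPs for a not-yet-identified protein reduces to sorting its candidate peptides by predicted score. In this sketch, `rank_candidates` and the `score` callable are hypothetical stand-ins for a trained model; the toy length-based scorer in the usage line is for demonstration only.

```python
from typing import Callable, Sequence


def rank_candidates(
    peptides: Sequence[str],
    score: Callable[[str], float],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every candidate peptide and return the top_k by predicted observability."""
    scored = [(peptide, score(peptide)) for peptide in peptides]
    # Highest predicted score first; the sort is stable, so ties keep input order
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

For example, `rank_candidates(["AAK", "CCCCR", "GK"], score=lambda p: float(len(p)), top_k=2)` returns `[("CCCCR", 5.0), ("AAK", 3.0)]`.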
Figure 1: Schematic overview of the deep learning approach d::pPop for predicting the rank of peptide observability within plant and non-plant query proteins.