Sequence-Based Protein-Protein Interaction Prediction Using Greedy Layer-Wise Training of Deep Neural Networks
Abstract
Jamu is an herbal medicine that was commonly used before the advent of modern medicine. Herbal formulas are generally
derived empirically and passed down from generation to generation. However, healing with herbs is also
influenced by factors such as myths and local customs, which leads to different herbal ingredients being used to treat
the same disease. The result is a collection of overlapping herbal recipes with no supporting evidence of their
validity. Protein-protein interaction (PPI) is a biological process that drugs influence during healing.
Therefore, PPIs that arise from the consumption of herbs can serve as evidence of the effectiveness of herbal medicine. PPI analysis
is needed to study how proteins interact with one another. Experimental (wet lab) PPI analysis
cannot be carried out on large data sets and covers only a portion of protein interaction networks, so a
computational approach is needed. Previous studies have shown that PPIs can be predicted using only
protein sequence information. The advantage of relying on sequence information alone is that the method is more
universal. Features that can be extracted from protein sequences include Discrete Cosine Transform, Multi-scale Local
Descriptor, Autocovariance, and Conjoint Triad. Sequence-based prediction has been studied using different
machine learning approaches, such as Support Vector Machines, Random Forest, and Probabilistic Neural Networks. A
deep learning approach has also been applied using a stacked autoencoder, which attempts to learn the hidden structure of protein
sequences. Deep learning has also been shown to handle raw, complex data at large scale and to
learn useful, abstract features for perceptual problems such as image and speech recognition. The method proposed in
this study is a deep neural network trained with either a stacked autoencoder (SAE) or a stacked randomized autoencoder
(multilayer extreme learning machine, ML-ELM). The extracted features are conjoint triads. This study compares the two
methods, which differ in how they construct the layers of a deep neural network. We conducted experiments with k-fold
cross-validation, which has become the gold standard for testing most predictive models. Our experiments with 5-fold
cross-validation and 3 hidden layers gave an
average validation accuracy of 0.89 ± 0.02 for the SAE method and 0.51 ± 0.003 for ML-ELM.
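
As an illustration of the conjoint-triad features mentioned in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly used seven-class amino acid grouping and min-max style normalization from Shen et al. (2007), neither of which is stated in this abstract. The function names `conjoint_triad` and `pair_features` are hypothetical, as is the choice to represent a protein pair by concatenating the two vectors.

```python
# Assumed seven-class amino acid grouping (Shen et al., 2007);
# the abstract itself does not specify the grouping.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}


def conjoint_triad(sequence):
    """Return a 343-dimensional conjoint-triad vector for one protein sequence."""
    counts = [0] * (7 ** 3)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    # Slide a window of three consecutive residues over the sequence.
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        if a is None or b is None or c is None:
            continue  # skip triads containing non-standard residues
        counts[a * 49 + b * 7 + c] += 1
    lo, hi = min(counts), max(counts)
    if hi == 0:
        return [0.0] * len(counts)  # no valid triad found
    # Assumed normalization: (f_i - min f) / max f
    return [(f - lo) / hi for f in counts]


def pair_features(seq_a, seq_b):
    """Hypothetical pair encoding: concatenate both vectors (686 values)."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)
```

Under these assumptions each protein pair becomes a fixed-length vector of 686 values regardless of sequence length, which is what allows it to be fed to a fully connected network such as the ones compared in this study.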
Collections
- LSP-Conference Proceeding [1874]