Antony JV, Koya R, Pournami PN, Nair GG, Balakrishnan JP.
2022.
Protein secondary structure assignment using residual networks. Journal of Molecular Modeling. 28(9):269. August.
Abstract: Proteins are built from chains of amino acids. Their structure is described at four levels: primary, secondary, tertiary, and quaternary, with the tertiary and quaternary structures shaping protein function. Because a protein's structure is inextricably connected to its biological function, machine learning algorithms that predict structures more accurately have the potential to lead to new scientific discoveries in human health and to improve our capacity to develop new treatments. Protein secondary structure assignment enriches the structural and functional understanding of proteins. It supports protein structure comparison and classification studies, and it facilitates secondary and tertiary structure prediction systems. Several secondary structure assignment methods have been developed since the 1980s, most of them based on hydrogen bond analysis and atomic coordinate features. However, assignment becomes difficult when the protein data has missing atoms. Deep neural networks are often called universal function approximators because, when properly designed and trained, they can approximate a very wide class of functions to produce the desired output. Optimised deep learning architectures have already improved performance on a wide range of problems. Recently, the ResNet architecture has attracted significant interest because of its applicability in areas such as image classification and protein contact map prediction. The proposed model, which is based on the ResNet architecture, assigns secondary structures using Cα atom coordinates. The model achieved an accuracy of 94% on the benchmark and independent test sets. These findings encourage the development of new deep learning-based methods that generalise better across protein learning tasks. They also allow computational biologists to explore integrating these techniques with experimental methods.
The model code is available at https://github.com/jisnava/ResNet_for_Structure_Assignments/
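The abstract states that the model works from Cα atom coordinates but does not describe how those coordinates are encoded. One common kind of Cα-only input, assumed here purely for illustration and not claimed to be the authors' feature set, is the distance between the Cα atoms of residues i and i+k. In an ideal α-helix, d(i, i+3) is about 5.0 Å and d(i, i+4) about 6.2 Å, which is why such distances carry secondary structure information. The function names `ca_distance_features` and `ideal_helix` are hypothetical, and the helix parameters (radius 2.3 Å, rise 1.5 Å, 100° per residue) are approximate textbook values.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def ideal_helix(n: int, radius: float = 2.3, rise: float = 1.5,
                turn_deg: float = 100.0) -> List[Coord]:
    """Toy Cα trace of an ideal α-helix (approximate textbook parameters)."""
    coords = []
    for i in range(n):
        angle = math.radians(turn_deg * i)
        coords.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return coords


def ca_distance_features(ca: Sequence[Coord],
                         offsets: Sequence[int] = (2, 3, 4)) -> List[List[float]]:
    """For residue i, return [d(Cα_i, Cα_{i+k}) for k in offsets] in Å.

    Positions where i + k runs past the end of the chain are padded with 0.0.
    """
    n = len(ca)
    features = []
    for i in range(n):
        row = []
        for k in offsets:
            j = i + k
            # math.dist gives the Euclidean distance between two points (Python 3.8+).
            row.append(math.dist(ca[i], ca[j]) if j < n else 0.0)
        features.append(row)
    return features


if __name__ == "__main__":
    for i, row in enumerate(ca_distance_features(ideal_helix(8))):
        print(i, [round(d, 2) for d in row])
```

For an 8-residue ideal helix, the first residue's features come out at roughly 5.4, 5.1 and 6.2 Å, and residues near the chain end are padded with 0.0. A per-residue matrix like this is the kind of array a 1D residual network could consume; the paper and its repository should be consulted for the representation the model actually uses.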
Nair VV, Pradeep SP, Nair VS, Pournami PN, Gopakumar G, Jayaraj PB.
2022.
Deep Sequence Models for Ligand-Based Virtual Screening. Journal of Computational Biophysics and Chemistry. 21(02):207-217.
Abstract: In the past few years, machine learning techniques have taken the limelight in multiple research domains. One domain that has reaped the benefits is computer-aided drug discovery, where methods such as virtual screening narrow the search space of candidate drug molecules. Current state-of-the-art sequential neural network models have shown promising results, and we aim to achieve similar results in virtual screening using molecules encoded in the simplified molecular-input line-entry system (SMILES). Our work uses two attention-based sequential models: a long short-term memory (LSTM) network with attention, and ChemBERTa, a version of the transformer network adapted specifically to SMILES. We also propose the “Overall Screening Efficacy”, an averaging metric that aggregates model performance over multiple datasets. We found an overall improvement of about 27% over the benchmark model, which relied on parallelized random forests.
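The abstract says the models read molecules as SMILES strings. Before an LSTM can consume a SMILES string, the string has to be split into tokens and mapped to integer ids. The sketch below uses a regular-expression tokenizer that keeps two-letter halogens (Cl, Br), bracket atoms such as [NH4+] and two-digit ring closures (%10) as single tokens. This is an assumed, simplified scheme for illustration only: ChemBERTa ships its own tokenizer, which this does not reproduce, and the names `tokenize_smiles`, `build_vocab` and `encode` are hypothetical.

```python
import re
from typing import Dict, List

# Assumed tokenization scheme: bracket atoms, two-letter halogens, two-digit ring
# closures, organic-subset atoms (upper case and aromatic lower case), bond and
# branch symbols, and single-digit ring closures each become one token.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[()=#\-+\\/:~@?>*$.]|\d)"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, failing loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips characters that no alternative matches, so verify coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize {smiles!r}")
    return tokens


def build_vocab(corpus: List[str]) -> Dict[str, int]:
    """Map every token seen in the corpus to an id; id 0 is reserved for padding."""
    tokens = sorted({tok for smi in corpus for tok in tokenize_smiles(smi)})
    return {tok: idx + 1 for idx, tok in enumerate(tokens)}


def encode(smiles: str, vocab: Dict[str, int]) -> List[int]:
    """Convert one SMILES string into the integer sequence a sequence model reads."""
    return [vocab[tok] for tok in tokenize_smiles(smiles)]


if __name__ == "__main__":
    vocab = build_vocab(["CCO", "CC(=O)Cl", "c1ccccc1O"])
    print(tokenize_smiles("CC(=O)Cl"))
    print(encode("CCO", vocab))
```

Reserving id 0 leaves room for padding when SMILES strings of different lengths are batched. Characters outside the small set above raise `ValueError` instead of being silently dropped.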