Deep Sequence Models for Ligand-Based Virtual Screening

Citation:
Viswajit Vinod Nair, Jayaraj, Pradeep SP, Nair VS, Pournami PN, Gopakumar G, Jayaraj PB. 2022. Deep Sequence Models for Ligand-Based Virtual Screening. Journal of Computational Biophysics and Chemistry. 21(02):207-217.

Abstract:

In recent years, machine learning techniques have taken the limelight in multiple research domains. One domain that has reaped the benefits is computer-aided drug discovery, where methods such as virtual screening narrow the search space for candidate drug molecules. Current state-of-the-art sequential neural network models have shown promising results, and we aim to achieve similar results in virtual screening using the simplified molecular-input line-entry system (SMILES), a string encoding of molecular structure. Our work uses attention-based sequential models: a long short-term memory (LSTM) network with attention, and ChemBERTa, an optimized version of the transformer network designed specifically for SMILES. We also propose the “Overall Screening Efficacy”, an averaging metric that aggregates model performance over multiple datasets into a single score. Overall, our models improved by about 27% on the benchmark model, which relied on parallelized random forests.
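The abstract describes the Overall Screening Efficacy only as an averaging metric over multiple datasets. The following is a minimal Python sketch, assuming it is an unweighted arithmetic mean of a single per-dataset screening score and that the reported improvement is measured relative to the baseline's aggregate score. The function names `overall_screening_efficacy` and `relative_improvement` and the sample values are hypothetical illustrations, not the paper's definitions or results.

```python
from collections.abc import Mapping
from statistics import mean


def overall_screening_efficacy(per_dataset_scores: Mapping[str, float]) -> float:
    """Hypothetical aggregate: unweighted mean of one screening score per dataset.

    Assumption: every dataset contributes equally; the paper may weight or
    combine metrics differently.
    """
    if not per_dataset_scores:
        raise ValueError("need at least one dataset score")
    return mean(per_dataset_scores.values())


def relative_improvement(model_score: float, baseline_score: float) -> float:
    """Relative gain of a model over a baseline, as a fraction (0.27 means 27%)."""
    if baseline_score == 0:
        raise ValueError("baseline score must be non-zero")
    return (model_score - baseline_score) / baseline_score


if __name__ == "__main__":
    # Placeholder scores for two hypothetical target datasets (not paper data).
    sequence_model = {"target_a": 0.875, "target_b": 0.625}
    random_forest = {"target_a": 0.625, "target_b": 0.375}

    ose_model = overall_screening_efficacy(sequence_model)
    ose_baseline = overall_screening_efficacy(random_forest)
    print(ose_model, ose_baseline, relative_improvement(ose_model, ose_baseline))
```

An unweighted mean keeps the aggregate easy to interpret, but it lets small datasets count as much as large ones; a dataset-size-weighted mean would be an equally plausible reading of "averaging metric".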
