Andreas Mayr, Günter Klambauer, Thomas Unterthiner, and Sepp Hochreiter

In an earlier article, ‘Large-scale comparison of machine learning methods for drug target prediction on ChEMBL’, we compared several machine learning methods for drug target prediction.

In public databases, different chemical compounds are screened by diverse assays to measure bioactivities. For most compounds in a database, however, only a few bioactivities have been measured. Machine learning techniques can complement these data by virtually screening compounds for bioactivities. We developed the large-scale comparison (LSC) benchmark dataset for evaluating machine learning methods that predict bioactivities of compounds represented by features, graphs, or strings. Using a large dataset from the ChEMBL database, we compared standard deep feed-forward neural networks to Support Vector Machines, Random Forests, graph convolutional networks, and a recurrent neural network.

In this work, we present an overview of the dataset and discuss the common challenges of large-scale studies, including data preprocessing and clustering, and strategies for avoiding potential biases. We provide the preprocessed parts of the database, a description of our pipeline, and the tools developed for this study.

* AM, GK, and TU contributed equally to this work.

2020-02-12.

IARAI Authors: Dr Sepp Hochreiter
Research: Drug Discovery
Keywords: Deep Learning, Drug Discovery
