Noé Sturm, Andreas Mayr, Thanh Le Van, Vladimir Chupakhin, Hugo Ceulemans, Joerg Wegner, Jose-Felipe Golib-Dzib, Nina Jeliazkova, Yves Vandriessche, Stanislav Böhm, Vojtech Cima, Jan Martinovic, Nigel Greene, Tom Vander Aa, Thomas J Ashby, Sepp Hochreiter, Ola Engkvist, Günter Klambauer, and Hongming Chen

In this work, we investigated the transferability of machine learning models trained on publicly available data to private pharmaceutical industry data. Public and private datasets may differ substantially due to differences in measurement techniques, sample quantity, and assay specialization. We focused on machine learning models for drug target prediction: we trained them on public data, transferred them to industrial data, and evaluated their performance. For training the models, we used ExCAPE-DB, a benchmark dataset for drug discovery. This dataset is extracted from the public ChEMBL and PubChem databases and contains bioactivity data for small molecules, including protein-ligand activity. We considered several established machine learning methods: feed-forward fully connected deep neural networks (DNNs), gradient boosting as an ensemble-based approach, and a Bayesian matrix factorization approach. The performance of these methods was evaluated on industrial datasets from the AstraZeneca and Janssen databases. The main performance measure was the area under the receiver operating characteristic (ROC) curve, which reflects the model’s ability to rank active compounds higher than inactive compounds. Our results indicate that machine learning models trained on public data exhibit only a small decrease in performance when applied to industry data.
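The ranking interpretation of the area under the ROC curve can be illustrated with a minimal sketch. The function name `roc_auc`, the toy labels, and the toy scores below are assumptions made for illustration only; they are not taken from the paper and do not reproduce its evaluation pipeline. The sketch computes the AUC as the fraction of (active, inactive) compound pairs in which the active compound receives the higher score, with ties counted as half.

```python
from itertools import product


def roc_auc(labels, scores):
    """Pairwise AUC: share of (active, inactive) pairs ranked correctly.

    labels: 1 for an active compound, 0 for an inactive one.
    scores: model outputs, where higher means "more likely active".
    Tied scores contribute 0.5 to the count.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")

    wins = 0.0
    for a, i in product(actives, inactives):
        if a > i:
            wins += 1.0
        elif a == i:
            wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical toy data: two actives and three inactives for one target.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy case five of the six active/inactive pairs are ordered correctly, so the AUC is 5/6. A value of 1.0 corresponds to every active being ranked above every inactive, and 0.5 to a ranking no better than chance. The quadratic pairwise loop is chosen for readability; for large datasets a rank-based computation would be the more practical choice.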

Journal of Cheminformatics, 12, 1-13, 2020-04-19.

IARAI Authors
Dr Sepp Hochreiter
Health and Well-being
Deep Learning, Drug Discovery

