Cynthia Shen, Mario Krenn, Sagi Eppel, and Alan Aspuru-Guzik

Figure: Two-step training for PASITHEA.

Computer-based de novo molecular design is a major challenge in cheminformatics today. Numerous AI methodologies for inverse molecular design aim to optimize molecules for a particular chemical property. Here, we introduce PASITHEA, a generative model for molecules that applies inceptionism techniques from computer vision. Inceptionism visualizes the role of a specific layer of a neural network by inverting the classification training and maximizing that layer's activation. We generalize the inceptionism technique to inverse molecular design: the inverse training focuses on the output layer, which contains the predicted chemical property.

PASITHEA is a gradient-based technique that optimizes molecular structure for a target property. First, we train the neural network on a random subset of the QM9 dataset to predict the logarithm of the partition coefficient. The partition coefficient measures lipophilicity and is an important property of drug molecules. We use the SELFIES string representation, which guarantees that every string maps to a valid molecule. Training uses standard feedforward and backpropagation passes; the network iteratively improves its predictions through mini-batch gradient descent. Then, we invert the training of the network to generate new variants of molecules in a process referred to as deep dreaming. In this process, the weights and biases of the network layers are fixed, while the input molecule is gradually modified to match the target property value. The error between the predicted value and the target value is backpropagated to compute the gradient with respect to the molecular encoding, and the encoding is updated to reduce that error.
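The two steps can be illustrated with a minimal sketch in plain Python. It is not the PASITHEA implementation: the network size, the learning rates, the synthetic `toy_property` function standing in for the property label, and the continuous 4-dimensional input standing in for a molecular encoding are all assumptions made for illustration. For brevity it also updates on one sample at a time rather than on mini-batches. The point is only the contrast between `train`, which updates weights with the inputs fixed, and `dream`, which updates the input with the weights fixed.

```python
import copy
import math
import random

random.seed(0)
N_IN, N_HID = 4, 8  # toy sizes chosen for illustration only

def toy_property(x):
    """Hypothetical stand-in for a property label such as logP."""
    return x[0] - 2.0 * x[1] + 0.5 * x[2] + 1.5 * x[3]

def init_params():
    return {
        "W1": [[random.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HID)],
        "b1": [0.0] * N_HID,
        "w2": [random.uniform(-0.5, 0.5) for _ in range(N_HID)],
        "b2": 0.0,
    }

def forward(p, x):
    """Return the hidden activations and the scalar property prediction."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    return h, sum(w * hi for w, hi in zip(p["w2"], h)) + p["b2"]

def mse(p, data):
    return sum((forward(p, x)[1] - t) ** 2 for x, t in data) / len(data)

def train(p, data, lr=0.02, epochs=500):
    """Step 1: inputs are fixed; weights and biases are updated."""
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(p, x)
            err = y - t  # derivative of 0.5 * (y - t)^2 with respect to y
            for j in range(N_HID):
                dz = err * p["w2"][j] * (1.0 - h[j] ** 2)  # backprop through tanh
                p["w2"][j] -= lr * err * h[j]
                p["b1"][j] -= lr * dz
                for i in range(N_IN):
                    p["W1"][j][i] -= lr * dz * x[i]
            p["b2"] -= lr * err

def dream(p, x, target, lr=0.05, steps=500):
    """Step 2: weights are frozen; the input is moved toward the target."""
    x = list(x)
    for _ in range(steps):
        h, y = forward(p, x)
        err = y - target
        # Gradient of 0.5 * (y - target)^2 with respect to each input entry.
        grad = [sum(err * p["w2"][j] * (1.0 - h[j] ** 2) * p["W1"][j][i]
                    for j in range(N_HID)) for i in range(N_IN)]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x

# Synthetic training set: random inputs labelled by the toy property.
data = []
for _ in range(64):
    x = [random.random() for _ in range(N_IN)]
    data.append((x, toy_property(x)))

params = init_params()
mse_before = mse(params, data)
train(params, data)
mse_after = mse(params, data)

snapshot = copy.deepcopy(params)  # used to confirm that dreaming leaves weights untouched
x_start = [0.5] * N_IN
target = 1.5
x_dream = dream(params, x_start, target)
pred_start = forward(params, x_start)[1]
pred_dream = forward(params, x_dream)[1]
print("prediction before dreaming:", round(pred_start, 3))
print("prediction after dreaming: ", round(pred_dream, 3))
```

In this sketch the dreamed input is a free continuous vector. Applying the same idea to molecules requires mapping the optimized encoding back to a discrete string, a step the toy example does not attempt.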

We expect that extending PASITHEA to larger datasets, larger molecules, and more complex properties will lead to advances in the design of new molecules as well as in the interpretation and explanation of machine learning models. The full code, along with the model, is available on GitHub.

Machine Learning: Science and Technology, 2, 3, 03LT02, 2021-07-13.

IARAI Authors
Mario Krenn
Deep Generative Models, Drug Discovery
Drug Discovery, Generative Models for Molecules, Inverse Design, Machine Learning

