Cynthia Shen, Mario Krenn, Sagi Eppel, and Alan Aspuru-Guzik
Computer-based de-novo molecular design is a major challenge in cheminformatics today. Numerous AI methodologies for inverse molecular design aim to optimize molecules for a particular chemical property. Here, we introduce PASITHEA, a generative model for molecules that applies inceptionism techniques from computer vision. Inceptionism allows visualizing the role of a specific layer of a neural network by reversing the network's classification training and maximizing the layer's activation. We generalize the inceptionism technique to inverse molecular design: the inverse training focuses on the output layer, which contains the predicted chemical property.
PASITHEA is a gradient-based technique that optimizes molecular structure for a target property. First, we train the neural network on a random subset of the QM9 dataset to predict the logarithm of the partition coefficient. The partition coefficient measures lipophilicity and is an important property of drug molecules. We use the string-based molecular representation SELFIES, in which every sequence maps to a valid molecule. The training involves a standard feedforward and backpropagation process; the network iteratively improves its predictions through mini-batch gradient descent. Then, we invert the training of the network to generate new variants of molecules in a process referred to as deep dreaming. In this process, the weights and biases of the network layers are fixed, while the input molecule is gradually modified to match the target property value. The error between the predicted value and the target value is backpropagated to compute the gradient with respect to the molecular encoding, and the encoding is updated along this gradient to minimize the error.
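The inverted training step can be illustrated with a minimal sketch. It is not the PASITHEA implementation: the tiny one-hidden-layer network, its hard-coded weights, the three-dimensional continuous input standing in for a molecular encoding, the target value, and the learning rate are all assumptions chosen for illustration. The sketch only shows the core idea of holding the weights and biases fixed while gradient descent on the squared error updates the input.

```python
import math

# Hypothetical fixed ("already trained") parameters of a tiny regressor.
# These values are illustrative assumptions, not taken from PASITHEA.
W1 = [[0.5, -0.3, 0.8], [-0.6, 0.9, 0.2]]  # hidden-layer weights
B1 = [0.1, -0.2]                           # hidden-layer biases
W2 = [1.2, -0.7]                           # output-layer weights
B2 = 0.05                                  # output-layer bias


def forward(x):
    """Return the hidden activations and the scalar property prediction."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    prediction = sum(w * h for w, h in zip(W2, hidden)) + B2
    return hidden, prediction


def input_gradient(x, target):
    """Gradient of (prediction - target)**2 with respect to the input x."""
    hidden, prediction = forward(x)
    error = prediction - target
    grad = []
    for i in range(len(x)):
        # Chain rule through the output layer and the tanh hidden layer.
        dpred_dxi = sum(W2[j] * (1.0 - hidden[j] ** 2) * W1[j][i]
                        for j in range(len(hidden)))
        grad.append(2.0 * error * dpred_dxi)
    return grad


def dream(x0, target, learning_rate=0.05, steps=200):
    """Gradient descent on the input only; the network parameters stay fixed."""
    x = list(x0)
    for _ in range(steps):
        grad = input_gradient(x, target)
        x = [xi - learning_rate * gi for xi, gi in zip(x, grad)]
    return x


start = [0.0, 0.0, 0.0]   # stand-in for an encoded starting molecule
target_value = 1.0        # stand-in for a target property value
dreamed = dream(start, target_value)
print("prediction before:", forward(start)[1])
print("prediction after: ", forward(dreamed)[1])
```

In this toy setting the prediction moves from its starting value toward the target while `W1`, `B1`, `W2`, and `B2` are never modified. The sketch does not cover mapping the optimized continuous encoding back to a SELFIES string.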
We expect that extending PASITHEA to larger datasets, larger molecules, and more complex properties will lead to advances in the design of new molecules as well as in the interpretation and explanation of machine learning models. The full code, along with the model, is available on GitHub.
Machine Learning: Science and Technology, 2, 3, 03LT02, 2021-07-13.