Günter Klambauer, Ana Sanchez Fernandez, Elisabeth Rumetshofer, and Sepp Hochreiter
Currently, bioimaging databases cannot be queried by the chemical structures that induce the phenotypic effects captured in the images. We present a novel retrieval system based on contrastive learning that identifies the chemical structure inducing a phenotype out of ∼2,000 candidates with a top-1 accuracy more than 70 times higher than that of a random baseline. We demonstrate that self-supervised contrastive learning methods can be readily applied to multi-modal data arising from informative biotechnologies, such as microscopy images. Our contrastive learning framework, CLOOME, constructs a common embedding space for microscopy images capturing cell phenotypes and for the chemical structures inducing the underlying cellular processes. This enables us to build a content-based retrieval system for microscopy images and chemical structures. Furthermore, we show that the learned microscopy image encoder produces highly transferable and expressive embeddings, or cell profiles, that can be used efficiently to predict bioactivities or detect new phenotypes. We envision that our work paves the way for retrieval systems for other pairs of modalities, for example querying microscopy imaging databases with transcriptomics signatures or vice versa. The CLOOME search engine, available as a free web application, as well as the CLOOME framework, the encoders, and all code are provided on GitHub.
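To make the retrieval setting concrete, here is a minimal sketch of how a query could be answered in a shared embedding space and how top-1 accuracy compares to a random baseline, which for N candidates is 1/N (about 0.05% for ∼2,000 candidates). It assumes that retrieval ranks candidates by cosine similarity between an image embedding and each molecule embedding, as is common for contrastively trained encoders; it is not CLOOME's actual code. The function names (`l2_normalize`, `retrieve`, `top1_accuracy`) are hypothetical, and the synthetic vectors stand in for the outputs of trained image and molecule encoders.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length so that dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def retrieve(image_emb, molecule_embs):
    """Return candidate molecule indices ranked by cosine similarity to the image (best first).

    Assumption: cosine similarity is the retrieval score in the shared embedding space.
    """
    q = l2_normalize(image_emb)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(m))) for m in molecule_embs]
    return sorted(range(len(molecule_embs)), key=lambda j: scores[j], reverse=True)


def top1_accuracy(image_embs, molecule_embs):
    """Fraction of images whose paired molecule (same index) is ranked first."""
    hits = sum(
        1 for i, img in enumerate(image_embs) if retrieve(img, molecule_embs)[0] == i
    )
    return hits / len(image_embs)


# Toy demo with synthetic embeddings (hypothetical stand-ins for encoder outputs):
# each "image" embedding is its paired molecule embedding plus small noise,
# mimicking a well-aligned shared space.
random.seed(0)
dim, n = 16, 50
mols = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
imgs = [[x + random.gauss(0, 0.1) for x in m] for m in mols]

acc = top1_accuracy(imgs, mols)
baseline = 1 / n  # chance of picking the right molecule uniformly at random
print(f"top-1 accuracy: {acc:.2f}, random baseline: {baseline:.3f}, ratio: {acc / baseline:.0f}x")
```

With real encoders the alignment is far noisier than in this toy example, so the accuracy is much lower; the point of the sketch is only how the ranking and the ratio to the 1/N baseline are computed.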
Research Square, 2022-12-01.