Philippe A. Robert, R. Akbar, R. Frank, Milena Pavlović, Michael Widrich, Igor Snapkov, Maria Chernigovskaya, Lonneke Scheffer, Andrei Slabodkin, Brij Bhushan Mehta, Mai Ha Vu, A. Prósz, Krzysztof Abram, Alexandru Olar, Enkelejda Miho, Dag Trygve Tryslew Haug, F. Lund-Johansen, S. Hochreiter, Ingrid Hobæk Haff, G. Klambauer, G. K. Sandve, and V. Greiff
Machine learning (ML) is a key technology to enable accurate prediction of antibody-antigen binding, a prerequisite for in silico vaccine and antibody design. Two orthogonal problems hinder the current application of ML to antibody-specificity prediction and the benchmarking thereof: (i) The lack of a unified formalized mapping of immunological antibody specificity prediction problems into ML notation and (ii) the unavailability of large-scale training datasets. Here, we developed the Absolut! software suite that allows the parameter-based unconstrained generation of synthetic lattice-based 3D-antibody-antigen binding structures with ground-truth access to conformational paratope, epitope, and affinity. We show that Absolut!-generated datasets recapitulate critical biological sequence and structural features that render antibody-antigen binding prediction challenging. To demonstrate the immediate, high-throughput, and large-scale applicability of Absolut!, we have created an online database of 1 billion antibody-antigen structures, the extension of which is only constrained by moderate computational resources. We translated immunological antibody specificity prediction problems into ML tasks and used our database to investigate paratope-epitope binding prediction accuracy as a function of structural information encoding, dataset size, and ML method, which is unfeasible with existing experimental data. Furthermore, we found that in silico investigated conditions, predicted to increase antibody specificity prediction accuracy, align with and extend conclusions drawn from experimental antibody-antigen structural data. In summary, the Absolut! framework enables the development and benchmarking of ML strategies for biotherapeutics discovery and design.
bioRxiv, doi:10.1101/2021.07.06.451258, 2021-07-11.