Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, Milena Pavlović, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter
We introduce a new Hopfield network that can store exponentially many patterns (in fixed dimensions) and converges very fast. The proposed model generalizes modern Hopfield networks from binary to continuous patterns. We show that the update rule of the new Hopfield network is equivalent to the attention mechanism of the Transformer architecture. Via the attention mechanism, Transformers, in particular the Bidirectional Encoder Representations from Transformers (BERT) model, have significantly improved performance on natural language processing (NLP) tasks.
We suggest using modern Hopfield networks to store information in neural networks. Modern binary Hopfield networks have a storage capacity that is exponential in the dimension of the vectors representing the patterns. We extend the energy function to continuous patterns by taking the logarithm of the negative energy and adding a quadratic term in the current state. The quadratic term ensures that the norm of the state vector remains finite and that the energy is bounded. We propose a new update rule for the new energy function that converges globally to stationary points of the energy. We then prove that a pattern that is well separated from the other patterns can be retrieved with a single update step, with an exponentially small error.
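The energy and update rule described above can be sketched numerically. The abstract does not state the formulas, so this is a minimal sketch under stated assumptions: the energy is taken to be a negative log-sum-exp over the pattern similarities plus half the squared norm of the state (constant terms omitted), the update is a softmax-weighted average of the stored patterns, and the inverse temperature `beta`, the function names, and the random test patterns are all hypothetical choices for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def energy(X, xi, beta=1.0):
    # Assumed energy: -lse(beta, X^T xi) + 0.5 * xi^T xi (constants dropped).
    # X has shape (d, N) with stored patterns as columns; xi has shape (d,).
    z = beta * (X.T @ xi)
    m = np.max(z)
    lse = (m + np.log(np.sum(np.exp(z - m)))) / beta
    return -lse + 0.5 * float(xi @ xi)

def update(X, xi, beta=1.0):
    # Assumed update: new state = X softmax(beta * X^T xi).
    return X @ softmax(beta * (X.T @ xi))

# Hypothetical data: 10 random continuous patterns in 128 dimensions.
rng = np.random.default_rng(0)
d, N = 128, 10
X = rng.standard_normal((d, N))

# Query: a noisy copy of the first stored pattern.
query = X[:, 0] + 0.1 * rng.standard_normal(d)

# One update step, then compare against the stored pattern.
retrieved = update(X, query)
retrieval_error = np.linalg.norm(retrieved - X[:, 0])
energy_before = energy(X, query)
energy_after = energy(X, retrieved)
print(retrieval_error, energy_before, energy_after)
```

Random high-dimensional patterns are nearly orthogonal, so they are well separated in the sense used above; in this setting a single step should land very close to the stored pattern while lowering the assumed energy.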
Surprisingly, our new update rule is also the key-value attention softmax update used in Transformers and BERT. Using these insights, we modify the Transformer and BERT architectures to make them learn more efficiently and achieve higher performance.
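The correspondence with attention can be checked numerically. The following is a sketch under assumptions, not the authors' implementation: the stored patterns serve directly as both keys and values, the state patterns serve as queries, the learned projection matrices of a Transformer are omitted, and `beta` is set to the usual attention scaling of one over the square root of the dimension. All names are hypothetical.

```python
import numpy as np

def softmax_rows(Z):
    # Numerically stable softmax applied to each row of a 2-D array.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def attention(Q, K, V, beta):
    # Key-value attention: softmax(beta * Q K^T) V, one output row per query.
    return softmax_rows(beta * Q @ K.T) @ V

def hopfield_update(X, xi, beta):
    # Assumed Hopfield update for one state vector xi,
    # with stored patterns as the columns of X.
    z = beta * (X.T @ xi)
    p = np.exp(z - z.max())
    p /= p.sum()
    return X @ p

# Hypothetical data: 8 stored patterns and 5 state patterns in 16 dimensions.
rng = np.random.default_rng(1)
d, N, S = 16, 8, 5
Y = rng.standard_normal((N, d))  # stored patterns as rows (keys and values)
R = rng.standard_normal((S, d))  # state patterns as rows (queries)
beta = 1.0 / np.sqrt(d)

# Attention over all queries at once versus one Hopfield update per query.
batched = attention(R, Y, Y, beta)
one_by_one = np.stack([hopfield_update(Y.T, r, beta) for r in R])
print(np.allclose(batched, one_by_one))
```

In a Transformer the queries, keys, and values are additionally produced by learned linear maps; leaving them out here only isolates the softmax update itself.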
We implemented the new Hopfield layer as a standalone module in PyTorch, which can be integrated into deep learning architectures as a pooling or attention layer. We show that neural networks with Hopfield layers outperform other methods on immune repertoire classification, a task in which the Hopfield layers store several hundred thousand patterns.