Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, Milena Pavlović, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter

Modern Hopfield networks
Figure: Relationships between the energy of modern binary Hopfield networks, the energy of the new Hopfield network with continuous states, the new update rule that minimizes this energy, and the attention mechanism of Transformers.

We introduce a new Hopfield network that can store exponentially many patterns (in the dimension of the pattern space) and converges very fast. The proposed model generalizes modern Hopfield networks from binary to continuous patterns. We show that the update rule of the new Hopfield network is equivalent to the attention mechanism of the Transformer architecture. Through this attention mechanism, Transformers, in particular the Bidirectional Encoder Representations from Transformers (BERT) model, have substantially improved performance on natural language processing (NLP) tasks.
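The fast convergence and large capacity described above can be illustrated with a small sketch of the update rule xi_new = X softmax(beta X^T xi). This is our own illustration, not the authors' code: the pattern count, dimension, noise level, and beta = 1.0 are arbitrary assumptions, and the stored patterns are random Gaussian vectors. It stores more patterns than dimensions and retrieves one of them from a corrupted query with a single update.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax of a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_update(patterns, state, beta):
    """One update xi_new = X softmax(beta X^T xi).

    patterns: the N stored patterns x_i (the columns of X), each a list of d floats.
    state: the current state pattern xi.
    """
    scores = [beta * sum(a * b for a, b in zip(x, state)) for x in patterns]
    weights = softmax(scores)
    dim = len(state)
    return [sum(w * x[k] for w, x in zip(weights, patterns)) for k in range(dim)]


random.seed(0)
d, n = 64, 200  # assumption: 200 random patterns in 64 dimensions
patterns = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

target = patterns[17]
noisy = [v + random.gauss(0.0, 0.3) for v in target]  # corrupted query

# A single update step pulls the noisy query back onto the stored pattern.
retrieved = hopfield_update(patterns, noisy, beta=1.0)
error = max(abs(a - b) for a, b in zip(retrieved, target))
print(f"max abs error after one update: {error:.2e}")
```

Because the target pattern's score is much larger than those of the other patterns, the softmax puts almost all weight on it, which is the mechanism behind one-step retrieval.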

We propose using modern Hopfield networks to store information in neural networks. Modern binary Hopfield networks have a storage capacity that grows exponentially with the dimension of the pattern vectors. We extend their energy function to continuous patterns by taking the logarithm of the negative energy and adding a quadratic term in the current state. The quadratic term keeps the norm of the state vector finite and the energy bounded. We propose a new update rule for this energy function that converges globally to stationary points of the energy. We then prove that a pattern that is well separated from the other patterns can be retrieved with a single update step and an exponentially small error.
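For reference, the energy construction described above can be written out as follows. The notation follows the arXiv paper as we understand it and should be checked against it: X = (x_1, ..., x_N) holds the stored patterns as columns, xi is the state pattern, beta > 0 is an inverse temperature, and M is the largest pattern norm.

```latex
% Modern binary Hopfield energy with exponential interaction function
E_{\text{bin}} = -\sum_{i=1}^{N} \exp\!\left(x_i^{\top}\xi\right)
               = -\exp\!\left(\operatorname{lse}(1, X^{\top}\xi)\right),
\qquad
\operatorname{lse}(\beta, z) = \beta^{-1}\log\sum_{i=1}^{N}\exp(\beta z_i)

% Continuous energy: log of the negative energy, plus a quadratic term in \xi
E = -\operatorname{lse}(\beta, X^{\top}\xi) + \tfrac{1}{2}\,\xi^{\top}\xi
    + \beta^{-1}\log N + \tfrac{1}{2}M^{2},
\qquad M = \max_i \lVert x_i \rVert

% Update rule: set the gradient of the quadratic term (\xi) equal to the
% gradient of the lse term, i.e. a concave-convex procedure step
\nabla_{\xi}\operatorname{lse}(\beta, X^{\top}\xi) = X \operatorname{softmax}\!\left(\beta X^{\top}\xi\right)
\quad\Longrightarrow\quad
\xi^{\text{new}} = X \operatorname{softmax}\!\left(\beta X^{\top}\xi\right)
```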

Surprisingly, the new update rule is also the key-value attention softmax update used in Transformers and BERT. Using these insights, we modify the Transformer and BERT architectures so that they learn more efficiently and achieve higher performance.
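The correspondence between the update rule and attention can be checked numerically with a minimal sketch. It assumes values equal to keys and omits the learned projections W_Q, W_K, W_V that a Transformer applies; with those simplifications, scaled dot-product attention softmax(Q K^T / sqrt(d_k)) V is one Hopfield update per query row with beta = 1/sqrt(d_k). The sizes and random data are arbitrary.

```python
import math
import random


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention softmax(Q K^T / sqrt(d_k)) V, row by row."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d_k) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def hopfield_update(patterns, state, beta):
    """xi_new = X softmax(beta X^T xi), with the stored patterns given as rows."""
    weights = softmax([beta * dot(x, state) for x in patterns])
    return [sum(w * x[j] for w, x in zip(weights, patterns))
            for j in range(len(state))]


random.seed(1)
d_k = 16
Q = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(3)]  # 3 state (query) patterns
K = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(5)]  # 5 stored (key) patterns

# Attention with values equal to keys ...
Z_attn = attention(Q, K, K)
# ... matches one Hopfield update per query with beta = 1/sqrt(d_k).
Z_hop = [hopfield_update(K, q, beta=1 / math.sqrt(d_k)) for q in Q]

max_diff = max(abs(a - b) for ra, rb in zip(Z_attn, Z_hop) for a, b in zip(ra, rb))
print(f"max difference: {max_diff:.1e}")
```

Any remaining difference comes only from floating-point rounding (dividing by sqrt(d_k) versus multiplying by its reciprocal). In a full Transformer layer, the retrieved patterns are additionally mapped through a value projection.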

The new Hopfield layer is implemented as a standalone PyTorch module that can be integrated into deep learning architectures as a pooling or attention layer. We show that neural networks with Hopfield layers outperform other methods on immune repertoire classification, a task in which the layers store several hundred thousand patterns.
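Pooling with a Hopfield layer can be sketched as a single update in which one query pattern retrieves from a variable-size set of stored patterns. This is a plain-Python illustration under stated assumptions, not the API of the authors' PyTorch module: the function name `hopfield_pool` is hypothetical, the query is fixed here although a trained model would learn it, and we assume a bag of instance embeddings (for example, the sequences of one immune repertoire) as the stored patterns.

```python
import math
import random


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_pool(bag, query, beta):
    """Hypothetical helper: pool a variable-size bag into one vector.

    The query plays the role of the state pattern xi and the bag's instances
    are the stored patterns; one update xi_new = X softmax(beta X^T xi)
    returns a convex combination of the instances.
    """
    weights = softmax([beta * sum(a * b for a, b in zip(x, query)) for x in bag])
    return [sum(w * x[j] for w, x in zip(weights, bag)) for j in range(len(query))]


random.seed(2)
dim = 8
# Fixed query for illustration; in a trained model it would be a learned parameter.
query = [random.gauss(0, 1) for _ in range(dim)]

for bag_size in (10, 1000):
    bag = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(bag_size)]
    pooled = hopfield_pool(bag, query, beta=1.0)
    print(bag_size, len(pooled))  # output size does not depend on bag size
```

The pooled vector has the same size however many instances the bag contains, so it can feed a fixed-size classifier.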


arXiv:2008.02217, 2020-08-06

IARAI authors: Dr David Kreil, Dr Michael Kopp, Dr Sepp Hochreiter
Hopfield Networks
Attention Mechanism, Deep Learning, Hopfield Networks, Transformer

