We are proud to announce the latest groundbreaking paper by Sepp Hochreiter's team and our IARAI colleagues! The paper "Hopfield Networks is All You Need" introduces a new Hopfield network with continuous states that can store exponentially many patterns and converge within one update. A surprising finding of this work is that the new update rule is the attention mechanism of the Transformer architecture (Attention Is All You Need). Transformers have pushed performance on natural language processing (NLP) tasks to new levels via their attention mechanism.
Hopfield networks are associative memory models used to store and retrieve patterns. Classical Hopfield networks (Hopfield, 1982) are binary and have limited storage capacity, with an energy function that is quadratic in the interactions between neurons. Modern discrete Hopfield networks significantly improve on these properties. Dense associative memories (Krotov and Hopfield) replaced the quadratic energy with polynomial interactions to increase the storage capacity. The energy function was then extended to exponential interactions (Demircigil et al.), and the corresponding update rule, which minimizes the energy, was shown to converge with high probability after a single update of the current state.
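For orientation, this progression of energy functions can be summarized as follows; the notation is ours and follows the standard formulation, with binary state $s$ and $N$ stored patterns $x^{\mu}$. The classical network uses the quadratic energy

\[
E(s) = -\tfrac{1}{2}\sum_{i,j} w_{ij}\, s_i s_j, \qquad w_{ij} = \sum_{\mu=1}^{N} x_i^{\mu} x_j^{\mu},
\]

while dense associative memories generalize it to

\[
E(s) = -\sum_{\mu=1}^{N} F\Big(\sum_i x_i^{\mu} s_i\Big),
\]

with a polynomial interaction function $F(z) = z^{n}$ (Krotov and Hopfield) or an exponential one $F(z) = \exp(z)$ (Demircigil et al.).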
In this work, the new energy function generalizes modern Hopfield networks from discrete to continuous states while keeping the exponential storage capacity and fast convergence. The model can therefore distinguish even strongly correlated continuous patterns. The authors propose a new update rule for this energy function and show that it converges globally to stationary points of the energy. By showing that this update rule is equivalent to the attention mechanism of Transformers, the paper establishes an important relationship between associative memories and attention. The team has also implemented the Hopfield layer as a standalone module in PyTorch, which can be integrated into deep networks and used in place of pooling, LSTM, and attention layers, among others. See the full paper for details and learn more from the official blog post. Follow the discussion in the media on the significance of the Hopfield layer and the parallels between attention mechanisms and modern Hopfield networks.
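As a concrete illustration, here is a minimal PyTorch sketch of the continuous update rule, ξ_new = X softmax(β Xᵀ ξ), and of how it lines up with Transformer attention. The function name, the toy retrieval demo, and the choice of β are ours for illustration; this is not the API of the authors' Hopfield layer module.

```python
import torch

def hopfield_update(xi, X, beta):
    """One step of the continuous modern Hopfield update:
    xi_new = X @ softmax(beta * X^T @ xi).
    X has shape (d, N) with N stored patterns as columns; xi has shape (d,)."""
    return X @ torch.softmax(beta * (X.T @ xi), dim=0)

# Toy retrieval: a noisy query is mapped back to the closest stored pattern.
d, N, beta = 64, 10, 8.0
X = torch.randn(d, N)                    # stored (memory) patterns
xi = X[:, 0] + 0.1 * torch.randn(d)      # query near pattern 0
retrieved = hopfield_update(xi, X, beta)
print(torch.cosine_similarity(retrieved, X[:, 0], dim=0))  # close to 1.0

# With queries Q, keys K, and values V obtained as linear projections of the
# state and the stored patterns, and beta = 1 / sqrt(d_k), applying the same
# rule to all queries at once reproduces softmax(Q @ K.T / sqrt(d_k)) @ V,
# i.e. the Transformer attention mechanism.
```

The softmax over the pattern dimension plays the role of the attention weights: for large β the update is a sharp nearest-pattern retrieval, while smaller β yields a soft average over several stored patterns.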
[…] interest beyond Transformers as a more scalable replacement for the regular attention. Similar to parallels between Transformers and continuous modern Hopfield networks, the attention module of Performers resembles the update rule of classical Hopfield […]