Want to feed infinitely long text to LLMs? Google just made it easier with the Infini-attention technique!

Sayali Shelke
2 min read · Apr 17, 2024

Large Language Models (LLMs) are AI models trained on massive amounts of text data. One of their most widely used applications is generative AI, available to the public in the form of OpenAI's GPT-3, Anthropic's Claude, Meta's Llama 2, GitHub's Copilot, and more. These models and chatbots answer queries informatively within seconds.

However, the LLMs we use today have limitations: they can work with only a limited amount of input text and memory. Typical Transformers reset their attention memory after each context window, losing the earlier context. Google researchers, however, recently announced a technique that lets developers feed a practically unlimited amount of text to LLMs, opening up copious opportunities for tech companies and users.

The context window is the hero here. It plays a significant role because all popular AI models accept only a limited amount of text input. The more relevant input the model can see, the closer it can get to the desired output. Therefore, one of the main goals for LLM developers is to increase the number of input tokens a model can handle.

By enlarging the context window, the model can retain and utilize more information from previous parts of the conversation, leading to responses that are more accurate and contextually relevant. This advancement aims to enhance user interactions, making them feel more natural and immersive.
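To make the limitation concrete, here is a minimal, model-agnostic sketch of what a fixed context window means in practice: once a conversation outgrows the window, only the most recent tokens remain visible to the model. The 4,096-token window and the `fit_to_context_window` helper are illustrative assumptions, not any particular model's API.

```python
# Illustrative only: shows why a fixed context window forces old context to be dropped.
def fit_to_context_window(token_ids, max_tokens=4096):
    """Keep only the most recent tokens that fit in the model's context window."""
    if len(token_ids) <= max_tokens:
        return token_ids
    return token_ids[-max_tokens:]  # everything before this point is invisible to the model

conversation = list(range(10_000))          # stand-in for a tokenized chat history
visible = fit_to_context_window(conversation)
print(len(conversation) - len(visible), "tokens fall outside the window")  # 5904
```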

Figure 2: Infini-Transformer (top) retains the entire context history, whereas Transformer-XL (bottom) discards old contexts.

The research unveiled by Google focuses on the following:

  • Chunking and Attention: Infini-attention partitions the input sequence into smaller segments and applies an attention mechanism within each chunk to identify its relevant portions, assigning weights to elements according to their significance in the current context (a simplified sketch follows this list).
  • Memory upgrade: Maintains steady memory usage regardless of the length of the input sequence, because past segments are folded into a fixed-size compressive memory.
  • Computational Efficiency: Minimizes computational requirements compared to traditional methods.
  • Scalability: Capable of handling extremely long sequences without needing to be retrained from the beginning.
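To make the mechanism more tangible, below is a simplified, single-head sketch of Infini-attention-style segment processing in PyTorch, loosely following the ideas described in Google's paper. The function name, tensor shapes, the elu+1 feature map, and the fixed gate value are illustrative assumptions rather than the authors' implementation: each segment gets ordinary local attention plus a read from a fixed-size compressive memory of earlier segments, and the memory is then updated so the next segment can still use that context.

```python
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Non-negative feature map used for the linear-attention-style memory.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, memory, norm):
    """Process one segment: local attention + compressive-memory retrieval.

    q, k, v : (seg_len, d) projections for the current segment
    memory  : (d, d) running associative memory built from earlier segments
    norm    : (d,) running normalization term for memory reads
    """
    d = q.size(-1)

    # 1. Standard causal dot-product attention within the segment (local context).
    scores = (q @ k.T) / d**0.5
    causal_mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(causal_mask, float("-inf"))
    local_out = torch.softmax(scores, dim=-1) @ v

    # 2. Retrieve from the compressive memory covering all previous segments.
    sigma_q = elu_plus_one(q)
    mem_out = (sigma_q @ memory) / (sigma_q @ norm).clamp_min(1e-6).unsqueeze(-1)

    # 3. Blend local and memory outputs; a fixed scalar stands in here for the
    #    learned per-head gate described in the paper.
    gate = torch.sigmoid(torch.tensor(0.0))
    out = gate * mem_out + (1.0 - gate) * local_out

    # 4. Fold this segment's keys/values into the memory so the next segment can
    #    still "see" it, at a memory cost independent of total sequence length.
    sigma_k = elu_plus_one(k)
    memory = memory + sigma_k.T @ v
    norm = norm + sigma_k.sum(dim=0)
    return out, memory, norm

# Usage: stream an arbitrarily long sequence in fixed-size segments.
d, seg_len = 64, 128
memory = torch.zeros(d, d)
norm = torch.zeros(d)
for segment in torch.randn(10, seg_len, d).unbind(0):  # stand-in for projected tokens
    q = k = v = segment                                # real models use separate projections
    out, memory, norm = infini_attention_segment(q, k, v, memory, norm)
```

Because the memory is just a d×d matrix plus a normalization vector, the cost of remembering earlier segments stays constant no matter how long the input grows, which is the essence of the steady-memory and scalability points above.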

That’s an exciting development! While Infini-attention is still at the research stage, its potential to boost LLM performance is quite promising. Many in the industry will be watching closely to see whether this technique gets integrated into mainstream AI systems.

The rapid pace of advancements in AI makes it interesting to see how new methods and technologies evolve over time.
