The hottest thing in AI is something called RAG


Hello and welcome to the October special edition of Eye on A.I.

If you've been following AI chatter online or stopped by a recent AI conference, chances are you've heard everyone talking about RAG.

RAG, or retrieval-augmented generation, is an emerging technique for feeding an existing AI model new information so it can perform a specific task. Whereas fine-tuning adapts an existing model with new data, producing a derivative model, RAG simply hands a model a set of information it was never trained on and asks it to consider that information in its response. With RAG, you’re not changing the model, just asking it to temporarily reference external data in order to complete your specific request. It “forgets” the information right away.

“In the simplest version, I retrieve from my internal database some documents or some text that I want, and I send it to the model along with my ask: ‘Answer this question’ or ‘Summarize this.’ The model does it. The model has not changed. I go do another RAG with a new ask and it keeps going,” Sriram Raghavan, vice president of IBM Research AI, told Eye on AI. “The reason it’s very popular is because it’s simple.”
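For the technically curious, here’s a rough sketch of the loop Raghavan describes, in Python. Everything in it is illustrative, not IBM’s implementation: the documents, the naive keyword-overlap retriever, and the call_model stub (which is where a real application would call a hosted LLM API).

```python
# A minimal RAG sketch: retrieve relevant text, prepend it to the ask,
# and send both to the model. The model itself never changes.
# All names and data below are illustrative assumptions.

DOCUMENTS = [
    "Q3 revenue rose 12% year over year, driven by cloud services.",
    "The 2022 employee handbook covers remote-work and travel policy.",
    "Support runbook: restart the ingest service before escalating.",
]

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query
    (a stand-in for a real keyword index or vector search)."""
    query_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def call_model(prompt: str) -> str:
    """Stub for the actual LLM call (e.g., a hosted chat-completion API)."""
    return f"[model response to a {len(prompt)}-character prompt]"

def rag(query: str) -> str:
    # The retrieved text rides along with the ask; the model "forgets"
    # the context as soon as the request completes.
    context = "\n".join(retrieve(query, DOCUMENTS))
    prompt = f"Using only the context below, {query}\n\nContext:\n{context}"
    return call_model(prompt)

print(rag("summarize our revenue performance."))
```

Each call is independent: run `rag()` again with a new ask and fresh documents, and the loop simply repeats, which is exactly the simplicity Raghavan is pointing to.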

This simpler approach to working with existing LLMs is only possible because of the sheer power of the latest models. Raghavan estimates the technique began taking off six to nine months ago, trailing the release of ChatGPT by a quarter or two. It’s incredibly useful for anyone building an application on top of an existing large language model, and it’s easy to see why it’s catching on.

Fine-tuning and even prompt engineering, in which users structure and refine text prompts to elicit specific responses from LLMs, are time-consuming and add a lot of complexity. And while fine-tuning typically requires hundreds or thousands of examples, RAG calls for just a document or two, maybe a few dozen examples. Of course, many more complex tasks will continue to require the more intensive process of fine-tuning, but RAG provides an easy way to supercharge a leading LLM to perform a wide variety of tasks with more recent, domain-specific, or even proprietary data.

“I want to leverage the fact that the model is good at language, knows how to work with it,” Raghavan said. “I want to leverage that, but I want to do it on my data.”

But RAG isn’t a silver bullet. As models get better and better, retrieval becomes the hard part, according to Raghavan. You need to give the model the right input, which might mean scouring a large set of documents to narrow them down to the most relevant ones, breaking documents into smaller chunks, or, because LLMs can only read text, reformatting complex PDFs that contain tables, charts, and graphs.
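To illustrate the “breaking up documents” step, here is one simple approach: fixed-size, overlapping character windows. The sizes are arbitrary assumptions, and real systems often split on sentences or sections instead.

```python
def chunk_document(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split a long document into fixed-size, overlapping character windows
    so each piece fits comfortably in the model's context alongside the ask.
    (Sizes here are arbitrary; production systems often split on sentence
    or section boundaries instead.)"""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks
```

Retrieval then ranks these chunks, rather than whole documents, against each incoming query.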

IBM is currently working on offerings aimed specifically at helping application developers use RAG. For example, the company is creating patterns and “cookbooks” that offer recipes for doing RAG depending on the type of application you want to build. Microsoft, Google, and Amazon have their own RAG offerings as well. Companies want to harness the power of LLMs on their own data, which means RAG has a significant role to play in the enterprise in particular.

And with that, here’s more AI news.

Sage Lazzaro
sage.lazzaro@consultant.fortune.com
sagelazzaro.com

