Generative AI is great, but it lacks precision and can be very wrong
Since ChatGPT was introduced in November 2022, the subject of AI and its possibilities has gained significant attention in business and technology sectors. Our previously published articles and insights have explored both the potential business value of various AI types and the challenges associated with generative AI.
Generative AI, particularly Large Language Models (LLMs), can assist organizations with daily tasks and enhance product offerings by producing human-like text and answering general knowledge questions. Having been trained on large sets of data written by humans, these models perform rather well at these tasks. However, challenges arise when LLMs are expected to handle queries that go beyond publicly available information or that concern events after the training data cut-off date. In addition, LLMs can generate text that is false, and its assertive tone can still make it convincing to users. Generating such false content is usually called hallucination.
In this article, we will explore the concept of Retrieval-Augmented Generation (RAG) as a method to address these challenges with LLMs. We start with an overview of different methods to address the challenges with LLMs before looking at what RAG is and how it works. We conclude by looking at the benefits RAG can bring to an organization.
Which methods can be used to get more precise and reliable results?
To improve the output of the LLM and address these challenges, several approaches can be employed. It’s worth bearing in mind that no single technique is universally superior; the best approach depends on the specific context.
A commonly suggested way of increasing the accuracy of an LLM is prompt engineering. In some cases, the way a question is phrased and the interaction is structured leads to better output, but the output still relies on the data the model was trained on. Prompt engineering is a key capability within generative AI, but it does not solve the issues of non-public data or data created after the cut-off date.
Both custom models and fine-tuned models will perform better for queries requiring specific data. There are, however, two major drawbacks: they are costly, and the time gap between training and user prompting remains.
Neither custom models, fine-tuned models nor unmodified LLMs can provide text based on information produced since the latest update of the underlying model. The limitation of the cut-off date remains, but if the accuracy of the output does not depend on recent information, or the information changes slowly, these models may still provide value for the user.
Between prompt engineering and custom or fine-tuned models lies an approach that does not rely on training a model but still provides output based on data for a specific use case. This approach is called Retrieval-Augmented Generation (RAG). It combines two types of AI: the retriever, which uses AI technologies to find the parts of your data that are relevant to the prompt, and the generator, which is the generative AI that produces the answer.
What is RAG?
When feeding the LLM with data through the prompt, a basic approach would be to input a section of text into a standard prompt with some prompt engineering. This provides the context that the user wants the AI-generated answer to be based on.
RAG takes this basic approach a step further. In a RAG application, the prompt is prepared with context that has been retrieved from your own data source. The LLM then answers the user's question based on this prompt.
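The prompt preparation step can be illustrated with a minimal sketch. It assumes the relevant sections have already been retrieved. The function name `build_rag_prompt`, the instruction wording and the sample sections are all hypothetical choices made for illustration, not a prescribed template.

```python
def build_rag_prompt(question: str, retrieved_sections: list[str]) -> str:
    """Combine retrieved context and the user question into one prompt string.

    Hypothetical helper: the instruction wording below is an assumption,
    not a standard template.
    """
    # Number each section so an answer could point back to its source.
    context = "\n\n".join(
        f"[{i}] {section.strip()}"
        for i, section in enumerate(retrieved_sections, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # Made-up sections standing in for text retrieved from your own data source.
    sections = [
        "To reset the device, hold the power button for ten seconds.",
        "The warranty covers manufacturing defects for two years.",
    ]
    print(build_rag_prompt("How do I reset the device?", sections))
```

The resulting string is what would be sent to the LLM. Telling the model to rely only on the supplied context is one common way to steer it away from answering from its training data alone.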
The role of the retriever is to get sections of relevant text from your own source data. To achieve this, the text is converted into a numerical format, often called an embedding, that captures its meaning. This format allows different text sections to be compared based on their meanings. Both the reference text from the source documents and the user question undergo this conversion. The purpose of the comparison is to select the section(s) of text from the source documents that are closest in meaning to the user question.
One example where we have seen RAG used is chatbots for product support. Product user guides and manuals are commonly kept in a knowledge base or database. When a user needs support, the RAG app uses the user's query to search the user manuals or product documentation for relevant information. This information is retrieved first and then added to the prompt, so that the LLM can advise the user on how to use the product.
The architecture and example described above represent a basic form of RAG. There are various methods of creating RAG applications for different purposes, and these methods vary in complexity. Generally, the more complex methods tend to give more accurate results at the cost of more effort to create and manage the RAG application. Conversely, the simpler approaches tend to be less accurate but easier to create and manage.
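The comparison procedure can be sketched in a few lines. This is a simplified illustration under one loud assumption: the `embed` function below only counts words, so it measures word overlap rather than meaning. A real retriever would use a trained embedding model instead. The function names and sample sections are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def retrieve(question: str, sections: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k sections most similar to the question."""
    q = embed(question)
    ranked = sorted(
        sections,
        key=lambda s: cosine_similarity(q, embed(s)),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Made-up reference sections standing in for source documents.
    manual = [
        "To reset the device, hold the power button for ten seconds.",
        "The warranty covers manufacturing defects for two years.",
        "Charge the battery with the supplied cable.",
    ]
    print(retrieve("How do I reset the device?", manual))
```

Both the question and each reference section go through the same conversion, and the sections are then ranked by how close they are to the question. Swapping `embed` for a learned embedding model would keep the rest of the procedure unchanged.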
What does this mean in the world of utilizing generative AI?
More trusted AI applications
RAG makes it possible to get context-specific output from LLMs based on our non-public data and information, by providing these as context in the prompt. It lowers the risk of hallucinations, so we should be able to trust the output to a higher degree than we could with an LLM alone. If we manage the knowledge base ourselves, we should also know where the context for each query comes from.
Accuracy and precision from RAG
In the bigger picture, RAG offers a way to get more precise output from generative AI using our own data. It combines the strength of generative AI in producing human-like text with other AI technologies that search for and retrieve relevant information. With this approach, RAG applications leverage different forms of AI specialized for different tasks, such as writing human-like text or finding relevant text. This enables higher accuracy in the output than an LLM alone can produce.
Adaptability and flexibility from RAG
RAG offers a relatively flexible knowledge base: the model does not need retraining or fine-tuning to adapt when the context changes. It is still less flexible than prompt engineering, because RAG requires a knowledge base and generates answers related to the information in it. With proper management, we should be able to maintain and update the databases used as knowledge sources. This requires effort, but when information must be recent and insights must draw on enterprise data, a RAG app with a maintained knowledge base is one way to get this up-to-date information from generative AI.
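A small sketch can show why no retraining is needed when the knowledge base changes. The `KnowledgeBase` class, its method names and the pricing texts are hypothetical. The search uses simple word overlap as a stand-in for a real retriever, which is an assumption made to keep the example short.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lowercase words in a text, used for a simple overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KnowledgeBase:
    """Hypothetical in-memory knowledge base keyed by document id."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Adding or replacing a document takes effect on the next search.
        self._docs[doc_id] = text

    def remove(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)

    def search(self, question: str) -> Optional[str]:
        """Return the document sharing the most words with the question."""
        q = tokens(question)
        best_id, best_score = None, 0
        for doc_id, text in self._docs.items():
            score = len(q & tokens(text))
            if score > best_score:
                best_id, best_score = doc_id, score
        return self._docs[best_id] if best_id is not None else None


if __name__ == "__main__":
    kb = KnowledgeBase()
    kb.upsert("pricing", "The standard plan costs 10 euros per month.")
    kb.upsert("support", "Support is available on weekdays.")
    print(kb.search("What does the standard plan cost?"))

    # The price changes: only the document is updated, no model is retrained.
    kb.upsert("pricing", "The standard plan costs 12 euros per month.")
    print(kb.search("What does the standard plan cost?"))
```

The second search returns the updated text immediately, so the context passed to the LLM reflects the change. The effort lies in keeping the documents current, not in touching the model.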
Strengthen your data science and data management capabilities
RAG has existed as a methodology since the early days of generative AI. It offers a way to increase the accuracy and timeliness of information by leveraging the strengths of various AI models currently on the market. Additionally, it allows us to use enterprise or context-specific documents to provide useful information to the generative AI model. However, the cost is that in many cases data scientists and data engineers are needed to set up and maintain the RAG app, including the knowledge base and retriever. While this overview might give the impression that building a RAG application is easy, several considerations need to be weighed carefully to create a good RAG app. Once RAG is explored in more depth, there are several ways to build the app and optimize its performance for the specific use case. An organization starting with RAG needs to understand its use cases and data needs, have proper data management practices in place, and establish and maintain AI and data science capabilities.
At Opticos we enable organizations to leverage the business benefits of new emerging technologies. Drawing from our extensive client experience and methodology in business strategy, change management, data management and AI governance, we’re here to support you in your AI journey from strategy to implementation. Through our strategic partnership with Algorithma, a company offering data science and hosting services for AI, we provide end-to-end AI capabilities to our clients.
Write to us to discuss your organizational AI goals.