CNME Editor Mark Forker spoke to Ashley Woodbridge, Field CTO, Infrastructure Solutions Group, Middle East, Turkey and Africa, at Lenovo, in a bid to better understand how Retrieval Augmented Generation is empowering businesses with access to better data and is bringing large public LLMs safely into the enterprise space.
Ashley Woodbridge has enjoyed a phenomenal career in ICT.
A career that has spanned almost 25 years, and one which has seen him work for technology behemoths such as Cisco, Google, SAP, Amazon Web Services and now Lenovo.
A quick scan through the recommendations tab on his LinkedIn profile clearly indicates the high regard and esteem in which he is held by his peers.
Described as a world-class manager and an innovative leader, Woodbridge was appointed as Lenovo’s Field CTO, Infrastructure Solutions Group, Middle East, Turkey and Africa in October 2022.
Woodbridge has been in the Middle East region since 2007, and during that time has seen vast technological changes across the industry.
The topic of AI remains at the forefront for organisations across the global IT ecosystem, and the latest concept to emerge within the field of AI is Retrieval Augmented Generation (RAG).
Woodbridge is someone who has always possessed an in-depth understanding of the new technology innovations and trends that drive change.
So, there is arguably nobody better to talk about Retrieval Augmented Generation than the charismatic Australian ICT veteran.
Woodbridge kickstarted our conversation on RAG by providing more context on the LLM landscape overall.
“To put it into context, one of the things about Large Language Models (LLM) is the fact that we’ve essentially cracked the code on the compression method. Their knowledge comes from processing data from across the entire internet, which contains trillions and trillions of data sources, and that provides us with access to amazing intelligence and enables us to retrieve facts. However, the real issue is that the knowledge is only current up to when the training run was done. For example, when you go on ChatGPT, you’ll see a little prompt in the corner saying the data is valid up to October 2023, and that’s inevitably led to a huge problem as we want to push this into the enterprise, or for start-ups looking to differentiate their services,” said Woodbridge.
As Woodbridge highlighted, one of the biggest ways to differentiate is to get access to real-time knowledge, and a lot of that knowledge is not in the public domain.
“The LLM has done a great job of trawling applications like Reddit and Facebook, but when it comes to medical journals or financial reporting, a lot of this material is behind paywalls, so that information couldn’t go into the training set. To find a way around this, the smart boffins came up with the concept of retrieval augmented generation. What that essentially means is that you add a step before you hand the question to the LLM, and augment it with more pertinent information, and that can be done in several different ways. The initial thought process was: let’s have a database that holds my own information, information that wasn’t previously readily available or was behind a paywall, along with more up-to-date material. You check in that database first, then the information from it goes into the more traditional Large Language Model, and this has had a number of benefits,” said Woodbridge.
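For illustration, the two-step flow Woodbridge describes can be sketched in a few lines of Python. The `embed`, `vector_store` and `call_llm` helpers below are hypothetical placeholders standing in for an embedding model, a vector database and an LLM; this is a minimal sketch of the idea, not a production pipeline.

```python
# Minimal sketch of retrieval augmented generation: retrieve first, then generate.
# `embed`, `vector_store` and `call_llm` are hypothetical placeholders for an
# embedding model, a vector database and a hosted or local LLM respectively.

def retrieve(question: str, vector_store, embed, top_k: int = 3) -> list[str]:
    """Step 1: look up the documents most relevant to the question."""
    query_vec = embed(question)
    return vector_store.search(query_vec, top_k=top_k)

def answer_with_rag(question: str, vector_store, embed, call_llm) -> str:
    """Step 2: hand the retrieved context, plus the question, to the LLM."""
    context = "\n\n".join(retrieve(question, vector_store, embed))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```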
According to Woodbridge, one of the first things you can do is make something like ChatGPT much smarter.
“People are using it for its ability to converse in English and write very creative, long-winded responses, but you can now feed it with your own private data that it previously wasn’t able to learn from, and you’re getting much better responses. It has made the information faster, but it has also significantly helped to stop hallucinations, because you’re now in a position to give it pointed information. For example, if a user went and asked, ‘is the SR 685 the best platform in the world?’, the 685 is a platform that wasn’t released when ChatGPT was trained, so it would come up with an absolute fantasy response claiming that it’s the best platform that Lenovo has ever had. That only serves to cause issues with our brand, because you’re talking about a product that hasn’t been released. Now, by wrapping that in a RAG where we are able to provide all of our datasheets and support documents, it’s going to give you the correct answer. So, for a lot of enterprises, a cheap hack has been to take one of these, let’s call it a freely available public Large Language Model, and wrap some guidelines around it through the use of RAG to give you much more responsive answers. It really has been a great hack to bring the public models safely into the enterprise space,” said Woodbridge.
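The “guidelines” Woodbridge mentions are typically just instructions bundled with the retrieved documents. A hedged sketch is shown below; the wording of the rules and the idea of passing datasheets as plain strings are illustrative assumptions, not a description of Lenovo’s implementation.

```python
# Sketch of the guardrails wrapped around a public model via RAG: the retrieved
# datasheets become the only permitted source of truth. Rule wording is illustrative.

GROUNDING_RULES = (
    "You may only use the reference documents provided. "
    "If they do not answer the question, say you do not know. "
    "Do not speculate about unreleased products."
)

def grounded_prompt(question: str, datasheets: list[str]) -> str:
    """Combine the guidelines, the retrieved datasheets and the user's question."""
    refs = "\n\n".join(datasheets)
    return f"{GROUNDING_RULES}\n\nReference documents:\n{refs}\n\nQuestion: {question}"
```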
In terms of the limitations and challenges presented by RAG, Woodbridge highlighted that the two-step model used in early RAG implementations often lacked sufficient due diligence.
“The initial RAGs were basically a two-step model, so you used an intermediate AI model that was very good at what we call vectorising. If you had a large amount of data, you would feed it to this pre-step model, which would turn all of that data into a matrix of numbers. This essentially meant that when a question was submitted, it would go and look it up by leveraging smart maths, and would find which documents were relevant because they score a very similar weighting when you turn them into a vector. That initial step, when done well, gives amazing results. However, what we found was that there wasn’t enough due diligence done on it, so if that initial step provides garbage data, you’ve just taught the large language model that the augmented step is the source of truth, and that actually makes the hallucinations even worse if there’s bad data in it,” said Woodbridge.
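The “matrix of numbers” and “smart maths” Woodbridge refers to are embeddings and similarity scoring. The sketch below uses a toy bag-of-words hash purely for illustration; a real system would use a dedicated embedding model, and the document snippets are invented.

```python
import numpy as np

VOCAB_DIM = 512  # toy embedding size; real systems use learned embedding vectors

def toy_embed(text: str) -> np.ndarray:
    """Toy stand-in for the vectorising pre-step (not a real embedding model)."""
    vec = np.zeros(VOCAB_DIM)
    for word in text.lower().split():
        vec[hash(word) % VOCAB_DIM] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)  # unit length, so dot product = cosine

documents = [  # invented snippets for illustration
    "SR 685 datasheet: an eight-GPU server aimed at AI training workloads.",
    "Quarterly financial report for the last fiscal year.",
    "Support guide for firmware updates.",
]
doc_matrix = np.stack([toy_embed(d) for d in documents])  # the "matrix of numbers"

query = "Which platform is aimed at AI training?"
scores = doc_matrix @ toy_embed(query)     # similarity score for every document
print(documents[int(np.argmax(scores))])   # highest-scoring document feeds the LLM
```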
Woodbridge said one of the other benefits of the increased adoption of RAG has been its extension beyond vectorising.
“The good thing that has happened is that the concept of RAG has been extended beyond just this vector database. You can see this in what ChatGPT have done with their private GPTs, where we can all make our own customised GPTs; they are essentially RAGs, or their interpretation of RAGs. We’re seeing the trend now to not necessarily use a static data source, but instead integrate an API into that retrieval step. You don’t need that vectorisation step, and you’re not risking that the source data could be inaccurate or stale. You’re able to get all the benefits of a RAG using a real-time API, which overcomes the complexity of having a two-step model,” said Woodbridge.
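Replacing the vector database with a live API call keeps the retrieval step fresh. A minimal sketch is shown below; the endpoint URL, response shape and `call_llm` helper are hypothetical placeholders.

```python
import json
import urllib.request

def retrieve_live_specs(product_id: str) -> str:
    """Retrieval via a real-time API instead of a vector database (URL is a placeholder)."""
    url = f"https://api.example.com/products/{product_id}/specs"
    with urllib.request.urlopen(url, timeout=5) as resp:
        specs = json.load(resp)
    # Flatten the structured response into text the LLM can use as context.
    return "\n".join(f"{key}: {value}" for key, value in specs.items())

def answer_about_product(question: str, product_id: str, call_llm) -> str:
    """No vectorisation step: the context is fetched live, so it can never be stale."""
    context = retrieve_live_specs(product_id)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)
```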
In terms of the mechanics behind all of this, Woodbridge explained the importance of the system prompt.
“How we share the information when using RAG is called padding, or embedding, the system prompt. So, in terms of the API, you basically dump all of the context into the system prompt. The user doesn’t see it, but when the user asks a question the model is able to use the system prompt to provide much better context. Now, the big limitation here is that a lot of the large language models have very small system prompts, and it’s not that long ago that the maximum number of tokens you could give to OpenAI was 8K. Essentially, what that meant was that all of the knowledge you had to share had to fit in roughly 8,000 tokens. As you can imagine, that’s still not giving enough data for a lot of use cases that need that true intelligence and real-time availability, which is what we are all striving for,” said Woodbridge.
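In practice, “padding the system prompt” means concatenating the retrieved documents into the hidden system message and trimming them to the model’s context budget. The sketch below assumes the OpenAI Python SDK (v1 interface) and the tiktoken tokeniser are installed; the model name is a placeholder.

```python
from openai import OpenAI   # assumes the OpenAI Python SDK, v1 interface
import tiktoken             # tokeniser used to respect the context budget

MAX_CONTEXT_TOKENS = 8_000  # the roughly 8K limit Woodbridge refers to
enc = tiktoken.get_encoding("cl100k_base")

def build_system_prompt(docs: list[str]) -> str:
    """Pad retrieved documents into the system prompt, stopping at the token budget."""
    prompt, used = "Answer using only the reference material below.\n\n", 0
    for doc in docs:
        cost = len(enc.encode(doc))
        if used + cost > MAX_CONTEXT_TOKENS:
            break            # context window is full; remaining documents are dropped
        prompt += doc + "\n\n"
        used += cost
    return prompt

client = OpenAI()            # reads OPENAI_API_KEY from the environment

def ask(question: str, docs: list[str]) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",      # placeholder model name
        messages=[
            {"role": "system", "content": build_system_prompt(docs)},  # hidden from the user
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```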
Woodbridge pointed out that this limitation is the primary factor behind a recent wave of announcements from big technology companies.
“We’ve seen the likes of Google with Gemini now trying to do one-million-token input sizes, and we’ve also seen a lot of the most cutting-edge public LLMs targeting unlimited context sizes, because they now recognise and understand that there has been this move to RAG. We’ve reached a stage where everyone is hoping for bigger context sizes, but bigger context sizes are not cheap, so that has led to a bit of a war between how we can do cost-effective AI and how we can achieve the ultimate goal of getting this as close as possible to AGI,” said Woodbridge.
Woodbridge also revealed that Lenovo is working more with B2B customers when it comes to LLMs.
“The perceived intelligence goes up with the number of parameters in the model. The benchmark at the moment is 70 billion parameters and above; that is where you start to get answers comparable to whatever the cutting-edge public model is, whether it is Gemini or ChatGPT. At Lenovo, we’re working a lot with enterprises, not so much in the B2C space, but B2B. You’re now able to get ChatGPT 3.5 or ChatGPT 4-like answers from a 70 billion parameter model, because you’re able to give it context that, without a RAG, would only be achievable with a 150 or 200 billion parameter model. What that actually translates to is something like Llama 3, which is very popular in the enterprise space, and that can be done on a single server in a very cost-effective way. You can now deliver to the enterprise answers that are relevant to their customer base and relevant to their internal use case from a single server – and the end-user would not be able to differentiate the quality of the response from what they would get from the largest public LLM,” said Woodbridge.
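For context, a 70 billion parameter model hosted on a single server is typically queried over an OpenAI-compatible endpoint, such as the one serving stacks like vLLM can expose. The host, port and model identifier below are assumptions for illustration, not a description of Lenovo’s deployment.

```python
import json
import urllib.request

def ask_local_llm(system_prompt: str, question: str) -> str:
    """Query a locally hosted 70B model (e.g. Llama 3 70B) through an
    OpenAI-compatible endpoint; the URL and model id are placeholders."""
    payload = {
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder model id
        "messages": [
            {"role": "system", "content": system_prompt},  # RAG context goes here
            {"role": "user", "content": question},
        ],
    }
    req = urllib.request.Request(
        "http://localhost:8000/v1/chat/completions",       # single on-prem server
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```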
Woodbridge added that enterprises were now getting the best of ‘both worlds’.
“Many enterprises have adopted a wait-and-see approach, and as a result have not been able to leverage all the benefits of LLMs. However, with the advent of RAG and the open-source community providing great models like Llama 3 and the Falcon 70B, which is big in this region because of its Arabic capabilities, enterprises are getting the best of both worlds: high-quality responses that show they are not lagging behind the latest technology, and the control, security and governance that they both need and want before they launch a service,” said Woodbridge.
In terms of the ethical concerns around RAG, Woodbridge said it wasn’t a silver bullet, but that it was definitely helping to eliminate some of the biases that exist.
“It’s not a silver bullet, but it certainly is helping to debias things, because, rightly or wrongly, most of the training data for these very successful public models is whatever is freely available and open on the internet. However, that doesn’t necessarily mean it is representative of community and social values. What the large public LLMs have done is put a lot of alignment safeguards in place for some of the bad things that may have been learned. For example, if you ask it to write a paragraph of code, it will give you seven paragraphs of ethical considerations and outline all the reasons why you need to be careful of copyright before it actually writes the three lines of code. So, on one side they are solving the alignment problem, but they’re solving it with a sledgehammer,” said Woodbridge.
Woodbridge concluded a wonderful discussion by re-emphasising the benefits of RAG.
“What RAG is able to do is give enterprises the ability to provide augmented data which aligns with their values and mission statement. You can never untrain a model of the things it has learned, but RAG helps you build better alignment based on more up-to-date information that is aligned with your social causes. It also allows you to use smaller models that are more energy efficient and sustainable, and don’t have the same dependency on training on the whole of the internet. What you’re seeing now is enterprises that have the means, capital and skills building a 7 billion parameter model that is trained only on their choice of data, and then augmenting that further with RAG,” said Woodbridge.