What is a Specialized LLM? | TCDI Talks: Episode 3
This week on TCDI Talks, our experts from TCDI and Altumatim tackle the question, “What is a specialized Large Language Model (LLM)?” During the discussion, Caragh Landry, David York, David Gaskey, and Vasu Mahavishnu take an in-depth look at these systems, discussing the difference between custom and general-use LLMs, whether you need a specialized one in eDiscovery, and what it means to be LLM agnostic.
0:17 – Caragh Landry:
Welcome to TCDI Talks, our third episode. Today we are with Altumatim and TCDI, talking about what a specialized LLM is and whether we should be using one.
Dave York, I’m going to start with you. Why is this even a question? Why are we talking about it today?
0:40 – David York:
Well, I think it’s an interesting question, because I don’t think many of the clients and people we speak to know that it’s even an option. When we started looking at LLMs and talking about AI, we didn’t understand the benefits or what was available to us, and I think a lot of clients are in that same boat.
So, knowing that you’re not locked into ChatGPT or the same model over and over again, and that there are options out there, is an important starting point when talking with clients about projects, especially LLM-related projects.
1:28 – Caragh Landry:
Yeah, I get this question a lot from our clients: are we using a specialized LLM? When I was first asked, I was a little thrown, because I didn’t know that specialized LLMs existed or what the difference was between the two.
We’ve since investigated and learned that they do exist. But David and Vasu, I’ll turn it to you. What is a specialized LLM? What is the difference between a general-purpose LLM and a specialized LLM?
2:00 – Vasu Mahavishnu:
Most of the available base models, like GPT-4 and Gemini, are what we call general Large Language Models; they can handle many different tasks. A specialized Large Language Model is what we would consider a fine-tuned model, adapted to do a specialized task.
But there’s a third option, a custom LLM, which is built, or pre-trained, from scratch. That is where an organization uses its own proprietary data to train the model, and the model becomes narrower and more specialized in the sense that it can answer questions about that particular data set.
2:57 – Caragh Landry:
And Vasu, is that adding their own data to an already existing LLM? Or is it using only their own data?
3:08 – Vasu Mahavishnu:
No, if you are building an LLM from scratch, then you provide all the data. For example, GPT-3 has about 175 billion parameters and was trained on hundreds of billions of tokens of text. Some of these Large Language Models were trained on a very large share of the text available on the internet.
So, if someone wants to build one, they’re looking at curating a very large amount of data. A lot of that work has already been done by the makers of these proprietary Large Language Models. Repeating it, gathering that data and adding your own proprietary data on top, is quite a daunting task, I would say.
You can’t just train a model on your own data alone, because it would miss a lot of the other data points that come from training on broad material from the internet. So, you would need quite a bit of data.
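To put that scale in perspective, a widely used rule of thumb estimates the training compute of a dense transformer as about six floating-point operations per parameter per training token, where N is the parameter count and D the number of training tokens. It is an approximation, not an exact accounting. Plugging in the published GPT-3 figures (about 175 billion parameters, about 300 billion training tokens) gives roughly:

```latex
C \approx 6ND = 6 \times (1.75 \times 10^{11}) \times (3 \times 10^{11}) \approx 3.15 \times 10^{23}\ \text{FLOPs}
```

That is the compute for a single training run, before any data curation or repeated experiments, which is why building a competitive model from scratch is out of reach for most organizations.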
4:23 – David Gaskey:
Yeah, and that leaves me a bit skeptical about people who are saying they have a truly custom LLM. Did they really have the time and the resources to gather enough data to create a model that has the capabilities to do things that we’re expecting LLMs to be doing in the legal field?
Say a large law firm decided to build its own LLM, based on all of its own data: every brief it has written, every contract it has ever negotiated. The resulting LLM could probably identify things within those types of documents, very narrowly and very specifically.
But it would be missing so much of the information, the data patterns, that it could follow to go beyond those documents. When we say LLMs have inference, it’s not the same as human inference; it’s recognizing data patterns, and with less data, you have less capability.
So, I’m a bit skeptical when people say they’ve built a purely custom one. My guess is that when people talk about custom LLMs, they really mean they took an existing one and fine-tuned it: they added to its capabilities or did some additional training to get it to focus on particular tasks. Say you only wanted to do contract term analysis; that’s how I think they would probably approach it.
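David’s guess is that most “custom” LLMs are existing models fine-tuned for a narrow task such as contract term analysis. The episode doesn’t describe a fine-tuning procedure, so this is only a sketch of one common first step, under stated assumptions: turning reviewer-labeled clauses into prompt/completion records stored as JSON Lines. The `LabeledClause` schema, the prompt wording, and the `prompt`/`completion` field names are hypothetical; each fine-tuning service or library defines its own required format.

```python
import json
from dataclasses import dataclass


@dataclass
class LabeledClause:
    """One contract clause with a reviewer-assigned term label (hypothetical schema)."""
    text: str
    label: str  # e.g. "termination", "indemnification"


def to_instruction_records(clauses: list[LabeledClause]) -> list[dict]:
    """Convert labeled clauses into prompt/completion pairs for supervised fine-tuning."""
    records = []
    for clause in clauses:
        records.append({
            # The same instruction is repeated for every example so the model
            # learns to map clause text to a term label.
            "prompt": (
                "Classify the contract term in the following clause.\n\n"
                f"Clause: {clause.text}\nTerm:"
            ),
            "completion": f" {clause.label}",
        })
    return records


def write_jsonl(records: list[dict], path: str) -> None:
    """Write one JSON object per line (JSON Lines), a format commonly used for training data."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
```

The resulting file would then be handed to whatever fine-tuning tooling is used; the base model keeps its broad pretraining, and the labeled examples narrow its focus.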
6:04 – Caragh Landry:
David, when we talk to our clients, one of the biggest benefits we see in working with you is that you’re LLM agnostic: you’ll use the right LLM for the matter, and as you get to know the data, you find that some LLMs are better than others in certain scenarios.
What does that mean in practice? Why does it matter, how do you do it, and what have you seen?
6:26 – David Gaskey:
So, yes. Different LLMs are simply a little better at some things than others, and we find that out partly by trial and error.
You have to take an LLM and subject it to repeated tests. Is this model useful for privilege detection, for example? We have found that some are better than others, so when we do a privilege review, we use the ones we determined work best.
A lot of it really comes down to the system you’ve engineered around the LLM. If you want the LLM to be the solution, then I can understand people saying they want a specialized LLM. But if the LLM is just part of the solution, an ingredient in the recipe or a tool in your toolbox, then to me an LLM trained on a broader scale with more parameters gives you the best result, because you have more power.
You do have to build a system around it, though, to accomplish what you really want.
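The repeated testing David describes can be sketched as a small evaluation harness. This is a minimal sketch, assuming each candidate LLM has been wrapped in a hypothetical function that returns `True` when it judges a document privileged, and that reviewer labels are available for a test set. As one possible selection rule, it keeps the most precise model among those meeting a recall floor; the rule and the 0.9 default are placeholders, not anything stated in the episode.

```python
from typing import Callable, Optional

# A "model" is any function mapping document text to True (privileged) or False.
# In practice it would wrap a call to a specific LLM; that wrapper is assumed, not shown.
Classifier = Callable[[str], bool]


def precision_recall(model: Classifier, docs: list[str], labels: list[bool]) -> tuple[float, float]:
    """Score one model against reviewer labels (True = privileged)."""
    tp = fp = fn = 0
    for doc, truth in zip(docs, labels):
        pred = model(doc)
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif truth and not pred:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_best(models: dict[str, Classifier], docs: list[str], labels: list[bool],
              min_recall: float = 0.9) -> Optional[str]:
    """Return the name of the most precise model whose recall meets min_recall, or None."""
    best_name: Optional[str] = None
    best_precision = -1.0
    for name, model in models.items():
        precision, recall = precision_recall(model, docs, labels)
        # Privilege review is recall-sensitive: skip models that miss too many privileged docs.
        if recall < min_recall:
            continue
        if precision > best_precision:
            best_name, best_precision = name, precision
    return best_name
```

Re-running the same harness whenever a new model appears is what keeps the choice grounded in measured performance rather than reputation.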
7:36 – Caragh Landry:
Right. Yeah, that makes sense. So, maybe a specialized LLM isn’t what we need; the better approach is to think carefully about which LLM to use in which scenario.
7:50 – David Gaskey:
Yes.
7:51 – Vasu Mahavishnu:
To build on what David mentioned about different LLMs: even two LLMs trained on the exact same data set can yield different performance. That’s because of the later training stages. The developers take unsupervised data and make it supervised, then the model goes through prompt tuning, and then it goes through fine-tuning. That is where data scientists come in and align the Large Language Model so it can perform these general tasks.
Different LLMs undergo different types of prompt tuning, and a lot comes down to how each one was set up. As a result, you get different benchmark scores and different response quality.
So, picking the right LLM for each task is important. There is no single best model: GPT-4 might be good at one task but fail miserably at another. Determining that for the customer is what we do.
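Once per-task evaluation scores exist, picking the right LLM for each task reduces to a lookup. This sketch assumes a hypothetical `scores[task][model]` table produced by in-house evaluations (higher is better); it routes each task to its top scorer and falls back to a default model for tasks that were never benchmarked. The model names in any usage are placeholders.

```python
def build_routing_table(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """Map each task to the model with the highest evaluation score on that task.

    scores[task][model] is an assumed in-house metric where higher is better.
    Tasks with no scored models are left out of the table.
    """
    return {
        task: max(per_model, key=per_model.get)
        for task, per_model in scores.items()
        if per_model
    }


def route(task: str, table: dict[str, str], default: str) -> str:
    """Return the model chosen for a task, or the default if the task was never evaluated."""
    return table.get(task, default)
```

Keeping the table as data, separate from the workflow code, means a new benchmark run can change which model handles a task without any code changes.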
9:03 – David York:
David and Vasu, is that one of the reasons you decided to be LLM agnostic, as Caragh mentioned? So that you can pivot and have those options as client needs require?
9:18 – David Gaskey:
Yeah, and on top of that, we’re always thinking about the future. How far can we take this technology? What else can we accomplish?
If you dedicate a significant amount of resources to building a narrowly focused custom LLM, six months later it may be obsolete. This technology is moving so quickly that I think you’re taking a risk if you go in that direction.
The other thing is new capabilities. Multimodal LLMs, for example, now let us input a video and get a summary of it, which you can’t do with older LLMs. So, there’s a balancing act, but our vision is to take this technology as far as we can and give clients the best possible results, and being LLM agnostic allows us to do that.
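Being LLM agnostic is largely an architecture decision: workflow code depends on a narrow interface rather than on one vendor’s SDK, so a model can be replaced without rewriting the workflow. This is a minimal sketch of that idea; `LLMClient`, `SummaryStep`, and `EchoClient` are hypothetical names, and a real client would wrap a vendor API, which is not shown.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Anything that can turn a prompt into text. Vendor-specific wrappers are assumed."""

    name: str

    def complete(self, prompt: str) -> str: ...


class SummaryStep:
    """One step of a review workflow that depends only on the LLMClient interface."""

    def __init__(self, client: LLMClient) -> None:
        self.client = client

    def run(self, document: str) -> str:
        prompt = f"Summarize the following document in two sentences:\n\n{document}"
        return self.client.complete(prompt)

    def swap_model(self, client: LLMClient) -> None:
        """Replace the underlying model without touching the rest of the workflow."""
        self.client = client


class EchoClient:
    """Stand-in client for testing: echoes the last prompt line, tagged with its name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt.splitlines()[-1]}"
```

For example, `SummaryStep(EchoClient("model_a")).run("Hello world")` returns `"[model_a] Hello world"`; calling `swap_model` with a different client changes the model and leaves everything else untouched.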
10:16 – Vasu Mahavishnu:
The other thing is that we don’t rely on one LLM to produce a response. For a specific task, we sample responses from multiple LLMs, so we’re leveraging the different capabilities of different models on that same task. We also have the capability to interchange these Large Language Models dynamically at runtime.
That’s why we are able to get extraordinary results, and it’s the primary reason why going down the path of a completely fine-tuned or custom LLM does not make sense.
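Vasu describes sending the same task to several LLMs and swapping models at runtime, but the episode doesn’t say how the responses are combined. A simple majority vote is one plausible scheme and is used here purely as an assumption; the model functions are hypothetical wrappers that each return a label for a document.

```python
from collections import Counter
from typing import Callable

# Each model is a hypothetical wrapper around a different LLM that returns a label.
ModelFn = Callable[[str], str]


class Ensemble:
    """Ask several models the same question and combine their answers by majority vote."""

    def __init__(self, models: dict[str, ModelFn]) -> None:
        self.models = dict(models)

    def register(self, name: str, fn: ModelFn) -> None:
        """Add or replace a model at runtime."""
        self.models[name] = fn

    def remove(self, name: str) -> None:
        """Drop a model at runtime; unknown names are ignored."""
        self.models.pop(name, None)

    def classify(self, document: str) -> tuple[str, float]:
        """Return the majority label and the fraction of models that agreed with it.

        Ties go to the label that was returned first, because Counter.most_common
        keeps insertion order among equal counts.
        """
        if not self.models:
            raise ValueError("no models registered")
        votes = Counter(fn(document) for fn in self.models.values())
        label, count = votes.most_common(1)[0]
        return label, count / len(self.models)
```

The agreement fraction is a cheap signal for routing low-consensus documents to human review, though whether that fits a given workflow is a design decision outside this sketch.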
11:07 – Caragh Landry:
All right. Thank you for talking through this; I really appreciate it. We’re looking forward to our next episode of TCDI Talks.
With over 25 years of experience in the legal services field, Caragh Landry serves as the Chief Legal Process Officer at TCDI. She is an expert in workflow design and continuous improvement programs, focusing on integrating technology and engineering processes for legal operations. Caragh is a frequent industry speaker and thought leader, presenting on Technology Assisted Review (TAR), Gen AI, data privacy, and innovative lean process workflows.
In her role at TCDI, Caragh oversees workflow creation, service delivery, and development strategy for the managed document review team and other service offerings. She brings extensive expertise in building new platforms, implementing emerging technologies to enhance efficiency, and designing processes with an innovative, hands-on approach.
David York oversees TCDI’s Litigation Services team, which handles projects and data relating to eDiscovery, litigation management, incident response, investigations, and special data projects. Since starting in the industry in 1998, Dave has worked on the law firm, client, and now provider sides of the industry, successfully supporting, executing, and managing all phases of diverse legal and technical projects and solutions.
During his career, he has been an NC State Bar Certified Paralegal; he also holds a certification in Records Management, is a Certified eDiscovery Specialist (ACEDS), and has completed Black Belt Lean Six Sigma training.
David Gaskey has been at the interface between law and technology for more than three decades. Specializing in intellectual property law, he has represented clients from all over the United States, Europe, and Asia, including Fortune 50 companies, whose businesses involve a broad spectrum of technologies.
David has extensive experience litigating patent disputes at the trial and appellate court levels including the Arthrex v. Smith & Nephew case that received an “Impact Case of the Year” award in 2020 from IP Management. His litigation experience was a primary influence on how Altumatim naturally fits into the process of developing a case and why the platform is uniquely designed to help you win by finding the most important evidence to tell a compelling story.
Vasu brings his natural curiosity and passion for using technology to improve access to justice and our quality of life to the Altumatim team as he architects and builds out the future of discovery. Vasu blends computer science and data science expertise from computational genomics with published work ranging from gene mapping to developing probabilistic models for protein interactions in humans.
As a result, he understands the importance of quality data modeling. His extensive experience with business modeling, code construction for front-end and back-end systems, and graphic presentation influenced the architecture of Altumatim. His creativity and commitment to excellence shine through the user experience that Altumatim’s customers enjoy.