Large Language Models Comparison: Unique’s Experience

In recent years, large language models (LLMs) have developed rapidly. New models are constantly being released, each with its own strengths and weaknesses. This can make it difficult for developers and businesses to choose the right model for their application.

From Unique’s experience, we know that finding the right solution for a specific industry like Banking or Insurance can be quite challenging. But based on what we've learnt, we can already see some of the advantages of certain LLMs and approaches that we are going to explain further in this blog post.

Large language models (LLMs) are a type of artificial intelligence trained on massive datasets of text and code. They can be used for a variety of tasks, such as generating text, translating languages, and writing different kinds of creative content.

This blog post will compare some of the most popular LLMs available today. We will discuss their strengths and weaknesses, and we will provide some recommendations for how to make up your mind based on our experience.

A List of the Most Capable Language Models

As mentioned before, the development of AI is an ongoing process. Many LLMs are currently under development and will be released soon. However, it’s already easy to pick out the front-runners in this category:

Llama is an openly available large language model developed by Meta (Facebook). Its weights are released under Meta's own license rather than a standard open-source license. It is trained on a massive amount of text and code, and can be used for a variety of tasks, such as generating text, translating languages, and writing different kinds of creative content. Llama is known for its high performance and its ability to generate human-quality text.

Falcon is a large language model developed by the Technology Innovation Institute (TII) in Abu Dhabi. It is optimized for performance and efficiency, and is trained on a dataset of high-quality text, largely filtered web data. Falcon has been reported to outperform GPT-3 on a variety of benchmarks, and is a promising new language model.

Bard is a conversational AI service developed by Google. It is based on the PaLM 2 model, one of Google's most capable language models. Bard is still under development, but it has already been shown to be capable of impressive feats, such as writing different kinds of creative content, translating languages, and answering questions in an informative way.

h2oGPT is a platform developed by H2O.ai that provides access to a variety of large language models, including GPT-3.5 Turbo, Llama 2, and Falcon. h2oGPT can be used to generate text, translate languages, and write different kinds of creative content. It is a powerful tool for developers and researchers who want to use large language models in their applications.

Claude-2 is a large language model developed by Anthropic. It is trained on a massive dataset of text and code, and is able to generate text, translate languages, and write different kinds of creative content. Claude-2 is known for its ability to generate text that is both informative and engaging.

Dolly 2.0 is an open-source, instruction-following LLM developed by Databricks. It is a 12B-parameter language model based on the EleutherAI Pythia model family. Dolly 2.0 is fine-tuned on a high-quality, human-generated instruction-following dataset. The model is licensed for commercial use, allowing organizations to create and customize their own LLMs without relying on third-party APIs.

GPT-3.5 is a large language model developed by OpenAI. It is a successor to the popular GPT-3 model, and is trained on a massive dataset of text and code. GPT-3.5 has been shown to outperform GPT-3 on a variety of benchmarks, and is a powerful tool for developers and researchers who want to use large language models in their applications.

GPT-4 is the latest version of the GPT language model developed by OpenAI. It is trained on a massive dataset of text and code. OpenAI has not published its parameter count, but it is widely assumed to be considerably larger than previous versions of GPT. GPT-4 has been shown to outperform previous versions of GPT on a variety of benchmarks, and is a powerful tool for developers and researchers who want to use large language models in their applications.

Vicuna 13B is an open-source large language model developed by LMSYS, a research group with members from UC Berkeley, Carnegie Mellon, Stanford, and UC San Diego. It has 13 billion parameters and was created by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Vicuna 13B is known for its ability to generate text that is both informative and engaging. It has also been shown to be effective at a variety of tasks, such as translation, summarization, and question answering.

Alpaca is another openly released instruction-following model, developed at Stanford's Center for Research on Foundation Models (CRFM). The original release fine-tuned the 7-billion-parameter LLaMA model on 52,000 instruction-following demonstrations, and 13-billion-parameter variants such as Alpaca 13B follow the same recipe. Alpaca is known for being inexpensive to train and run, although it is intended for research rather than commercial use.

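Of the models mentioned above, Llama, Falcon, Dolly 2.0, Vicuna, and Alpaca have openly released weights, although their licenses differ and some restrict commercial use. GPT-3.5, GPT-4, Claude-2, and Bard are proprietary and available only through their vendors' products or APIs, and h2oGPT is a platform rather than a single model. The openly released models are a valuable resource for developers and researchers who want to use and modify large language models in their applications.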

Large Language Models’ Performance Comparison

In terms of performance, the models vary depending on the task at hand. For example, GPT-3.5 and GPT-4 are generally better at generating text than Llama or Falcon. However, Llama and Falcon are more efficient and require less computing power to run. Ultimately, the best model for a particular task will depend on the specific requirements of the application.

To illustrate the LLMs' capabilities, take a look at this chart from the FLASK evaluation framework (Fine-grained Language Model Evaluation based on Alignment SKill Sets), which covers several of the language models mentioned above.

Based on this chart, you can clearly see that, at the moment, GPT-4 outperforms all other language models in all categories except for Harmlessness, where GPT-3.5 and Claude lead the way.

Overall, GPT-4 is generally the best performer across all tasks. However, it is also among the most computationally expensive models to use. Bard is also a very capable model, but it is still under development. Llama and Falcon are both smaller and more efficient than GPT-4 and Bard, but they may not be as accurate or versatile.

Ultimately, the best model for a particular task will depend on the specific requirements of the application. If you need the best possible performance, then GPT-4 is the best choice. If you need a more cost-efficient but still very capable model, then Bard, Llama, or Falcon may be a better option.

AI Solutions At Unique

At Unique, we currently use a multitude of AI solutions. Among them you can find GPT-4, GPT-3.5, GPT-3.5 Turbo, GPT-3.5 Turbo 16K, Whisper, Microsoft's transcription service, and BERT, as well as Llama 2 and Falcon.

These models serve different purposes: analyzing the content uploaded to our platform, extracting insights, creating transcriptions of call recordings, extracting topics, generating custom content on request, and so on.

As we have already gained some valuable experience with our clients like Pictet and LGT, we have a vast knowledge of enterprise-specific issues when it comes to implementing an AI solution.

RAG (Retrieval-Augmented Generation)

As a general approach, we advise our clients against training their own models. Feeding sensitive company information into a model to further develop its capabilities can be risky. Instead, we offer Retrieval-Augmented Generation (RAG): a pre-trained model like GPT-3.5 or Llama generates text based on the prompt and on relevant passages retrieved from existing sources uploaded to the platform. It is important, however, to configure RAG specifically for each industry and use case.
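To make the idea concrete, here is a minimal sketch of the retrieval and prompt-building half of RAG in Python. It is an illustration under simplifying assumptions, not Unique's implementation: the function names are hypothetical, relevance is scored by plain word overlap instead of the embedding search a production system would likely use, and the call to the actual language model is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, passage):
    """Count the word occurrences shared by the query and the passage."""
    shared = Counter(tokenize(query)) & Counter(tokenize(passage))
    return sum(shared.values())


def retrieve(query, passages, k=2):
    """Return up to k passages with a non-zero overlap score, best first."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return [p for p in ranked[:k] if score(query, p) > 0]


def build_prompt(query, passages, k=2):
    """Assemble a prompt that grounds the model in the retrieved passages."""
    sources = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer using only the sources below.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}"
    )
```

The resulting prompt would then be sent to whichever pre-trained model is configured. Because the model only sees the retrieved passages, the company's documents never become part of its training data.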

Privacy and Access Control

Another important issue is maintaining document privacy and access control. It’s imperative to have an enterprise-ready retrieval model that can synchronize with existing access levels. This can help avoid security and compliance issues in the future, as only authorized users can access sensitive information uploaded to the platform.
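One way to picture this is a filter that runs before retrieval, so that documents a user may not read can never reach the prompt. The sketch below is a simplified assumption of how such a check could look; the `Document` class, the group names, and `visible_documents` are hypothetical and do not describe Unique's actual access model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups synchronized from the source system


def visible_documents(docs, user_groups):
    """Keep only documents that share at least one group with the user."""
    groups = set(user_groups)
    return [d for d in docs if d.allowed_groups & groups]


docs = [
    Document("memo-1", "Quarterly risk memo", frozenset({"risk", "board"})),
    Document("faq-7", "Client onboarding FAQ", frozenset({"advisors", "risk"})),
    Document("hr-3", "Salary bands", frozenset({"hr"})),
]

# An advisor only ever sees, and retrieves from, the onboarding FAQ.
advisor_docs = visible_documents(docs, ["advisors"])
```

Filtering before retrieval rather than after generation means the model never receives restricted text in the first place.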

Model Agnostic Approach

Our Unique FinanceGPT system is model agnostic, meaning that we can use different models like GPT-3, GPT-4, or Llama depending on the use case and desired results. This means that our clients don’t have to choose a specific language model to generate the response to their request. Based on prompts and configurations, the Unique platform chooses the model that works most efficiently for the use case at hand.
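A model-agnostic design can be sketched as a small router that hides the concrete model behind a common interface. The example below is only an illustration: `ModelRouter`, the use-case names, and the stub backends are hypothetical, and real backends would call the respective model APIs instead of returning a tagged string.

```python
class ModelRouter:
    """Maps use cases to interchangeable model backends."""

    def __init__(self, default):
        self._models = {}   # model name -> callable taking a prompt
        self._routes = {}   # use case -> model name
        self._default = default

    def register(self, name, backend):
        self._models[name] = backend

    def route(self, use_case, model_name):
        if model_name not in self._models:
            raise KeyError(f"unknown model: {model_name}")
        self._routes[use_case] = model_name

    def generate(self, use_case, prompt):
        # Fall back to the default model when no route is configured.
        name = self._routes.get(use_case, self._default)
        return self._models[name](prompt)


# Stub backends standing in for real API clients (assumption).
router = ModelRouter(default="gpt-3.5")
router.register("gpt-3.5", lambda prompt: f"[gpt-3.5] {prompt}")
router.register("gpt-4", lambda prompt: f"[gpt-4] {prompt}")
router.register("llama", lambda prompt: f"[llama] {prompt}")

router.route("contract-analysis", "gpt-4")
router.route("topic-extraction", "llama")
```

Callers only name the use case, so swapping the model behind a route is a configuration change rather than a code change.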

Compliance Layer

At Unique, we employ a compliance layer that combines several elements of IT security, data protection, and legal frameworks. We aim to make it easy for users to follow GDPR principles by building privacy-by-design and privacy-by-default mechanisms into all our processes (e.g. automatic watermarking of AI-generated content). We control all information that is sent to and received from large language models (like the GPT models offered by Microsoft) and filter out all personally identifiable data by means of pseudonymization. In addition, we have several opt-outs in place to make sure that no data is stored by OpenAI and that prompts are not checked for harmful content. We therefore strongly recommend that our users follow responsible prompting guidelines.
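Pseudonymization can be illustrated with a small Python sketch that swaps personal data for placeholder tokens before a prompt leaves the platform and restores it in the response. This is a deliberately narrow assumption for illustration: it only handles e-mail addresses with a simple regular expression, and the function names are hypothetical. A real compliance layer would cover many more kinds of personal data.

```python
import re

# Simple e-mail pattern, good enough for a sketch but not exhaustive.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymize(text):
    """Replace each distinct e-mail address with a stable placeholder."""
    mapping = {}  # placeholder token -> original value

    def replace(match):
        value = match.group(0)
        for token, original in mapping.items():
            if original == value:
                return token  # reuse the token for a repeated address
        token = f"<EMAIL_{len(mapping) + 1}>"
        mapping[token] = value
        return token

    return EMAIL_RE.sub(replace, text), mapping


def restore(text, mapping):
    """Put the original values back into the model's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

The mapping stays on the platform side, so the external model only ever sees the placeholder tokens.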

Final Thoughts: Navigating the LLM Landscape

The rapid evolution of large language models (LLMs) has ushered in a new era of AI capabilities, with each model offering its own set of strengths and potential applications. Unique's journey through this landscape has been marked by hands-on experience with various models, from GPT-4's strong text generation to the efficiency of Llama and Falcon. Our diverse AI toolkit, which includes models like GPT-4, GPT-3.5, and Llama 2, enables us to cater to a wide range of industry-specific needs and to deliver strong results for our clients.

As the LLM domain continues to expand, Unique remains committed to exploring new models, refining our techniques, and delivering top-notch AI solutions to our clients.

Written by

Hanna Karbowski

2023.08.24
