Large language models (LLMs) have revolutionized the informatics industry and beyond, breaking numerous paradigms in programming, web development, and tool creation, while also driving the emergence of new products. However, these models are constrained by their training on fixed data limited to a specific timeframe, and updating them is costly. Traditional remedies such as fine-tuning or reinforcement learning present their own challenges: the former is likewise costly, and the latter is still an open research problem. Frameworks for LLM orchestration offer a promising alternative for overcoming these limitations.
This is the continuation of our last entry about LLM orchestration.
Disclaimer: The assessment of these frameworks is based on their current developmental stage, as they are continuously evolving. The authors of this blog entry are not associated with any of the frameworks discussed.
LLM orchestration serves as an effective solution to the memory/knowledge constraints within chatbots based on pre-trained LLMs. Beyond merely resolving this issue, it allows for the integration of various data sources, such as PDFs, plain text, webpages, emails, and even Wikipedia articles, directly into the memory.
The fundamental concept behind LLM orchestration is to leverage the core abilities of the language model, primarily its language comprehension, to generate natural language "commands." The ultimate goal in creating a chatbot or query agent is to utilize proprietary data or collect new information, especially in scenarios involving scrapers.
This blog entry compares and evaluates the performance of two key LLM orchestration platforms, LlamaIndex and Griptape, specifically in the context of constructing a chatbot from data extracted from multiple PDFs. The content of these PDFs is known, enabling a comprehensive assessment of response quality.
While tools like LlamaIndex, Griptape, and LangChain offer similar functionality for constructing basic structures such as simple chatbots with additional knowledge, their underlying philosophies shape their distinct capabilities. LlamaIndex focuses on data ingestion and storage, while Griptape prioritizes agents, tools, and pipelines. The choice between these approaches significantly influences the ease of customization, scalability, and complexity of developing tailored chatbots.
The prioritization of specific tools, often easily fine-tuned, can significantly impact the scalability of the desired solution. Consequently, the implementation of LLM orchestration is heavily influenced by the problem at hand. Given that these tools are developed within the same programming language, a potential strategy involves combining these platforms to leverage their collective strengths.
Before delving into the comparison, let's examine what we consider as the fundamental components of LLM orchestration tools. While LLMs serve as the primary resource, their constrained memory, known as the context window, necessitates mechanisms to introduce new knowledge into these frameworks. One common approach is through the utilization of vector databases, which store and semantically query text segments.
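To make the idea of semantically querying stored text segments concrete, here is a deliberately minimal sketch in plain Python. The bag-of-words "embedding" and the example corpus are invented for illustration; real vector databases use dense vectors produced by a trained embedding model, but the store-and-rank-by-similarity loop is the same:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector.
    # Real frameworks use dense vectors from an embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def query(store, question: str) -> str:
    # Return the stored segment whose vector is closest to the query vector.
    q = embed(question)
    return max(store, key=lambda seg: cosine(embed(seg), q))

segments = [
    "The invoice total for March was 4200 euros.",
    "Our office is located in Berlin.",
    "The project kickoff meeting is scheduled for Monday.",
]
print(query(segments, "where is the office located"))
```

An orchestration framework wraps exactly this pattern: documents are split into segments, embedded once at ingestion time, and at query time the top-ranked segments are handed to the LLM as context.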
Another crucial aspect involves the use of "prompts," which serve as engineered queries directly to the LLM. Users fill certain sections of these prompts with information, enabling tailored responses. Additionally, these tools incorporate agents and auxiliary resources. Although the definitions may vary between frameworks, their core purpose is to augment the chatbot with supplementary resources beyond the LLM's capabilities. For instance, when an LLM struggles to provide an answer, such as performing complex numerical calculations, a Python module can be summoned. In such scenarios, the LLM decides which tool to employ, and this entire decision-making and computational process is often encapsulated within an "agent."
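The prompt mechanism can be pictured as simple string templating: the framework ships an engineered template and fills in the blanks with retrieved context and the user's question. A minimal sketch (the template wording below is our own invention, not taken from any particular framework):

```python
# A typical retrieval-augmented prompt template. The framework fills
# {context} with retrieved text segments and {question} with the user query.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "If the answer is not in the context, say you do not know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def build_prompt(context_segments, question):
    # Join the retrieved segments and slot them into the template.
    context = "\n".join(context_segments)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    ["The invoice total for March was 4200 euros."],
    "What was the March invoice total?",
)
print(prompt)
```

Agents sit one level above this: when a question needs a tool (say, arithmetic), the agent's prompt asks the LLM to name the tool to call, runs it, and feeds the result back before producing the final answer.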
Formerly known as GPT Index, LlamaIndex focuses heavily on the ease of ingesting various data types, facilitated by the utilization of LlamaHub. LlamaHub is a community-maintained set of data connectors, ranging from database queries to direct connections with platforms like Slack, Telegram, Yelp, and more. Within LlamaIndex, all documents are abstracted under a class called "index," which encompasses various indexing operations, allowing seamless manipulation among these documents. This indexed data becomes readily accessible for querying purposes. While LlamaIndex possesses the capacity to integrate tools, it is not the platform's primary forte.
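The ingestion-to-query workflow described above can be sketched in a few lines. Class names follow the LlamaIndex documentation current at the time of writing; the `data/` directory and the query string are placeholders, and a configured OpenAI API key is assumed:

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Load every supported file (PDFs included) from a local folder.
documents = SimpleDirectoryReader("data").load_data()

# The "index" abstraction: embed and store the documents for semantic search.
index = VectorStoreIndex.from_documents(documents)

# Query the indexed documents through the LLM.
query_engine = index.as_query_engine()
response = query_engine.query("What is the invoice total for March?")
print(response)
```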
Probably the most notable feature of LlamaIndex is hypothetical document embeddings: the LLM first generates a hypothetical answer to the query (effectively a controlled hallucination), and the embedding of that answer, rather than of the raw query, is then used to semantically search the stored documents for the true answer.
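A hedged sketch of wiring this up, with module paths as given in the LlamaIndex documentation at the time of writing (they may have moved in newer releases; the `data/` folder and query are placeholders):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.indices.query.query_transform import HyDEQueryTransform
from llama_index.query_engine import TransformQueryEngine

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())

# HyDE: the LLM first writes a hypothetical (possibly hallucinated) answer,
# and the embedding of that answer -- not of the raw query -- drives retrieval.
hyde = HyDEQueryTransform(include_original=True)
query_engine = TransformQueryEngine(index.as_query_engine(), query_transform=hyde)
print(query_engine.query("What was agreed in the March contract?"))
```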
Griptape offers a comprehensive framework for developing complete LLM applications, much like LangChain. It excels in creating highly tailored chatbots, providing a straightforward and intuitive experience for developers. Within Griptape, the process of constructing agents—essentially equipped chatbots—is simplified once the necessary tools have been defined. Notably, documents can be readily transformed into tools specifically crafted for particular queries, enhancing the platform's versatility.
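As a flavor of how little code an equipped agent takes, here is a sketch using class names from the Griptape documentation current at the time of writing (a configured OpenAI API key is assumed; the arithmetic query is a placeholder):

```python
from griptape.structures import Agent
from griptape.tools import Calculator

# An agent is essentially an LLM plus tools; the LLM itself decides
# when to delegate a step to the Calculator tool.
agent = Agent(tools=[Calculator()])
agent.run("What is 34251 * 7 minus 12?")
```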
Moreover, Griptape functions as a high-level utility where the clarity in developing customized tools doesn't compromise abstraction. This balance ensures that design and scalability remain robust aspects of the platform.
One obstacle common to both Griptape and LlamaIndex is that their documentation consists primarily of examples rather than detailed information about the modules. Understanding how agents are constructed within LlamaIndex is particularly challenging. Griptape, on the other hand, has examples that don't work straight out of the box. Getting past these challenges is crucial for creating a customized chatbot that uses agents to represent individual PDF files.
Our experience with Griptape was notably smooth. Defining agents and launching a fully customized chatbot required only a few lines of code. Customization also demanded less fine-tuning compared to LlamaIndex. As previously mentioned, Griptape prioritizes tooling, which is evident in its user-friendly interface. Conversely, LlamaIndex, despite its focus on indexing and data organization, lacks sufficient out-of-the-box guidance in its documentation. Consequently, constructing agents based on the index proved to be a more intricate process and notably underperformed in both accuracy and execution time compared to Griptape.
Both LlamaIndex and Griptape offer direct mechanisms for ingesting PDFs. Various chatbot schemes and fine-tunings have been tested. It was observed that both frameworks exhibit similar performance when a single database encompasses all PDFs. However, when constructing agents for each PDF, LlamaIndex tends to struggle unless the query is extremely precise. In contrast, Griptape adeptly manages the usage of agents, demonstrating faster and more accurate responses. Notably, Griptape displayed greater resilience against unconventional queries and never experienced crashes.
Our assessment leads us to consider Griptape as a more comprehensive and relatively well-documented framework. LlamaIndex, on the other hand, primarily excels as a robust tool for indexing, storage, and queries. It can be effectively combined with LangChain and Griptape to string together queries in a more intricate workflow.