🚀
10 Days Realtime LLM Bootcamp
  • Introduction
    • Getting Started
    • Course Syllabus
    • Course Structure
    • Prerequisites
    • Greetings from your Instructors
    • First Exercise (Ungraded)
  • Basics of LLM
    • What is Generative AI?
    • What is a Large Language Model?
    • Advantages and Applications of Large Language Models
    • Bonus Resource: Multimodal LLMs
  • Word Vectors Simplified
    • What is a Word Vector
    • Word Vector Relationships
    • Role of Context in LLMs
    • Transforming Vectors into LLM Responses
      • Neural Networks and Transformers (Bonus Module)
      • Attention and Transformers (Bonus Module)
      • Multi-Head Attention and Further Reads (Bonus Module)
    • Let's Track Our Progress
  • Prompt Engineering
    • What is Prompt Engineering
    • Prompt Engineering and In-context Learning
    • Best Practices to Follow in Prompt Engineering
    • Token Limits in Prompts
    • Prompt Engineering Excercise
      • Story for the Excercise: The eSports Enigma
      • Tasks in the Excercise
  • Retrieval Augmented Generation and LLM Architecture
    • What is Retrieval Augmented Generation (RAG)?
    • Primer to RAG Functioning and LLM Architecture: Pre-trained and Fine-tuned LLMs
    • In-Context Learning
    • High level LLM Architecture Components for In-context Learning
    • Diving Deeper: LLM Architecture Components
    • LLM Architecture Diagram and Various Steps
    • RAG versus Fine-Tuning and Prompt Engineering
    • Versatility and Efficiency in Retrieval-Augmented Generation (RAG)
    • Key Benefits of RAG for Enterprise-Grade LLM Applications
    • Similarity Search in Vectors (Bonus Module)
    • Using kNN and LSH to Enhance Similarity Search in Vector Embeddings (Bonus Module)
    • Track your Progress
  • Hands-on Development
    • Prerequisites
    • Dropbox Retrieval App in 15 Minutes
      • Building the app without Dockerization
      • Understanding Docker
      • Using Docker to Build the App
    • Amazon Discounts App
      • How the Project Works
      • Step-by-Step Process
    • How to Run the Examples
  • Live Interactions with Jan Chorowski and Adrian Kosowski | Bonus Resource
  • Final Project + Giveaways
    • Prizes and Giveaways
    • Tracks for Submission
    • Final Submission
Powered by GitBook
On this page
  • 1. Which data types can RAG handle?
  • 2. Are Vector Databases necessary for RAG?
  • 3. Is a Separate Real-Time Processing Framework Needed for a Real-time Stream of Data?
  • 4. Is the ChatGPT Plugin for Bing Search an example of RAG?
  1. Retrieval Augmented Generation and LLM Architecture

Versatility and Efficiency in Retrieval-Augmented Generation (RAG)

PreviousRAG versus Fine-Tuning and Prompt EngineeringNextKey Benefits of RAG for Enterprise-Grade LLM Applications

Last updated 1 year ago

To further equip you with a comprehensive understanding of RAG and its adaptability, let's address some frequently asked questions (FAQs) and challenges developers often encounter. A couple of them were asked to us during the course of this bootcamp.

1. Which data types can RAG handle?

One of the most compelling features of RAG is its flexibility to work with an array of data types. Whether relational databases, APIs, transcribed audio, or even live feeds from the internet, RAG can seamlessly integrate these into its retrieval mechanism. This adaptability enhances the model's ability to generate contextually accurate and informative responses.

  • Data Types Supported: Relational databases, free-form text, PDFs, APIs, transcribed audio, and streaming platforms like Kafka, Debezium, Redpanda, etc.

2. Are Vector Databases necessary for RAG?

Short answer – no. Andrej Karpathy humorously addressed this question in a , based on his personal project. With we've outlined ahead, it's possible to efficiently manage vector indexes for production use cases, within program memory. This approach provides persistent storage but also offers essential functionalities for scaling and maintaining such systems in a production environment.

If you don't know, a Vector Database is a specialized database that handles vector embeddings. These databases are primarily used to efficiently store, search, and retrieve vectors for their use cases in LLMs and Recommender Systems.

Initially, using vector embeddings led developers to naturally consider the application of dedicated vector databases, frequently seen in Recommender Systems. However, this understanding is quickly evolving.

  • Introducing any new database into an enterprise environment comes with its own complexities and challenges, making a simpler solution preferable in many instances.

  • In the hands-on implementation module ahead, we implement a production-grade application without using any Vector DB. While the frameworks we use are compatible with notable vector DBs, our example shows that vector DBs are not mandatory components, especially when using them is a challenge.

So, while the allure of vector databases exists, it's critical to understand that they are not the only path to efficient RAG implementation.

3. Is a Separate Real-Time Processing Framework Needed for a Real-time Stream of Data?

4. Is the ChatGPT Plugin for Bing Search an example of RAG?

Some may wonder whether certain web search plugins in LLM applications like ChatGPT utilize a Retrieval-Augmented Generation (RAG) approach. For example, a ChatGPT interface with a web search plugin can pull current information online, providing a more accurate and up-to-date response.

  • Example: This plugin lets the model answer questions about things that happened after it was last trained, by retrieving information from a blog on the internet.

In a way, these can be considered LLM applications that employ a retrieval-augmented generation strategy. However, they do not retrieve data from a diverse set of data sources, such as a collection of PDFs, links, or Kafka topics. This affects the quality of their responses. In such scenarios, LLMs like ChatGPT can only offer the best answer they find from the specific webpage that the plugin accessed.

On the other hand, when you incorporate RAG into your custom LLM application, you benefit from efficient vector embeddings and vector search capabilities. These allow you to extract much more relevant information from a comprehensive data corpus, aiding in identifying the most pertinent answers.

By understanding these facets, you're better equipped to leverage the strengths of RAG in your LLM architecture, whether for an enterprise solution or a personal project.

Just like the emergence of foundational Large Language Models (LLMs) or what some call the "AI Wave," Retrieval Augmented Generation (RAG) is also a relatively new concept. It gained prominence after the 2020 publication of "" by a team at Facebook AI Research.

Tools like can generate and manage their real-time vector indexes, negating the need for a separate vector database. Additionally, conventional databases like PostgreSQL are expanding their features to include built-in support for vector indexing, thanks to extensions like

This depends on your choice of RAG framework. Referring to the framework we've used ahead in this curriculum, it natively uses Pathway, an ultra-performant data processing engine () suitable for batch and streaming use cases.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
LLM App
PG Vector.
2023 benchmarks
Tweet
the framework
Tweet Link