Understanding RAG: Retrieval-Augmented Generation

📅 Published on July 30, 2024

Retrieval-Augmented Generation (RAG) is a method that compensates for the knowledge gaps of foundation models by retrieving relevant external data at query time. Here are some key points to consider:

I. Using RAG with databases (DB)

1. Document database

RAG can be used to connect with document-based databases. Its process follows these steps:

Input prompt > Keyword search in DB > Embedding (the retrieved documents are converted into embedding vectors to be utilized by the model) > Inference.
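The keyword-search step above can be sketched as follows. This is a minimal illustration with made-up documents and hypothetical helper names, not a production retriever:

```python
import re

def tokenize(text):
    """Lowercase and extract word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_search(query, docs, top_k=1):
    """Rank documents by how many query terms they share with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:top_k] if score > 0]

def build_prompt(query, docs):
    """Prepend the retrieved context to the user question for inference."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "RAG combines retrieval with generation.",
    "Vector databases store embeddings.",
    "Paris is the capital of France.",
]
retrieved = keyword_search("What is RAG retrieval?", docs)
prompt = build_prompt("What is RAG retrieval?", retrieved)
```

The augmented `prompt` is then passed to the model for the inference step.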

2. Vector Database

  • Process:
    • Input prompt > Semantic search in vector DB > Inference.
  • Advantage:
    • Faster and more accurate search for common use cases.
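The semantic-search step in this process can be sketched as cosine similarity over precomputed embeddings. The 3-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical document embeddings, as a vector DB would store them.
index = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "company history": [0.0, 0.2, 0.9],
}

def semantic_search(query_vec, index, top_k=1):
    """Return the documents whose embeddings are closest to the query."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# A query embedding close to "refund policy":
print(semantic_search([0.85, 0.15, 0.05], index))  # → ['refund policy']
```

Unlike the keyword search above, this matches by meaning: the query never needs to share literal words with the document.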

The most common use case is interactive AI, such as chatbots. However, the extra retrieval step increases response time, which can hurt the user experience in interactive applications.

II. Limitations and recommendations for RAG

1. Model dependence

RAG ties you to a specific embedding model: if you later switch models or providers, you must re-embed your entire knowledge base. Think carefully before committing to a third-party API. For more information, check out the 10 reasons to create your proprietary AI.

2. Structured data

Contrary to some beliefs, you cannot feed just any type of data (articles, documents, videos, etc.) into an AI: the data must be structured for similarity search to be effective. In practice that means cleaning it and splitting it into coherent chunks, so that each piece carries enough context on its own; otherwise similarity searches won’t function properly.
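A common first step in structuring raw text is chunking it before embedding. The sketch below uses overlapping character windows; the sizes are illustrative, and real pipelines often split on sentences or tokens instead:

```python
def chunk_text(text, size=40, overlap=10):
    """Split text into fixed-size character chunks with a small overlap,
    so context is not lost at chunk boundaries."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk is then embedded and indexed individually, so a query can match the relevant passage rather than a whole document.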

3. Using embeddings

Reuse the base model’s own embeddings, or a dedicated embedding API such as OpenAI’s, rather than training an embedding model from scratch.

4. RAG limitations

RAG can struggle when your knowledge base relies on domain-specific vocabulary the embedding model has never seen: the resulting vectors won’t capture that vocabulary’s meaning, so retrieval quality degrades.

5. Graph databases

For real-time processing, graph databases are recommended: they support fast indexing and quick semantic search via k-NN algorithms. For more information, see this paper.
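For reference, here is the brute-force k-NN baseline that graph-based indexes (such as HNSW) approximate much faster at scale. The 2-D points are made up for illustration:

```python
import math

def knn(query, points, k=2):
    """Return the k points closest to the query by Euclidean distance.
    Brute force: O(n) per query, which graph indexes avoid."""
    return sorted(points, key=lambda p: math.dist(query, p))[:k]

points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.2, 0.1)]
print(knn((0.0, 0.2), points))  # → [(0.0, 0.0), (0.2, 0.1)]
```

Graph-based structures trade a small loss of exactness for sub-linear query time, which is what makes real-time semantic search practical on large collections.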

Conclusion

Using RAG with open-source models offers significant advantages, but it’s crucial to understand its limitations and to structure the data properly. To optimize your models, weigh your options carefully and follow best practices for maximum efficiency.
