Understanding RAG: Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is a method that compensates for the knowledge gaps of foundation models. Here are some key points to consider:
I. Using RAG with databases (DB)
1. Document database
RAG can connect a model to document-based databases. The process follows these steps:
Input prompt > Keyword search in the DB > Embedding (the retrieved documents are converted into embedding vectors so the model can use them) > Inference.
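The steps above can be sketched as a minimal pipeline. `keyword_search`, `embed`, and `answer` are hypothetical stand-ins for your database client, embedding model, and LLM call, not any specific library:

```python
# Minimal sketch of the document-DB RAG flow: keyword search,
# then embedding of the retrieved documents, then inference.

DOCS = {
    "doc1": "RAG combines retrieval with generation.",
    "doc2": "Vector databases store embeddings for semantic search.",
}

def keyword_search(query: str) -> list[str]:
    """Naive keyword match against the document store."""
    words = set(query.lower().split())
    return [text for text in DOCS.values()
            if words & set(text.lower().split())]

def embed(text: str) -> list[float]:
    """Placeholder embedding; replace with a real embedding model."""
    return [float(len(text)), float(text.count(" "))]

def answer(prompt: str) -> str:
    retrieved = keyword_search(prompt)          # 1. keyword search in DB
    _vectors = [embed(d) for d in retrieved]    # 2. embed retrieved docs
    context = "\n".join(retrieved)
    return f"[model inference over context]\n{context}"  # 3. inference

print(answer("What is a vector database?"))
```

In a real system, the embedding and inference steps would call your model; the shape of the pipeline is the point here.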
2. Vector Database
- Process:
- Input prompt > Semantic search in vector DB > Inference.
- Advantage:
- Faster and more accurate search for common use cases.
The most common use case is interactive AI, such as chatbots. However, the extra retrieval step increases response time, which can degrade the user experience in interactive applications.
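The semantic-search step can be sketched as a nearest-vector lookup. The stored embeddings below are made-up toy values, not output from a real model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "vector DB": document text -> precomputed embedding (illustrative values).
VECTOR_DB = {
    "Chatbots answer user questions.": [0.9, 0.1, 0.0],
    "Invoices are stored for seven years.": [0.0, 0.2, 0.9],
}

def semantic_search(query_vec, top_k=1):
    """Return the top_k documents whose embeddings are closest to the query."""
    ranked = sorted(VECTOR_DB.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# A query vector assumed to lie close to the "chatbot" document.
print(semantic_search([0.8, 0.2, 0.1]))
```

A production vector database replaces this linear scan with an approximate index, which is where the speed advantage comes from.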
II. Limitations and recommendations for RAG
1. Model dependence
RAG makes you dependent on the model’s embeddings. Think carefully before relying on a third-party embedding API. For more information, check out the 10 reasons to create your proprietary AI.
2. Structured data
Contrary to some beliefs, you cannot feed just any type of data (articles, documents, videos, etc.) into an AI: the data must be structured for similarity search to be effective. Without that structure, similarity searches won’t function properly because the data will not have been adequately contextualized.
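One common way to give raw text that structure is to split it into overlapping chunks tagged with source metadata before embedding. The sizes below are illustrative, not recommendations:

```python
def chunk(text: str, source: str, size: int = 50, overlap: int = 10):
    """Split text into overlapping character chunks tagged with a source."""
    step = size - overlap
    return [{"source": source, "text": text[i:i + size]}
            for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk("Retrieval-Augmented Generation grounds model answers "
               "in documents retrieved at query time.", source="intro.md")
for c in chunks:
    print(c["source"], "->", c["text"])
```

Each chunk is then embedded and stored individually, so a similarity search can return the passage that actually answers the query rather than a whole document.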
3. Using embeddings
Reuse the base model’s embedding or an embedding API, such as OpenAI’s.
4. RAG limitations
RAG can still fall short if the model has no grounding in the domain of your knowledge base: retrieval supplies the documents, but the model must be able to interpret them.
5. Graph databases
For real-time processing, graph databases are recommended: they support fast indexing and quick semantic search using k-NN algorithms. For more information, see this paper.
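The k-NN lookup itself reduces to finding the k closest vectors to a query. A brute-force sketch (graph-based indexes approximate this same result at scale):

```python
import math

def knn(query, points, k=2):
    """Brute-force k-nearest-neighbours by Euclidean distance."""
    return sorted(points, key=lambda p: math.dist(query, p))[:k]

points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2)]
print(knn((0.1, 0.1), points))  # the two points nearest the query
```

Graph-based indexes keep this lookup fast even with millions of vectors by walking a neighbourhood graph instead of scanning every point.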
Conclusion
Using RAG with open-source models offers significant advantages, but it’s crucial to understand its limitations and to structure your data properly. To optimize your models, consider your options carefully and follow best practices for maximum efficiency.