RAG But Better: Reranking with Cohere AI

February 08, 2025

Retrieval-Augmented Generation (RAG) has become a buzzword in the world of AI, promising a lot but often leaving users puzzled when the results don't match expectations. While it's straightforward to kickstart a RAG pipeline, the complexity lies in fine-tuning it for optimal performance. In this blog, we’ll explore the role of rerankers in enhancing RAG pipelines and how they can significantly improve retrieval accuracy.

Understanding RAG and Rerankers

At its core, RAG involves combining document retrieval with language model generation. This process typically includes a vector database where documents are stored as vectors. When a query is made, the system retrieves the closest vectors to generate a response. However, this method often falls short when it comes to maximizing the relevance of the returned documents.
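
In its simplest form, that pipeline is just two steps: fetch the nearest documents, then hand them to the model. Here is a minimal sketch using the pre-1.0 OpenAI SDK; the retrieve helper and the model name are placeholders, not part of any specific library:

import openai

def answer(query, retrieve, top_k=3):
    # 1. Fetch the top_k most similar documents from the vector store.
    docs = retrieve(query, top_k=top_k)
    context = "\n\n".join(docs)
    # 2. Generate an answer grounded in the retrieved context.
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[
            {'role': 'system', 'content': f"Answer using this context:\n{context}"},
            {'role': 'user', 'content': query},
        ],
    )
    return response['choices'][0]['message']['content']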

Rerankers come to the rescue by providing an additional layer of refinement. After the initial retrieval of documents, a reranker evaluates and reorders these documents based on their relevance to the query. This ensures that the most pertinent information is prioritized before being fed into the language model (LM).

Why Rerankers Are Necessary

The primary issue with relying solely on retrieval models is that they may not always return the most relevant documents. For instance, while the top results might be decent, valuable information might be buried deeper in the results list. If we simply increase the number of documents returned, we risk overwhelming the LM, which has limited context windows. This is where rerankers shine, allowing us to filter through a larger set of documents while ensuring only the most relevant ones reach the LM.

The Mechanics of Reranking

Rerankers take a query and a set of candidate documents, and run each query-document pair through a more expressive model, typically a cross-encoder, to produce a relevance score. That score indicates how well each document answers the query, letting the system reorder the candidates. Unlike retrieval embeddings, which compress the query and each document into fixed-size vectors independently, a cross-encoder attends over the query and document together, so less information is discarded and the similarity judgments are more accurate.
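
To make the contrast concrete, here is what pairwise scoring looks like with an open-source cross-encoder from the sentence-transformers library. This is an illustrative stand-in for a hosted reranker like Cohere's, not its actual implementation:

from sentence_transformers import CrossEncoder

# A cross-encoder reads the query and each document together,
# rather than squeezing each into a fixed-size vector first.
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

query = "What are the benefits of reinforcement learning with human feedback?"
docs = [
    "RLHF aligns model behavior with human preferences.",
    "Vector databases store embeddings for fast similarity search.",
]

scores = model.predict([(query, doc) for doc in docs])
ranked = sorted(zip(scores, docs), reverse=True)  # most relevant first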

Implementing Reranking in Python

To illustrate the implementation of reranking, let’s dive into a Python example using the Cohere AI reranking model alongside the OpenAI text-embedding-ada-002 model and the Pinecone Vector Database. This setup allows us to create a robust retrieval pipeline that enhances the accuracy of our results.

Setting Up Your Environment

Before we begin, ensure you have the necessary libraries installed. The snippets below use the pre-1.0 OpenAI SDK and the v2 Pinecone client, so pin those versions:

!pip install cohere pinecone-client==2.2.4 openai==0.28.1

Initializing the Models

We will start by importing the required libraries and setting up our API keys for OpenAI, Cohere, and Pinecone:

import openai
import cohere
import pinecone

openai.api_key = 'YOUR_OPENAI_API_KEY'
cohere_client = cohere.Client('YOUR_COHERE_API_KEY')
pinecone.init(api_key='YOUR_PINECONE_API_KEY', environment='YOUR_ENVIRONMENT')

Creating the Index

Next, we need to create a vector index where our embeddings will be stored. The dimension of 1536 matches the output size of text-embedding-ada-002:

index_name = "rerankers"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=1536, metric='cosine')
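
One step this walkthrough glosses over is populating the index. Here is a minimal sketch of that step; the two sample texts are placeholders. Each record stores its raw text in metadata so we can recover it at query time:

index = pinecone.Index(index_name)

docs = [
    "RLHF fine-tunes language models against human preference rankings.",
    "Rerankers rescore retrieved passages for relevance to the query.",
]

# Embed the batch, then upsert (id, vector, metadata) tuples.
embeddings = openai.Embedding.create(input=docs, model='text-embedding-ada-002')
index.upsert(vectors=[
    (str(i), record['embedding'], {'text': doc})
    for i, (record, doc) in enumerate(zip(embeddings['data'], docs))
])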

Testing Retrieval Without Reranking

Before we introduce reranking, let’s see how our retrieval system performs without it. We’ll define a function to query our index and retrieve the top documents:

def query_index(query, top_k=3):
    # Embed the query, then pull the top_k nearest neighbours with their metadata.
    vector = openai.Embedding.create(input=query, model='text-embedding-ada-002')['data'][0]['embedding']
    results = pinecone.Index(index_name).query(vector=vector, top_k=top_k, include_metadata=True)
    return results['matches']

Now, let’s test this function:

results = query_index("What are the benefits of reinforcement learning with human feedback?")
for match in results:
    print(match['score'], match['metadata']['text'])

Introducing Reranking

Now that we have our baseline results, let’s incorporate the Cohere reranking model. We’ll first fetch a larger set of documents and then apply the reranker:

def rerank_results(query, documents):
    reranked = cohere_client.rerank(
        model='rerank-english-v2.0',
        query=query,
        documents=[doc['metadata']['text'] for doc in documents]
    )
    # Reorder the original matches by the reranker's relevance ranking.
    return [documents[result.index] for result in reranked.results]

With this function, we can rerank the documents we retrieved earlier:

documents = query_index("What are the benefits of reinforcement learning with human feedback?", top_k=25)
reranked_documents = rerank_results("What are the benefits of reinforcement learning with human feedback?", documents)
print(reranked_documents)

Evaluating the Impact of Reranking

To understand the effectiveness of reranking, let's compare the results before and after reranking. We want to see if the reranked documents provide more relevant information:

for doc in reranked_documents:
    print(f"Document: {doc['metadata']['text']}")

Best Practices for Using Rerankers

While rerankers can vastly improve retrieval performance, there are some best practices to keep in mind:

  • Use High-Quality Models: Ensure that the models used for both retrieval and reranking are state-of-the-art to avoid degrading performance.
  • Limit Context Size: Be mindful of the context window limits of your LLM. Only pass the top results that are most relevant; see the sketch after this list.
  • Experiment: Different queries may yield different results. Experiment with various models and settings to find what works best for your use case.
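
Here is a minimal sketch of that trimming step, reusing the 25 candidates we retrieved earlier. The rerank endpoint's top_n parameter keeps only the best few passages; the prompt template is our own assumption, not from Cohere's docs:

def top_passages(query, documents, top_n=3):
    # Score every candidate, but return only the top_n best.
    response = cohere_client.rerank(
        model='rerank-english-v2.0',
        query=query,
        documents=[doc['metadata']['text'] for doc in documents],
        top_n=top_n
    )
    return [documents[r.index]['metadata']['text'] for r in response.results]

query = "What are the benefits of reinforcement learning with human feedback?"
context = "\n\n".join(top_passages(query, documents))
prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"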

Conclusion

Rerankers are an essential tool in optimizing RAG pipelines. They enhance the quality of retrieved documents, ensuring that only the most relevant information reaches the language model. By implementing reranking, developers can significantly improve the accuracy and usefulness of AI-generated responses. As we continue to explore the capabilities of AI, understanding and leveraging tools like rerankers will be crucial for achieving superior results in retrieval-based applications.

For more detailed examples and code snippets, be sure to check out the full documentation and explore the capabilities of Cohere AI and OpenAI models.
