Best Match 25 (BM25) is a ranking function that scores how relevant a document is to a query by weighing query term frequency against document length. The resulting score helps retrieve the most relevant data and provide more accurate answers. For example, it is used in search engines, enterprise search, and question-answering tools to provide employees or clients with more accurate and useful answers.
This article focuses on how Best Match 25 works, explains the components of its algorithm in detail, and covers where it excels and where it does not. At the end of the article, you will also find an analysis from ConfidentialMind ML developer Ed, who explains how we used BM25 to enhance our system.
What is BM25 (Best Match 25)?
BM25 (Best Match 25), also known as Okapi BM25, is a ranking algorithm that assigns each document a relevance score for a given search query. Search engines, RAG tools, and various platforms use it to allow users to search and retrieve relevant information.
How Does BM25 Work?

BM25 calculates the relevance score by comparing terms found in the search query with terms in the document. It mainly analyzes three things:
- What are the terms in the query?
- How many times does each query term appear in each document?
- How many words does each document contain in total?
BM25 Algorithm
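For reference, the Okapi BM25 scoring function is commonly written as shown here, using the symbols explained in the component list in this section. The IDF variant differs between implementations; the smoothed form given here is one widely used choice and not necessarily the one used in any particular system. N (the number of documents in the collection) and n(t) (the number of documents that contain term t) are notation introduced here for illustration.

```latex
\mathrm{score}(D, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot
  \frac{f(t, D) \cdot (k_1 + 1)}
       {f(t, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(t) = \ln\left(\frac{N - n(t) + 0.5}{n(t) + 0.5} + 1\right)
```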

Components of the BM25 algorithm explained:
- Terms in the query (t∈Q): The query (user prompt) is broken down into individual terms. Each term t is scored separately, and the per-term scores are summed.
- Inverse document frequency IDF(t): This measures how rare or common each term is across the whole document collection to decide which terms should carry more weight. For example, “the” and “and” occur in almost every document, so they receive very little weight.
- Frequency of term t in document D, f(t,D): This counts how often the term appears in the document. More occurrences indicate higher relevance, up to a saturation point.
- Total number of words in document D (∣D∣): The document length is used to normalize term frequency so that long documents are not favored simply because they contain more words. For example, a 500-word document that contains the term 5 times and a 2,500-word document that contains it 25 times have the same term density, so BM25 discounts the longer document's higher raw count instead of rewarding it in full.
- Average document length (avgdl): The average length of all documents in the collection. It serves as the baseline against which each document's length is normalized.
- A hyperparameter k1: Controls term-frequency saturation, that is, how strongly repeated occurrences of a term increase the final BM25 score. Higher values let term frequency count for more before its effect levels off (typically, k1 ∈ [1.2, 2.0]).
- A hyperparameter b: Controls how strongly document length influences the normalization. b = 0 means no length normalization is applied, while b = 1 means full normalization (typically, b = 0.75).
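To make the components above concrete, here is a minimal sketch of a BM25 scorer in Python. It is an illustration under stated assumptions, not the implementation of any particular library: tokenization is a plain lowercase whitespace split, the IDF uses a common smoothed variant, and the function name `bm25_scores`, the default `k1=1.5`, and the sample documents are all made up for this example.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    # Assumption: naive tokenization (lowercase + whitespace split).
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avgdl = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for doc in tokenized:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            f = term_freq[term]  # f(t, D)
            if f == 0:
                continue  # terms absent from the document add nothing
            n_t = doc_freq[term]
            # Assumption: smoothed IDF variant that never goes negative.
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
            # Length normalization relative to the average document length.
            norm = 1 - b + b * len(doc) / avgdl
            score += idf * f * (k1 + 1) / (f + k1 * norm)
        scores.append(score)
    return scores


# Hypothetical sample collection.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat around the garden",
    "quarterly revenue grew in the third quarter",
]
scores = bm25_scores("cat mat", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(best, [round(s, 3) for s in scores])
```

In this toy collection, the first document contains both query terms and is shorter than average, so it ranks first; the third document shares no term with the query and scores zero.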
What Are Its Advantages and Disadvantages?
Benefits of BM25 Are:
- Dynamic ranking
- Length normalization
- Customization flexibility
- Applicability to many domains
Dynamic Ranking
Dynamic ranking means the BM25 algorithm does not assign documents a fixed rank. Instead, it computes a new relevance score for every document each time a user submits a query.
Length Normalization
BM25 considers each document's length relative to the average document length in the collection, so that long or short documents are not unfairly favored or penalized.
Customization Flexibility
BM25 lets you tune the k1 and b parameters, so you can adapt systems built on it to your specific use case.
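The effect of the two parameters can be sketched by isolating the term-frequency part of the BM25 formula (the IDF factor is left out). The helper name `tf_component` and the average document length of 1,000 words are assumptions made for this illustration only.

```python
def tf_component(f, doc_len, avgdl, k1=1.5, b=0.75):
    """Term-frequency part of BM25 for a single term (IDF omitted)."""
    norm = 1 - b + b * doc_len / avgdl
    return f * (k1 + 1) / (f + k1 * norm)


# Saturation (k1): each extra occurrence adds less, and the value
# never exceeds k1 + 1.
saturation = [tf_component(f, doc_len=1000, avgdl=1000) for f in (1, 2, 5, 25)]
print([round(v, 3) for v in saturation])

# Length normalization (b): the 500-word / 5 occurrences versus
# 2,500-word / 25 occurrences example, assuming avgdl = 1000.
short_doc = tf_component(5, doc_len=500, avgdl=1000)
long_doc = tf_component(25, doc_len=2500, avgdl=1000)
print(round(short_doc, 3), round(long_doc, 3))

# With b = 0 document length is ignored entirely.
same_a = tf_component(5, doc_len=500, avgdl=1000, b=0.0)
same_b = tf_component(5, doc_len=2500, avgdl=1000, b=0.0)
print(same_a == same_b)
```

Under these assumptions, the longer document has five times the raw count but ends up with a score only slightly higher than the shorter one, which is the behavior the length normalization is meant to produce.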
Applicability to Many Domains
BM25 works on plain text and makes no assumptions about the subject matter, so it can be used in various domains, from e-commerce sites to media platforms to enterprise systems.
BM25 Disadvantages
Here are the two most common BM25 challenges:
- It lacks semantic understanding: BM25 relies on exact word matching, so it does not capture the meaning behind words and sentences. A query and a document that express the same idea with different words will not match.
- It lacks personalization: BM25 cannot take user preferences into account, so it treats every user equally. You need to combine it with other systems that can perform personalization.
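The exact-matching limitation can be shown with a very small sketch. BM25 only sums over query terms that actually occur in a document, so a document that shares no term with the query cannot score above zero, however closely related it is in meaning. The helper `shared_terms` and the sample texts are hypothetical and use naive whitespace tokenization.

```python
def shared_terms(query, document):
    """Terms that appear in both the query and the document."""
    return set(query.lower().split()) & set(document.lower().split())


query = "car repair"
doc_synonyms = "automobile maintenance and servicing guide"
doc_exact = "car repair manual"

# No overlap: every BM25 term contribution for this document is zero.
print(shared_terms(query, doc_synonyms))
# Overlap on both terms: this document can receive a positive score.
print(shared_terms(query, doc_exact))
```

This is why BM25 is often paired with a semantic retriever that can match synonyms and paraphrases.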
How Is BM25 Used Across Domains?
Below are the most common use cases of BM25:
- Elasticsearch - A distributed search and analytics engine that can store and search large volumes of data quickly. BM25 is its default scoring algorithm.
- Search engines - Search engines like Google, Yahoo, and Bing can use BM25-style scoring, typically as one ranking signal among many, to retrieve information from their indexes and provide it to users.
- Enterprise search tools - Enterprises and large organizations integrate BM25 into internal tools to allow employees or clients to find relevant information from various data sources, such as meeting notes, emails, ERP systems, and others.
- E-commerce sites - Websites and platforms that sell products online can use BM25 to rank products against a search query. Combined with other systems, the results can also reflect a shopper's past purchases, pages viewed, products in the cart, or other actions taken on the site.
- Recommendation systems - Travel agencies, media platforms (Netflix, Spotify, YouTube), and educational platforms can use BM25 as one component when recommending relevant choices.
- ConfidentialMind question-answering systems - At ConfidentialMind, we use BM25 to optimize our RAG pipeline so that it generates more accurate responses. Our system can power elastic search, recommendation systems, and enterprise search tools.
How Did We Use the BM25 Score?
Naive semantic encoding of text chunks can lose exact-term granularity when the text is compressed into an embedding. For example, exact error codes are hard to retrieve: naive embeddings may retrieve something about errors, but not the correct one. To mitigate this, we employ BM25, a modified TF-IDF function with a saturation term, in our GraphRAG solution. We encode the chunks into semantic vectors and also index them through TF-IDF for BM25 scoring. Then, at query time, we apply Reciprocal Rank Fusion (RRF) to combine the two retrieval results and send the fused results to the generator model.
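As a rough illustration of the fusion step, here is a minimal sketch of Reciprocal Rank Fusion in Python. It is not ConfidentialMind's actual code: the function name, the chunk IDs, and the smoothing constant `k=60` (a value commonly used for RRF) are assumptions for this example. Each document receives 1 / (k + rank) from every ranking it appears in, and the contributions are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    fused = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top item contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


# Hypothetical outputs of the two retrievers for one query.
semantic_ranking = ["chunk_a", "chunk_b", "chunk_c"]
bm25_ranking = ["chunk_d", "chunk_b", "chunk_a"]

fused_ranking = reciprocal_rank_fusion([semantic_ranking, bm25_ranking])
print(fused_ranking)
```

Because RRF works on ranks rather than raw scores, it does not require the BM25 scores and the vector similarities to be on the same scale, and chunks found by both retrievers rise above chunks found by only one.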
We dropped retrieval precision and F1 and followed what other best-practice papers look at: retrieval recall and answer metrics. Precision depends heavily on the retrieval cutoff k (the number of items retrieved), so it is not so important for our study. According to Anthropic, the best number of items to retrieve is between 10 and 20.
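The following sketch shows why precision is sensitive to the cutoff k while recall is the more informative retrieval metric when each query has only a couple of relevant documents. The function names and the document IDs are hypothetical, and the numbers are illustrative, not results from our evaluation.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant (gold) items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top k retrieved items that are relevant."""
    return len(set(retrieved[:k]) & set(relevant)) / k


# Hypothetical query with exactly two gold documents.
retrieved = [f"doc_{i}" for i in range(20)]
relevant = ["doc_0", "doc_6"]

for k in (10, 20):
    print(k, recall_at_k(retrieved, relevant, k), precision_at_k(retrieved, relevant, k))
```

With two gold documents both found in the top 10, recall is already perfect at k = 10, whereas precision halves when k doubles even though retrieval quality has not changed.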
Ed, an ML developer from ConfidentialMind, said, "We chose 10 because HotpotQA has a very low Q/D ratio of 2.0 (ratio of golden contexts to queries). We also increased the query size to 1000 because naive RAG is fast."
He further explained that BM25 is not expected to help on HotpotQA, because the dataset is designed as a large corpus in which each question has exactly 2 relevant documents (a needle in a haystack). We therefore wanted to confirm that performance does not drop heavily in the hybrid BM25 test.
The Result
As we predicted, there are no massive improvements, but please have a look at what happens when we combine BM25 with a re-ranker in our RAG pipeline.
Conclusion
BM25 is a ranking algorithm that can greatly increase the relevance and accuracy of search and question-answering tools. It creates a relevance score for each document based on the user's query, so it can retrieve the most relevant information from databases and files. For example, it can find information about internal meetings or match products with related products in the shopping cart.
While Best Match 25 has many benefits, it is also worth understanding its limitations, which make it unsuitable for some use cases on its own. As a result, you need to combine it with other tools to maximize its performance. For example, our team combined BM25 with other tools and features in our question-answering tool, which made it a powerful component.