Hybrid Search Strategies: Combining BM25 and Vector Embeddings
In the modern landscape of information retrieval, developers often find themselves choosing between the precision of traditional keyword matching and the nuance of AI-driven semantic understanding. The most robust production systems, however, don't choose one over the other: they implement a hybrid architecture that runs BM25 and vector search side by side. By combining the statistical rigor of BM25 with the contextual depth of vector embeddings, engineers can build search experiences that handle both specific technical queries and broad conceptual questions well. This article explores how to architect these systems so your application returns relevant results for both kinds of query.
Why Semantic Search Fails on Exact Part Codes, Names, and SKUs
While semantic search has transformed how we interact with unstructured data, it is not a silver bullet. When you build a semantic search application on embeddings, you are mapping text into a high-dimensional vector space where proximity stands for similarity of meaning. Context can push "Apple" (the fruit) and "Apple" (the company) apart, while "iPhone 15 Pro" and "smartphone" land close together. That closeness is exactly what you want for conceptual queries, and exactly what you don't want when the user is looking for one specific item.
The failure modes of pure vector search become glaringly obvious in enterprise environments:
- The Exact Match Problem: If a user searches for a specific SKU like XJ-900-B, a vector model might return a "similar" product like XJ-900-A because the two strings are semantically close. The user, however, specifically needs the B variant. Vector models often struggle with character-level precision.
- Out-of-Vocabulary (OOV) Terms: Specialized jargon, internal project codenames, or rare medical terminology may be rare or absent in the training data of your embedding model (e.g., OpenAI’s text-embedding-3-small), so the model represents them poorly.
- The "Vague Query" Trap: Semantic search excels at vague, conceptual questions such as "Why is my server overheating?", but it often fails at precise lookups such as "Find document 402-A."
To solve this, we must combine keyword and semantic search. By treating retrieval as the fusion of two complementary signals, we ensure that the system respects the user's intent for exact matches while leveraging AI for conceptual relevance.
Implementing BM25 Text Search in Postgres/Elasticsearch
BM25 (Best Matching 25) remains the gold standard for keyword retrieval. It improves upon TF-IDF by accounting for term frequency saturation and document length normalization.
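For reference, the Okapi BM25 score of document $D$ for query $Q$ is commonly written as

$$\text{score}(D, Q) = \sum_{q \in Q} \text{IDF}(q) \cdot \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1\left(1 - b + b\,\frac{|D|}{\text{avgdl}}\right)}$$

where $f(q, D)$ is the term's frequency in $D$, $|D|$ is the document length, and avgdl is the average document length. The $k_1$ term produces the frequency saturation and $b$ controls length normalization. A minimal Python sketch of that formula follows, assuming whitespace-tokenized text and the common defaults k1 = 1.2 and b = 0.75; the IDF uses the smoothed variant adopted by Lucene, and the function name bm25_scores is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one Okapi BM25 score per document (docs are lists of tokens)."""
    n_docs = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: how many documents contain each term at least once
    df = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        length_norm = 1 - b + b * len(doc) / avgdl  # document-length normalization
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF (the Lucene variant), always positive
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation: extra occurrences add less and less
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "high performance laptop with fast cpu".split(),
    "budget laptop".split(),
    "ergonomic office chair".split(),
]
scores = bm25_scores("high performance laptop".split(), docs)
print([round(s, 3) for s in scores])
```

The first document matches all three query terms and scores highest. The second matches only "laptop", which is also the least informative term because it appears in two of the three documents.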
Implementing in PostgreSQL
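PostgreSQL provides robust full-text search through tsvector and tsquery, and ranks matches with the ts_rank function. Note that ts_rank is not true BM25. It scores documents by how often the query terms appear (with optional length normalization) and does not use corpus-wide inverse document frequency. Genuine BM25 scoring inside Postgres requires a third-party extension. For many workloads, however, ts_rank is a serviceable approximation.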
```sql
-- Create a GIN index on the same expression the query uses, so the planner can use it
CREATE INDEX idx_fts_content ON products USING GIN (to_tsvector('english', product_description));
-- Querying with ts_rank (TF-based; an approximation of BM25-style ranking)
SELECT id, name,
       ts_rank(to_tsvector('english', product_description),
               plainto_tsquery('english', 'high performance laptop')) AS rank
FROM products
WHERE to_tsvector('english', product_description) @@ plainto_tsquery('english', 'high performance laptop')
ORDER BY rank DESC;
```
Implementing in Elasticsearch
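Elasticsearch scores match queries with BM25 by default and supports approximate kNN search over dense_vector fields, so both halves of a hybrid query can run in one request. In recent 8.x releases, the rrf retriever fuses the two result lists with Reciprocal Rank Fusion (explained in the next section). The hybrid query clause that appears in some tutorials belongs to OpenSearch, not Elasticsearch.
```
GET /products/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        { "standard": { "query": { "match": { "description": "high performance laptop" } } } },
        { "knn": { "field": "vector_field", "query_vector": [0.1, 0.2, ...], "k": 10, "num_candidates": 100 } }
      ],
      "rank_constant": 60
    }
  }
}
```
On clusters that predate retrievers, you can place query and knn side by side at the top level of the request. Elasticsearch then adds the two scores, each weighted by its clause's boost, instead of fusing ranks.
Combining Results: Reciprocal Rank Fusion (RRF) Demystified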
Once you have two separate lists of results—one from BM25 and one from your vector database—you face the "fusion" problem. How do you combine a list ranked by keyword relevance with a list ranked by cosine similarity?
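A widely used answer is Reciprocal Rank Fusion (RRF). RRF combines multiple result sets using only each document's position in each list, so it never has to normalize the underlying scores. Those scores typically sit on very different scales: BM25 scores have no fixed upper bound, while cosine similarity falls between -1 and 1.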
For each document $d$, the RRF score is calculated as:
$$\text{RRF}(d) = \sum_{r \in R} \frac{1}{k + rank(d, r)}$$
Where:
- $R$ is the set of result lists.
- $rank(d, r)$ is the rank of document $d$ in list $r$.
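- $k$ is a smoothing constant (60 is the common default, taken from the original RRF paper). It dampens the gap between neighboring ranks (1/61 versus 1/62 rather than 1/1 versus 1/2), so a document that one list happens to rank first cannot dominate the fused result on its own. Ranks start at 1, and a document missing from a list contributes nothing for that list.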
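If you fuse results in the application layer rather than in the database, the formula translates almost line for line into Python. This is a minimal sketch, assuming each retriever has already returned a best-first list of document IDs. The function name rrf_fuse is hypothetical, and its optional weights parameter is an illustrative weighted variant of RRF, not part of the formula above.

```python
from collections import defaultdict


def rrf_fuse(result_lists, k=60, weights=None):
    """Fuse best-first lists of document IDs with Reciprocal Rank Fusion.

    weights is an optional per-list multiplier (a weighted-RRF variant);
    with the default of 1.0 per list this is plain RRF.
    """
    if weights is None:
        weights = [1.0] * len(result_lists)
    scores = defaultdict(float)
    for weight, ranking in zip(weights, result_lists):
        # Ranks are 1-based, matching rank(d, r) in the formula
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by document ID for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_ranking = ["sku-b", "sku-a", "doc-7"]
vector_ranking = ["sku-a", "doc-9", "sku-b"]
for doc_id, score in rrf_fuse([bm25_ranking, vector_ranking]):
    print(f"{doc_id}: {score:.5f}")
```

Here sku-a (ranked #2 and #1) narrowly edges out sku-b (ranked #1 and #3). Both finish well ahead of doc-9 and doc-7, which only one retriever found. Passing weights=[2.0, 1.0] would tilt the fusion toward the BM25 list.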
Why RRF Works
RRF is effective because it rewards documents that appear near the top of both result lists. If a document is ranked #1 by BM25 and #2 by vector search, it will rise to the top of the combined list. Because only ranks matter, an unusually large score in one list cannot distort the fusion.
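A quick calculation with $k = 60$ shows the effect. Suppose document $d_1$ is ranked #1 by BM25 and #2 by vector search, while $d_2$ is ranked #1 by vector search but is absent from the BM25 list:

$$\text{RRF}(d_1) = \frac{1}{60 + 1} + \frac{1}{60 + 2} \approx 0.01639 + 0.01613 = 0.03252$$

$$\text{RRF}(d_2) = \frac{1}{60 + 1} \approx 0.01639$$

Although $d_2$ tops one list, $d_1$ scores roughly twice as high because both methods agree on it.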
Writing SQL Queries that Merge BM25 and pgvector Scores
If you are using PostgreSQL with the pgvector extension, you can implement RRF directly in your application layer or via a stored procedure. Here is a simplified approach to merging these scores:
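```sql
-- Assumes products.fts_col is a tsvector column and products.embedding is a pgvector column
WITH bm25_results AS (
    -- Keyword side: top 50 text matches, ranked by ts_rank
    SELECT id, ts_rank(fts_col, query) AS score,
           ROW_NUMBER() OVER (ORDER BY ts_rank(fts_col, query) DESC) AS rank
    FROM products, plainto_tsquery('search term') AS query
    WHERE fts_col @@ query
    ORDER BY rank
    LIMIT 50
),
vector_results AS (
    -- Semantic side: top 50 nearest neighbors; <=> is pgvector's cosine distance
    SELECT id, 1 - (embedding <=> '[0.1, 0.2, ...]') AS score,
           ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1, 0.2, ...]') AS rank
    FROM products
    ORDER BY embedding <=> '[0.1, 0.2, ...]'
    LIMIT 50
)
SELECT
    COALESCE(b.id, v.id) AS product_id,
    -- RRF with k = 60; a missing rank yields NULL, which COALESCE turns into 0
    COALESCE(1.0 / (60 + b.rank), 0) + COALESCE(1.0 / (60 + v.rank), 0) AS rrf_score
FROM bm25_results b
FULL OUTER JOIN vector_results v ON b.id = v.id
ORDER BY rrf_score DESC
LIMIT 20;
```
This query combines both worlds in a single statement. Each CTE keeps only its top 50 candidates, so a product can appear in one list, the other, or both. The FULL OUTER JOIN ensures that a product found by only one method is still considered, while a product found by both receives two RRF contributions and therefore roughly double the score. Without the LIMIT in each CTE, every product would land in the vector list and the join could no longer distinguish the two cases.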
Testing Performance: Retrieval Accuracy Improvements
To validate your hybrid BM25 and vector implementation, you must move beyond anecdotal testing. We recommend a three-pronged evaluation strategy:
- Precision@K: Measure how many of the top K results are relevant.
- Mean Reciprocal Rank (MRR): Evaluate how high the first relevant result appears in the list.
- A/B Testing with Click-Through Rate (CTR): Deploy the hybrid search to a subset of users and monitor if they click on results more frequently compared to the legacy keyword-only search.
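Precision@K and MRR are straightforward to compute offline once you have a labeled query set. This is a minimal sketch, assuming that for each test query you have the ranked IDs your system returned and a hand-labeled set of relevant IDs. The function names are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k returned IDs that are relevant (divides by k)."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


def mean_reciprocal_rank(runs):
    """runs: one (ranked_ids, relevant_ids) pair per test query."""
    total = 0.0
    for ranked_ids, relevant_ids in runs:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                total += 1.0 / rank  # reciprocal rank of the first relevant hit
                break
        # A query with no relevant result in the list contributes 0
    return total / len(runs)


runs = [
    (["a", "b", "c"], {"b"}),       # first relevant result at rank 2
    (["x", "y", "z"], {"x", "z"}),  # first relevant result at rank 1
    (["p", "q", "r"], {"s"}),       # no relevant result returned
]
print(mean_reciprocal_rank(runs))  # 0.5
print(precision_at_k(["x", "y", "z"], {"x", "z"}, k=3))
```

Running the same labeled queries through the keyword-only, vector-only, and hybrid pipelines and comparing these numbers gives a like-for-like view before any A/B test.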
Expected Performance Gains
| Metric | Keyword Only | Vector Only | Hybrid (RRF) |
| :--- | :--- | :--- | :--- |
| Exact Match Accuracy | 95% | 40% | 98% |
| Conceptual Query Accuracy | 30% | 90% | 88% |
| Overall User Satisfaction | Low | Medium | High |
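As the table suggests, the hybrid approach aims for the best of both worlds. Compared with vector-only search, it may give up a small amount of conceptual accuracy because BM25 pulls some keyword-only matches upward, but the gain in exact-match reliability makes the system significantly more trustworthy for end users. These are expected figures rather than guarantees, so measure the actual gains on your own queries with the metrics above.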
Conclusion
The transition from simple keyword search to a hybrid BM25 and vector system is a critical milestone for any high-growth engineering team. By acknowledging the limitations of pure semantic search and embracing the mathematical elegance of Reciprocal Rank Fusion, you can build search infrastructure that is both intelligent and precise.
Whether you are building an embeddings-based semantic search app for internal documentation or a customer-facing e-commerce engine, the principles of hybrid retrieval remain the same: use BM25 for the "what" and vector embeddings for the "why." As you refine your implementation, remember that search is an iterative process. Keep tracking Precision@K, MRR, and click-through rate, and tune the RRF constant $k$, the number of candidates each list contributes, and (if you favor one list over the other) the relative weight of each list to match the unique language and intent of your user base.
