BridgeIndex
BridgeIndex(
n_anchors=1000,
n_query_anchors=10,
expansion_k=200,
global_num_bits=12,
facet_num_bits=10,
dense_percentile=75,
min_bucket_size=20,
include_sparse_points=0,
seed=42,
_embeddings=None,
_anchor_indices=None,
_anchor_embeddings=None,
_neighborhoods=None,
_super_connectors=None,
_bridge_indices=None,
_fitted=False,
)
Two-tier RAG index using bridge-based anchors for efficient retrieval.
Instead of indexing all embeddings, BridgeIndex selects a small set of “anchor” points that provide good coverage of the embedding space. Queries first find nearby anchors, then expand to their neighborhoods.
The key idea is to use bridge points, which connect multiple LSH buckets, as anchors instead of cluster centroids. Bridges tend to occur at semantic boundaries, so a small set of them provides better coverage of the embedding space.
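The find-anchors-then-expand flow described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the BridgeIndex implementation: anchors are sampled at random as a stand-in for bridge selection, and the helper names `build_two_tier` and `two_tier_query` are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def build_two_tier(embeddings, n_anchors, expansion_k, seed=42):
    """Pick anchors and precompute each anchor's neighborhood.

    Assumption: anchors are sampled at random here. BridgeIndex selects
    bridge points instead, which this sketch does not reproduce.
    """
    rng = np.random.default_rng(seed)
    emb = l2_normalize(embeddings)
    anchor_idx = rng.choice(len(emb), size=n_anchors, replace=False)
    sims = emb[anchor_idx] @ emb.T                      # (n_anchors, n)
    # Indices of the expansion_k most similar points for every anchor.
    neighborhoods = np.argsort(-sims, axis=1)[:, :expansion_k]
    return emb, anchor_idx, neighborhoods

def two_tier_query(query, emb, anchor_idx, neighborhoods, k=10, n_query_anchors=10):
    """Find nearby anchors, pool their neighborhoods, then rerank the pool."""
    q = l2_normalize(query)
    anchor_sims = emb[anchor_idx] @ q
    nearest = np.argsort(-anchor_sims)[:n_query_anchors]
    # Union of the selected neighborhoods is the candidate set.
    candidates = np.unique(neighborhoods[nearest].ravel())
    scores = emb[candidates] @ q
    order = np.argsort(-scores)[:k]
    return candidates[order], scores[order]
```

Only the candidate pool is scored against the query, so the cost per query depends on `n_query_anchors * expansion_k` rather than on the full collection size.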
Attributes
| Name | Type | Description |
|---|---|---|
| n_anchors | int | Number of anchor points to use |
| n_query_anchors | int | Number of anchors to retrieve per query |
| expansion_k | int | Size of neighborhood to expand from each anchor |
| global_num_bits | int | LSH bits for global bucketing (default 12) |
| seed | int | Random seed for reproducibility |
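To make the "LSH bits" attributes concrete, here is a minimal sketch of how a bit count maps to buckets. It assumes random-hyperplane (sign) hashing, which the source does not confirm, and the function name `lsh_bucket_ids` is hypothetical. With 12 bits there are at most 2**12 = 4096 distinct buckets.

```python
import numpy as np

def lsh_bucket_ids(embeddings, num_bits=12, seed=42):
    """Hash each row to an integer bucket id using random hyperplanes.

    Assumption: sign-of-projection hashing; the actual scheme used by
    BridgeIndex may differ.
    """
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(embeddings.shape[1], num_bits))
    bits = (embeddings @ planes) > 0                    # (n, num_bits) booleans
    weights = 2 ** np.arange(num_bits, dtype=np.int64)  # 1, 2, 4, ...
    return bits.astype(np.int64) @ weights              # ids in [0, 2**num_bits)
```

Because only the sign of each projection matters, the bucket id depends on a vector's direction and not on its length.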
Example
    index = BridgeIndex(n_anchors=1000)
    index.fit(embeddings)

    # Single query
    candidates, scores = index.query(query_vec, k=10)

    # Batch queries
    results = index.query_batch(query_vecs, k=10)
Methods
| Name | Description |
|---|---|
| evaluate_recall | Evaluate recall against brute-force search. |
| fit | Build the bridge index from embeddings. |
| get_anchors | Return the anchor point indices. |
| get_super_connectors | Return the super connector analysis. |
| query | Retrieve top-k candidates for a query embedding. |
| query_batch | Batch query for multiple embeddings. |
| summary | Return a summary of the index. |
evaluate_recall
BridgeIndex.evaluate_recall(n_queries=100, k=10, seed=42)
Evaluate recall against brute-force search.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| n_queries | int | Number of random queries to test | 100 |
| k | int | Number of results per query | 10 |
| seed | int | Random seed for query selection | 42 |
Returns
| Name | Type | Description |
|---|---|---|
| | Dict[str, float] | Dict with ‘recall’, ‘avg_candidates’, ‘speedup’ metrics |
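For reference, recall against brute-force search is commonly computed as the fraction of the exact top-k results that the approximate search also returns. The sketch below shows that calculation for one query. The helper names are hypothetical, and the source does not define how ‘avg_candidates’ or ‘speedup’ are measured, so they are not reproduced here.

```python
import numpy as np

def brute_force_top_k(query, embeddings, k=10):
    """Exact top-k by cosine similarity; inputs are assumed L2-normalized."""
    scores = embeddings @ query
    return np.argsort(-scores)[:k]

def recall_at_k(approx_indices, exact_indices):
    """Fraction of the exact top-k that the approximate result recovered."""
    exact = {int(i) for i in exact_indices}
    hits = sum(int(i) in exact for i in approx_indices)
    return hits / len(exact)
```

Averaging `recall_at_k` over a sample of queries gives a single recall figure for the index.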
fit
BridgeIndex.fit(embeddings, verbose=True)
Build the bridge index from embeddings.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| embeddings | np.ndarray | (n, d) array of embedding vectors (will be L2-normalized) | required |
| verbose | bool | Print progress information | True |
Returns
| Name | Type | Description |
|---|---|---|
| | BridgeIndex | self (for chaining) |
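The embeddings parameter is documented as being L2-normalized. A minimal sketch of that step is shown below; the function name and the `eps` guard against zero vectors are assumptions, not part of the documented API.

```python
import numpy as np

def l2_normalize_rows(embeddings, eps=1e-12):
    """Scale each row of an (n, d) array to unit L2 norm.

    Rows with (near-)zero norm are divided by eps instead of zero,
    so an all-zero row stays all-zero rather than becoming NaN.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)
```

After normalization, the dot product of two rows equals their cosine similarity, which is the score that query reports.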
get_anchors
BridgeIndex.get_anchors()
Return the anchor point indices.
get_super_connectors
BridgeIndex.get_super_connectors()
Return the super connector analysis.
query
BridgeIndex.query(query, k=10, n_query_anchors=None, return_scores=True)
Retrieve top-k candidates for a query embedding.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| query | np.ndarray | (d,) query embedding vector | required |
| k | int | Number of results to return | 10 |
| n_query_anchors | Optional[int] | Override default number of anchors to probe | None |
| return_scores | bool | Whether to return similarity scores | True |
Returns
| Name | Type | Description |
|---|---|---|
| indices | np.ndarray | Indices of the top-k candidates; first element of the returned (indices, scores) tuple |
| scores | Optional[np.ndarray] | Cosine similarities of those candidates, or None if return_scores=False |
query_batch
BridgeIndex.query_batch(queries, k=10, n_query_anchors=None)
Batch query for multiple embeddings.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| queries | np.ndarray | (n_queries, d) array of query embeddings | required |
| k | int | Number of results per query | 10 |
| n_query_anchors | Optional[int] | Override default number of anchors to probe | None |
Returns
| Name | Type | Description |
|---|---|---|
| | List[Tuple[np.ndarray, np.ndarray]] | List of (indices, scores) tuples, one per query |
summary
BridgeIndex.summary()
Return a summary of the index.