BridgeIndex

BridgeIndex(
    n_anchors=1000,
    n_query_anchors=10,
    expansion_k=200,
    global_num_bits=12,
    facet_num_bits=10,
    dense_percentile=75,
    min_bucket_size=20,
    include_sparse_points=0,
    seed=42,
    _embeddings=None,
    _anchor_indices=None,
    _anchor_embeddings=None,
    _neighborhoods=None,
    _super_connectors=None,
    _bridge_indices=None,
    _fitted=False,
)

Two-tier RAG index using bridge-based anchors for efficient retrieval.

Instead of indexing all embeddings, BridgeIndex selects a small set of “anchor” points that provide good coverage of the embedding space. Queries first find nearby anchors, then expand to their neighborhoods.

The key idea is to use bridge points (points that connect multiple LSH buckets) as anchors instead of cluster centroids. Bridges naturally occur at semantic boundaries and are intended to provide better coverage of the embedding space than centroids do.
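
The two-tier flow described above (find nearby anchors, then expand to their neighborhoods) can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the BridgeIndex implementation: the helper names `build_two_tier` and `two_tier_query` are hypothetical, anchors are picked uniformly at random as a stand-in for bridge-based selection, and neighborhoods are computed by brute force.

```python
import numpy as np

def build_two_tier(embeddings, n_anchors, expansion_k, seed=42):
    """Hypothetical helper: pick anchors and precompute their neighborhoods."""
    rng = np.random.default_rng(seed)
    emb = np.asarray(embeddings, dtype=np.float64)
    # Unit-length rows, so a dot product equals cosine similarity.
    emb = emb / np.clip(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12, None)
    # ASSUMPTION: random anchors stand in for bridge-based anchor selection.
    anchor_idx = rng.choice(len(emb), size=n_anchors, replace=False)
    # Each anchor's neighborhood is its expansion_k most similar points.
    sims = emb[anchor_idx] @ emb.T
    neighborhoods = np.argsort(-sims, axis=1)[:, :expansion_k]
    return emb, anchor_idx, neighborhoods

def two_tier_query(query, emb, anchor_idx, neighborhoods, k, n_query_anchors):
    """Hypothetical helper: probe anchors, expand, then rerank candidates."""
    q = np.asarray(query, dtype=np.float64)
    q = q / max(np.linalg.norm(q), 1e-12)
    # Tier 1: find the anchors closest to the query.
    nearest = np.argsort(-(emb[anchor_idx] @ q))[:n_query_anchors]
    # Tier 2: merge those anchors' neighborhoods and score only that subset.
    candidates = np.unique(neighborhoods[nearest].ravel())
    scores = emb[candidates] @ q
    order = np.argsort(-scores)[:k]
    return candidates[order], scores[order]

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 16))
emb, anchor_idx, neighborhoods = build_two_tier(data, n_anchors=20, expansion_k=50)
idx, scores = two_tier_query(data[3], emb, anchor_idx, neighborhoods,
                             k=5, n_query_anchors=4)
```

Only the merged neighborhoods are scored, not all 500 points. That is the trade the two-tier design makes: less scoring work per query, at the cost of possibly missing true neighbors that fall outside the probed neighborhoods.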

Attributes

| Name | Type | Description |
|---|---|---|
| n_anchors | int | Number of anchor points to use |
| n_query_anchors | int | Number of anchors to retrieve per query |
| expansion_k | int | Size of neighborhood to expand from each anchor |
| global_num_bits | int | LSH bits for global bucketing (default 12) |
| seed | int | Random seed for reproducibility |
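
To make "LSH bits for global bucketing" concrete, here is a minimal sketch of one common LSH scheme, sign random projection, where each bit records which side of a random hyperplane a vector falls on. The source does not say which LSH family BridgeIndex uses, so this scheme and the helper name `lsh_bucket_codes` are assumptions for illustration only.

```python
import numpy as np

def lsh_bucket_codes(embeddings, num_bits, seed=42):
    """Hypothetical helper: map each row to an integer bucket code."""
    rng = np.random.default_rng(seed)
    # One random hyperplane (normal vector) per bit.
    planes = rng.normal(size=(embeddings.shape[1], num_bits))
    # Bit j is 1 when the point lies on the positive side of hyperplane j.
    bits = (embeddings @ planes) > 0
    # Pack the bits into a single integer in [0, 2**num_bits).
    weights = 1 << np.arange(num_bits)
    return bits.astype(np.int64) @ weights

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 32))
codes = lsh_bucket_codes(points, num_bits=12)
```

With `num_bits=12` there are at most 2**12 = 4096 distinct bucket codes, so more bits give finer buckets with fewer points each.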

Example

index = BridgeIndex(n_anchors=1000)
index.fit(embeddings)

# Single query
candidates, scores = index.query(query_vec, k=10)

# Batch queries
results = index.query_batch(query_vecs, k=10)

Methods

| Name | Description |
|---|---|
| evaluate_recall | Evaluate recall against brute-force search. |
| fit | Build the bridge index from embeddings. |
| get_anchors | Return the anchor point indices. |
| get_super_connectors | Return the super connector analysis. |
| query | Retrieve top-k candidates for a query embedding. |
| query_batch | Batch query for multiple embeddings. |
| summary | Return a summary of the index. |

evaluate_recall

BridgeIndex.evaluate_recall(n_queries=100, k=10, seed=42)

Evaluate recall against brute-force search.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| n_queries | int | Number of random queries to test | 100 |
| k | int | Number of results per query | 10 |
| seed | int | Random seed for query selection | 42 |

Returns

| Name | Type | Description |
|---|---|---|
| | Dict[str, float] | Dict with ‘recall’, ‘avg_candidates’, and ‘speedup’ metrics |
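
Recall against brute-force search is commonly computed as the fraction of the exact top-k results that the approximate search also returned. The sketch below shows that calculation. The helper names are hypothetical, and whether `evaluate_recall` uses exactly this definition is an assumption.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that appear in the approximate result."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact.intersection(approx_ids)) / len(exact)

def mean_recall(approx_lists, exact_lists):
    """Average recall over several queries (lists must be the same length)."""
    pairs = list(zip(approx_lists, exact_lists))
    return sum(recall_at_k(a, e) for a, e in pairs) / len(pairs)

# Query 1 recovers both exact neighbors; query 2 recovers one of two.
example = mean_recall([[1, 2], [5, 6]], [[1, 2], [6, 7]])
```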

fit

BridgeIndex.fit(embeddings, verbose=True)

Build the bridge index from embeddings.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| embeddings | np.ndarray | (n, d) array of embedding vectors (will be L2-normalized) | required |
| verbose | bool | Print progress information | True |

Returns

| Name | Type | Description |
|---|---|---|
| | BridgeIndex | self (for chaining) |
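
The L2 normalization applied to `embeddings` matters because a dot product between unit-length vectors equals their cosine similarity. A small worked example follows. The helper `l2_normalize_rows` is hypothetical and only illustrates the operation, not how `fit` implements it.

```python
import numpy as np

def l2_normalize_rows(x, eps=1e-12):
    """Scale every row of a 2-D array to unit Euclidean length."""
    x = np.asarray(x, dtype=np.float64)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # eps guards against division by zero for all-zero rows.
    return x / np.maximum(norms, eps)

vectors = np.array([[3.0, 4.0], [1.0, 0.0]])
unit = l2_normalize_rows(vectors)   # first row becomes [0.6, 0.8]
cosine = float(unit[0] @ unit[1])   # dot product of unit vectors
```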

get_anchors

BridgeIndex.get_anchors()

Return the anchor point indices.

get_super_connectors

BridgeIndex.get_super_connectors()

Return the super connector analysis.

query

BridgeIndex.query(query, k=10, n_query_anchors=None, return_scores=True)

Retrieve top-k candidates for a query embedding.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| query | np.ndarray | (d,) query embedding vector | required |
| k | int | Number of results to return | 10 |
| n_query_anchors | Optional[int] | Override default number of anchors to probe | None |
| return_scores | bool | Whether to return similarity scores | True |

Returns

| Name | Type | Description |
|---|---|---|
| | np.ndarray | First element of the returned (indices, scores) tuple: the top-k candidate indices |
| | Optional[np.ndarray] | Second element: the cosine similarities of those candidates (or None if return_scores=False) |
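
Because the second element of the result can be None, downstream code should handle both cases. The sketch below pairs returned indices with their source texts. The `attach_documents` helper, the `documents` list, and the sample arrays are hypothetical stand-ins for real query output.

```python
import numpy as np

def attach_documents(indices, scores, documents):
    """Pair each retrieved index with its text and, if available, its score."""
    if scores is None:
        # return_scores=False case: no similarity values to attach.
        return [(int(i), documents[int(i)], None) for i in indices]
    return [(int(i), documents[int(i)], float(s))
            for i, s in zip(indices, scores)]

documents = ["alpha", "beta", "gamma", "delta"]
indices = np.array([2, 0])          # stand-in for returned indices
scores = np.array([0.91, 0.47])     # stand-in for returned similarities

hits = attach_documents(indices, scores, documents)
hits_no_scores = attach_documents(indices, None, documents)
```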

query_batch

BridgeIndex.query_batch(queries, k=10, n_query_anchors=None)

Batch query for multiple embeddings.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| queries | np.ndarray | (n_queries, d) array of query embeddings | required |
| k | int | Number of results per query | 10 |
| n_query_anchors | Optional[int] | Override default number of anchors to probe | None |

Returns

| Name | Type | Description |
|---|---|---|
| | List[Tuple[np.ndarray, np.ndarray]] | List of (indices, scores) tuples, one per query |
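
A list of (indices, scores) tuples is often easier to log or store once flattened into one row per hit. The following sketch does that. The helper `flatten_batch_results` is hypothetical, and the sample `results` list only mimics the documented return shape.

```python
import numpy as np

def flatten_batch_results(results):
    """Turn per-query (indices, scores) tuples into (query_id, rank, index, score) rows."""
    rows = []
    for query_id, (indices, scores) in enumerate(results):
        for rank, (i, s) in enumerate(zip(indices, scores)):
            rows.append((query_id, rank, int(i), float(s)))
    return rows

# Stand-in for batch output: two queries with two hits and one hit.
results = [
    (np.array([4, 1]), np.array([0.9, 0.8])),
    (np.array([7]), np.array([0.5])),
]
rows = flatten_batch_results(results)
```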

summary

BridgeIndex.summary()

Return a summary of the index.