BridgeIndex

BridgeIndex(
    n_anchors=1000,
    n_query_anchors=10,
    expansion_k=200,
    global_num_bits=12,
    facet_num_bits=10,
    dense_percentile=75,
    min_bucket_size=20,
    include_sparse_points=0,
    seed=42,
    _embeddings=None,
    _anchor_indices=None,
    _anchor_embeddings=None,
    _neighborhoods=None,
    _super_connectors=None,
    _bridge_indices=None,
    _fitted=False,
)

Two-tier RAG index using bridge-based anchors for efficient retrieval.

Instead of indexing all embeddings, BridgeIndex selects a small set of “anchor” points that provide good coverage of the embedding space. A query is first compared against the anchors. The neighborhoods of the nearest anchors are then pooled, and the pooled candidates are ranked by cosine similarity.
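A minimal NumPy sketch of this two-tier lookup follows. It assumes each anchor's neighborhood is its `expansion_k` nearest points by cosine similarity, and it picks anchors at random for brevity, whereas BridgeIndex uses bridge points. The helpers `normalize`, `build_neighborhoods` and `two_tier_query` are hypothetical and not part of the BridgeIndex API.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def build_neighborhoods(emb: np.ndarray, anchor_idx: np.ndarray, expansion_k: int) -> np.ndarray:
    """For each anchor, store the indices of its expansion_k most similar points."""
    sims = emb[anchor_idx] @ emb.T                      # (n_anchors, n)
    return np.argsort(-sims, axis=1)[:, :expansion_k]   # (n_anchors, expansion_k)

def two_tier_query(query, emb, anchor_idx, neighborhoods, n_query_anchors, k):
    q = normalize(query)
    # Tier 1: compare the query against the anchors only.
    anchor_sims = emb[anchor_idx] @ q
    probed = np.argsort(-anchor_sims)[:n_query_anchors]
    # Tier 2: pool the probed anchors' neighborhoods and rank that pool exactly.
    candidates = np.unique(neighborhoods[probed].ravel())
    scores = emb[candidates] @ q
    order = np.argsort(-scores)[:k]
    return candidates[order], scores[order]

rng = np.random.default_rng(0)
emb = normalize(rng.normal(size=(2000, 32)))
anchor_idx = rng.choice(len(emb), size=50, replace=False)  # random anchors, not bridges
neighborhoods = build_neighborhoods(emb, anchor_idx, expansion_k=100)
idx, scores = two_tier_query(emb[0], emb, anchor_idx, neighborhoods, n_query_anchors=5, k=10)
```

Only the pooled candidates are scored exactly, so per-query work depends on the number of anchors probed and the neighborhood size rather than on the total number of embeddings.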

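The key innovation is using bridge points (points that connect multiple LSH buckets) as anchors rather than cluster centroids. Bridges naturally occur at semantic boundaries between regions of the embedding space, so they cover those boundary regions better than centroids, which sit at cluster centers.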
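This reference does not spell out the exact bridge criterion, so the sketch below is only an illustration under an assumed definition. Points are bucketed with random-hyperplane LSH (`num_bits` playing the role of `global_num_bits`), and a point's "bridge-ness" is taken to be the number of distinct buckets among its nearest neighbors. The helpers `lsh_codes` and `bucket_spread` are hypothetical, and parameters such as `facet_num_bits`, `dense_percentile` and `min_bucket_size` are not modeled.

```python
import numpy as np

def lsh_codes(emb: np.ndarray, num_bits: int, seed: int = 42) -> np.ndarray:
    """Random-hyperplane LSH: one sign bit per hyperplane, packed into an integer bucket id."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(emb.shape[1], num_bits))
    bits = (emb @ planes > 0).astype(np.int64)           # (n, num_bits) of 0/1
    return bits @ (1 << np.arange(num_bits))             # (n,) ids in [0, 2**num_bits)

def bucket_spread(emb: np.ndarray, codes: np.ndarray, n_neighbors: int = 10) -> np.ndarray:
    """Count the distinct buckets among each point's n_neighbors nearest neighbors."""
    sims = emb @ emb.T
    nn = np.argsort(-sims, axis=1)[:, 1:n_neighbors + 1]  # drop column 0, normally the point itself
    return np.array([len(np.unique(codes[row])) for row in nn])

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 16))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
codes = lsh_codes(emb, num_bits=6)
spread = bucket_spread(emb, codes)
anchors = np.argsort(-spread)[:20]  # points whose neighborhoods touch the most buckets
```

Points with a high spread sit where several buckets meet, which is the intuition behind preferring bridges over centroids as anchors.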

Attributes

Name	Type	Description
n_anchors	int	Number of anchor points to use
n_query_anchors	int	Number of anchors to retrieve per query
expansion_k	int	Size of neighborhood to expand from each anchor
global_num_bits	int	LSH bits for global bucketing (default 12)
seed	int	Random seed for reproducibility
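The defaults give a rough sense of per-query work. The sketch below assumes that a query is compared against every anchor, and that the candidate pool is the union of `n_query_anchors` neighborhoods of `expansion_k` points each, so `n_query_anchors * expansion_k` is an upper bound (overlapping neighborhoods shrink it). `per_query_budget` is a hypothetical helper, not a BridgeIndex method.

```python
def per_query_budget(n_anchors: int = 1000, n_query_anchors: int = 10,
                     expansion_k: int = 200, global_num_bits: int = 12) -> dict:
    """Back-of-envelope per-query work under the assumptions stated above."""
    return {
        "anchor_comparisons": n_anchors,                  # tier 1: query vs. every anchor
        "max_candidates": n_query_anchors * expansion_k,  # tier 2: upper bound; overlaps shrink it
        "possible_global_buckets": 2 ** global_num_bits,  # distinct codes from global_num_bits bits
    }

print(per_query_budget())
# {'anchor_comparisons': 1000, 'max_candidates': 2000, 'possible_global_buckets': 4096}
```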

Example

index = BridgeIndex(n_anchors=1000)
index.fit(embeddings)

# Single query
candidates, scores = index.query(query_vec, k=10)

# Batch queries
results = index.query_batch(query_vecs, k=10)

Methods

Name	Description
evaluate_recall	Evaluate recall against brute-force search.
fit	Build the bridge index from embeddings.
get_anchors	Return the anchor point indices.
get_super_connectors	Return the super connector analysis.
query	Retrieve top-k candidates for a query embedding.
query_batch	Batch query for multiple embeddings.
summary	Return a summary of the index.

evaluate_recall

BridgeIndex.evaluate_recall(n_queries=100, k=10, seed=42)

Evaluate recall against brute-force search.

Parameters

Name	Type	Description	Default
n_queries	int	Number of random queries to test	`100`
k	int	Number of results per query	`10`
seed	int	Random seed for query selection	`42`

Returns

Name	Type	Description
	Dict[str, float]	Dict with ‘recall’, ‘avg_candidates’, ‘speedup’ metrics
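A minimal sketch of a recall@k comparison against brute-force search follows. It assumes recall is the fraction of the exact top-k that the approximate search also returns, averaged over queries, and that queries are sampled from the indexed embeddings themselves. It does not reproduce the 'avg_candidates' or 'speedup' metrics. `brute_force_topk` and `recall_at_k` are hypothetical helpers.

```python
import numpy as np

def brute_force_topk(emb: np.ndarray, q: np.ndarray, k: int) -> np.ndarray:
    """Exact top-k indices by cosine similarity (rows of emb and q assumed L2-normalized)."""
    return np.argsort(-(emb @ q))[:k]

def recall_at_k(search_fn, emb: np.ndarray, n_queries: int = 100, k: int = 10, seed: int = 42) -> float:
    """Mean fraction of the exact top-k neighbors that search_fn(q, k) also returns."""
    rng = np.random.default_rng(seed)
    query_ids = rng.choice(len(emb), size=n_queries, replace=False)  # queries drawn from the data
    per_query = []
    for qi in query_ids:
        exact = set(brute_force_topk(emb, emb[qi], k).tolist())
        approx = set(np.asarray(search_fn(emb[qi], k)).tolist())
        per_query.append(len(exact & approx) / k)
    return float(np.mean(per_query))

rng = np.random.default_rng(1)
emb = rng.normal(size=(300, 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Sanity checks: exact search scores 1.0; returning only half of the exact top-k scores 0.5.
exact_recall = recall_at_k(lambda q, k: brute_force_topk(emb, q, k), emb, n_queries=50)
half_recall = recall_at_k(lambda q, k: brute_force_topk(emb, q, k)[: k // 2], emb, n_queries=50)
```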

fit

BridgeIndex.fit(embeddings, verbose=True)

Build the bridge index from embeddings.

Parameters

Name	Type	Description	Default
embeddings	np.ndarray	(n, d) array of embedding vectors (will be L2-normalized)	required
verbose	bool	Print progress information	`True`

Returns

Name	Type	Description
	BridgeIndex	self (for chaining)
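fit states that the embeddings will be L2-normalized. The sketch below shows what that step amounts to: after row normalization, dot products are cosine similarities. How BridgeIndex handles all-zero rows is not documented here, so the `eps` guard in the hypothetical `l2_normalize` helper is an assumption.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm; eps keeps all-zero rows at zero instead of NaN."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

x = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 0.0]])
unit = l2_normalize(x)
cosine = unit @ unit.T  # after normalization, dot products are cosine similarities
```

Because fit returns self, construction and fitting can be chained: `index = BridgeIndex(n_anchors=1000).fit(embeddings)`.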

get_anchors

BridgeIndex.get_anchors()

Return the anchor point indices.

get_super_connectors

BridgeIndex.get_super_connectors()

Return the super connector analysis.

query

BridgeIndex.query(query, k=10, n_query_anchors=None, return_scores=True)

Retrieve top-k candidates for a query embedding.

Parameters

Name	Type	Description	Default
query	np.ndarray	(d,) query embedding vector	required
k	int	Number of results to return	`10`
n_query_anchors	Optional[int]	Override default number of anchors to probe	`None`
return_scores	bool	Whether to return similarity scores	`True`

Returns

Name	Type	Description
indices	np.ndarray	Top-k candidate indices (first element of the returned (indices, scores) tuple)
scores	Optional[np.ndarray]	Cosine similarities of those candidates (second element), or None if return_scores=False

query_batch

BridgeIndex.query_batch(queries, k=10, n_query_anchors=None)

Batch query for multiple embeddings.

Parameters

Name	Type	Description	Default
queries	np.ndarray	(n_queries, d) array of query embeddings	required
k	int	Number of results per query	`10`
n_query_anchors	Optional[int]	Override default number of anchors to probe	`None`

Returns

Name	Type	Description
	List[Tuple[np.ndarray, np.ndarray]]	List of (indices, scores) tuples, one per query
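When every query returns the same number of results, the list of (indices, scores) tuples can be stacked into two (n_queries, k) matrices. The sketch below assumes equal-length results. `stack_results` is a hypothetical helper, and the input values are made up to stand in for query_batch output.

```python
import numpy as np
from typing import List, Tuple

def stack_results(results: List[Tuple[np.ndarray, np.ndarray]]) -> Tuple[np.ndarray, np.ndarray]:
    """Turn query_batch-style output into (n_queries, k) index and score matrices.

    Assumes every query returned the same number of results; nothing is padded.
    """
    indices = np.stack([idx for idx, _ in results])
    scores = np.stack([sc for _, sc in results])
    return indices, scores

# Made-up stand-in for the output of index.query_batch(query_vecs, k=3).
fake = [(np.array([4, 1, 7]), np.array([0.9, 0.8, 0.5])),
        (np.array([2, 4, 0]), np.array([0.95, 0.7, 0.6]))]
I, S = stack_results(fake)
print(I.shape, S.shape)  # (2, 3) (2, 3)
```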

summary

BridgeIndex.summary()

Return a summary of the index.