LazyIndex

LazyIndex(path)

Memory-mapped DYF index for instant-start query serving.

Opens a .dyf file, memory-maps it, parses the FlatBuffers header, and serves queries by traversing the tree and decompressing Arrow IPC batches on demand. Decompressed batches are kept in an LRU cache.
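
The lazy-loading pattern described above can be sketched with the standard library alone. This is a toy illustration, not the DYF file layout: LazyBlobFile, write_demo_file, the zlib compression, and the offset table are all assumptions made for the example. It shows a memory-mapped file whose compressed blobs are only decompressed when requested, with repeat requests served from an LRU cache.

```python
import mmap
import os
import tempfile
import zlib
from functools import lru_cache


class LazyBlobFile:
    """Toy stand-in for a lazily read index file (hypothetical, not the DYF layout)."""

    def __init__(self, path, offsets, cache_size=2):
        self._fh = open(path, "rb")
        self._mm = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)
        self._offsets = offsets          # list of (start, length), one per batch
        self.decompress_calls = 0        # counts real decompressions, for illustration
        # Wrap the loader in a per-instance LRU cache.
        self.get_batch = lru_cache(maxsize=cache_size)(self._load)

    def _load(self, batch_index):
        start, length = self._offsets[batch_index]
        self.decompress_calls += 1
        # Slicing the mmap copies only the bytes of this one blob.
        return zlib.decompress(self._mm[start:start + length])

    def close(self):
        self._mm.close()
        self._fh.close()


def write_demo_file(payloads):
    """Write compressed payloads back to back and return (path, offsets)."""
    offsets, pos = [], 0
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as fh:
        for payload in payloads:
            blob = zlib.compress(payload)
            fh.write(blob)
            offsets.append((pos, len(blob)))
            pos += len(blob)
    return path, offsets


path, offsets = write_demo_file([b"leaf-0" * 100, b"leaf-1" * 100, b"leaf-2" * 100])
lazy = LazyBlobFile(path, offsets)
first = lazy.get_batch(1)
again = lazy.get_batch(1)    # served from the cache, no second decompression
calls = lazy.decompress_calls
lazy.close()
os.remove(path)
```

Opening the file costs almost nothing because no blob is touched until get_batch is called, which is the property the "instant-start" description refers to.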

Usage

idx = LazyIndex("index.dyf")
results = idx.search(query_vector, k=10, nprobe=3)
print(idx.tree_summary)

Attributes

Name Description
format_version DYF format version: 1 (header FB) or 2 (footer FB, append-friendly).
has_stored_fields True if this index has stored fields.
is_pq True if this index uses Product Quantization.
quantization Quantization string read from BuildParams.
stored_field_names List of stored field names (empty if no stored fields).
tree_summary Tree statistics, computed without touching Arrow data.

Methods

Name Description
close Release mmap and file handle.
detect_enrichment_level Detect enrichment level based on stored fields and metadata.
extract_all_fields Bulk-read all leaf batches, concatenate, sort by item_index.
get_item_vector Extract a single item’s embedding vector from the index.
get_leaf Decompress and return a leaf’s Arrow RecordBatch.
get_split_eigenvalues Return PCA eigenvalues for each internal node.
get_split_hyperplanes Return the PCA hyperplane(s) for each internal node.
get_stored_fields Look up stored fields for given item indices without re-searching.
get_tree_structure Export tree hierarchy for visualization (FlatBuffers only, no Arrow).
search Search the index for nearest neighbors.
search_ivf IVF-style search: rank leaves by centroid similarity, then score.
to_faiss Export dyf index as a FAISS IVF index.

close

LazyIndex.close()

Release mmap and file handle.

detect_enrichment_level

LazyIndex.detect_enrichment_level()

Detect enrichment level based on stored fields and metadata.

Returns an integer from 0 to 3:

0: Base (embeddings + tree only)
1: Projected (has umap_x, umap_y, umap_z stored fields)
2: Clustered (has community_id or cluster_* stored fields)
3: Viz-ready (has edge_pairs or tour_narration in metadata)
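
The level table can be read as a small decision function. The sketch below is not the library's implementation: enrichment_level is a hypothetical helper, and it assumes that the highest matching level wins and that level 1 requires all three UMAP fields. Neither point is stated in the description above.

```python
def enrichment_level(stored_field_names, metadata_keys):
    """Return 0-3 following the level table (assumption: highest matching level wins)."""
    fields = set(stored_field_names)
    meta = set(metadata_keys)

    # Level 3: visualization metadata is present.
    if {"edge_pairs", "tour_narration"} & meta:
        return 3
    # Level 2: a community_id field or any cluster_* field is stored.
    if "community_id" in fields or any(name.startswith("cluster_") for name in fields):
        return 2
    # Level 1: all three UMAP coordinates are stored (assumption: all are required).
    if {"umap_x", "umap_y", "umap_z"} <= fields:
        return 1
    # Level 0: embeddings and tree only.
    return 0


base = enrichment_level([], [])
projected = enrichment_level(["umap_x", "umap_y", "umap_z"], [])
clustered = enrichment_level(["umap_x", "umap_y", "umap_z", "cluster_kmeans"], [])
viz_ready = enrichment_level(["community_id"], ["edge_pairs"])
```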

extract_all_fields

LazyIndex.extract_all_fields()

Bulk-read all leaf batches, concatenate, sort by item_index.

Returns

Name Type Description
ExtractedData dict with keys:
'embeddings': (n, d) float32 array, or None if the file was written without embeddings (PQ indexes return approximate reconstructions)
'fields': {field_name: array} for all stored fields
'metadata': dict of metadata key-value pairs

get_item_vector

LazyIndex.get_item_vector(item_index)

Extract a single item’s embedding vector from the index.

Scans leaf batches to find the item (building a cached mapping on first call), then returns the reconstructed float32 vector.

Parameters

Name Type Description Default
item_index The item index (as stored in the item_index column). required

Returns

Name Type Description
(dim,) float32 numpy array of the item’s embedding.

Raises

Name Type Description
KeyError If item_index is not found in the index.
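
The "cached mapping on first call" behavior can be illustrated with plain Python containers. This is a sketch under stated assumptions: ItemLocator is hypothetical, and leaves are modeled as dicts of lists rather than Arrow RecordBatches. The first lookup scans every leaf once to record where each item lives; later lookups reuse that mapping, and an unknown item raises KeyError.

```python
class ItemLocator:
    """Hypothetical stand-in showing a lazily built item_index -> (leaf, row) map."""

    def __init__(self, leaves):
        # leaves: list of dicts like {"item_index": [...], "embedding": [[...], ...]}
        self._leaves = leaves
        self._locations = None   # built on the first lookup
        self.scans = 0           # counts full scans, for illustration

    def _build(self):
        self.scans += 1
        self._locations = {
            item: (leaf_no, row)
            for leaf_no, leaf in enumerate(self._leaves)
            for row, item in enumerate(leaf["item_index"])
        }

    def get_item_vector(self, item_index):
        if self._locations is None:
            self._build()
        leaf_no, row = self._locations[item_index]   # raises KeyError if absent
        return self._leaves[leaf_no]["embedding"][row]


locator = ItemLocator([
    {"item_index": [7, 3], "embedding": [[0.1, 0.2], [0.3, 0.4]]},
    {"item_index": [5], "embedding": [[0.5, 0.6]]},
])
vec_a = locator.get_item_vector(3)
vec_b = locator.get_item_vector(5)   # reuses the mapping built above
```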

get_leaf

LazyIndex.get_leaf(batch_index)

Decompress and return a leaf’s Arrow RecordBatch.

Parameters

Name Type Description Default
batch_index The batch index (from node.BatchIndex()). required

Returns

Name Type Description
pyarrow.RecordBatch with columns: item_index, embedding.

get_split_eigenvalues

LazyIndex.get_split_eigenvalues()

Return PCA eigenvalues for each internal node.

Returns

Name Type Description
dict[int, np.ndarray] {node_id: (num_bits,) float32 array} for internal nodes that have eigenvalues stored. Empty dict for old .dyf files.

get_split_hyperplanes

LazyIndex.get_split_hyperplanes()

Return the PCA hyperplane(s) for each internal node.

Returns

Name Type Description
dict[int, np.ndarray] {node_id: (num_bits, dim) float32 array} for internal nodes only.
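
One common way to use a (num_bits, dim) set of hyperplanes in an LSH tree is to take one sign bit per hyperplane and combine the bits into a child code. Whether this library routes exactly this way (for example, whether it centers the vector or applies a threshold first) is not stated here, so the sketch below is an assumption throughout; route_code is a hypothetical helper and plain lists stand in for NumPy arrays.

```python
def route_code(vector, hyperplanes):
    """Combine one sign bit per hyperplane into an integer code (assumed routing rule)."""
    code = 0
    for plane in hyperplanes:
        projection = sum(p * x for p, x in zip(plane, vector))
        bit = 1 if projection >= 0 else 0
        code = (code << 1) | bit   # the first hyperplane becomes the most significant bit
    return code


# Two hyperplanes in a 2-dimensional space give num_bits = 2, so codes fall in 0..3.
hyperplanes = [[1.0, 0.0], [0.0, 1.0]]
code = route_code([0.5, -2.0], hyperplanes)
```

With num_bits hyperplanes per node, a node can have up to 2**num_bits children under this scheme.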

get_stored_fields

LazyIndex.get_stored_fields(item_indices)

Look up stored fields for given item indices without re-searching.

Scans all leaves to find matching items. Less efficient than getting fields from search results, but useful for post-hoc exploration.

Parameters

Name Type Description Default
item_indices array-like of item indices to look up. required

Returns

Name Type Description
dict mapping field name to list of values (in same order as item_indices). Missing items get None.
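
The return shape (one list per field, aligned with item_indices, None for missing items) can be sketched as follows. This is an illustration only: lookup_stored_fields is hypothetical and leaves are modeled as dicts of lists rather than Arrow RecordBatches.

```python
def lookup_stored_fields(leaves, item_indices, field_names):
    """Scan every leaf and collect field values in the order of item_indices."""
    # Map each requested item to the output positions where it was asked for.
    positions = {}
    for pos, item in enumerate(item_indices):
        positions.setdefault(item, []).append(pos)

    # Pre-fill with None so items that are never found stay None.
    out = {name: [None] * len(item_indices) for name in field_names}
    for leaf in leaves:
        for row, item in enumerate(leaf["item_index"]):
            for pos in positions.get(item, ()):
                for name in field_names:
                    out[name][pos] = leaf[name][row]
    return out


leaves = [
    {"item_index": [10, 11], "title": ["a", "b"]},
    {"item_index": [12], "title": ["c"]},
]
found = lookup_stored_fields(leaves, [12, 99, 10], ["title"])
```

Because every leaf is visited, the cost grows with the size of the index rather than with the number of requested items, which is why reading fields from search results is the cheaper path.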

get_tree_structure

LazyIndex.get_tree_structure()

Export tree hierarchy for visualization (FlatBuffers only, no Arrow).

Returns

Name Type Description
list[TreeNode] list of dicts with keys: node_id, parent_id, depth, num_items, is_leaf, batch_index. Cached after first call.
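
A flat list of node dicts like this is usually regrouped by parent before drawing. The sketch below uses hand-written sample nodes with the documented keys. The parent_id and batch_index values of -1 for the root are assumptions, since the sentinel values are not stated here, and children_by_parent and render_tree are hypothetical helpers.

```python
from collections import defaultdict

# Sample data with the documented keys; the -1 sentinels are assumptions.
nodes = [
    {"node_id": 0, "parent_id": -1, "depth": 0, "num_items": 5, "is_leaf": False, "batch_index": -1},
    {"node_id": 1, "parent_id": 0, "depth": 1, "num_items": 3, "is_leaf": True, "batch_index": 0},
    {"node_id": 2, "parent_id": 0, "depth": 1, "num_items": 2, "is_leaf": True, "batch_index": 1},
]


def children_by_parent(nodes):
    """Group node ids under their parent id."""
    children = defaultdict(list)
    for node in nodes:
        children[node["parent_id"]].append(node["node_id"])
    return dict(children)


def render_tree(nodes):
    """Return one indented text line per node, in depth-first order."""
    by_id = {n["node_id"]: n for n in nodes}
    children = children_by_parent(nodes)
    # A root is any node whose parent is not itself a node in the list.
    roots = [n["node_id"] for n in nodes if n["parent_id"] not in by_id]
    lines, stack = [], list(reversed(roots))
    while stack:
        node = by_id[stack.pop()]
        label = f"{'  ' * node['depth']}node {node['node_id']} ({node['num_items']} items)"
        if node["is_leaf"]:
            label += f" -> batch {node['batch_index']}"
        lines.append(label)
        stack.extend(reversed(children.get(node["node_id"], [])))
    return lines


lines = render_tree(nodes)
```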

search

LazyIndex.search(query, k=10, nprobe=3, return_routing=False)

Search the index for nearest neighbors.

Parameters

Name Type Description Default
query (dim,) query vector. required
k Number of results to return. 10
nprobe Number of leaf probes. Accepts an int (fixed probe count: 1 = single path, >1 = multi-probe), "auto" (adaptive probing with the default AdaptiveProbeConfig), or an AdaptiveProbeConfig (adaptive probing with custom thresholds). 3
return_routing If True, populate result.routing with diagnostics. False

Returns

Name Type Description
SearchResult with indices, scores, and fields. Supports backward-compatible unpacking: indices, scores = idx.search(…)
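
Two-value unpacking of a richer result object is typically done by making the object iterable. How SearchResult implements this is not stated here, so the class below is a hypothetical sketch of one way to get the same behavior, not the library's code.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResultSketch:
    """Hypothetical result object that still unpacks as (indices, scores)."""

    indices: list
    scores: list
    fields: dict = field(default_factory=dict)

    def __iter__(self):
        # Yield exactly two items so `indices, scores = result` keeps working.
        yield self.indices
        yield self.scores


result = SearchResultSketch([4, 9], [0.93, 0.71], {"title": ["a", "b"]})
indices, scores = result          # tuple-style access
titles = result.fields["title"]   # attribute access for the extra data
```

The design keeps older call sites that expect a two-tuple working while newer code reads the stored fields from the same object.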

search_ivf

LazyIndex.search_ivf(query, k=10, nprobe=3)

IVF-style search: rank leaves by centroid similarity, then score.

Instead of LSH tree traversal, computes query similarity to all leaf centroids and probes the top-nprobe leaves. Combines DYF’s lazy loading with FAISS IVF-quality routing.

Parameters

Name Type Description Default
query (dim,) query vector. required
k Number of results to return. 10
nprobe Number of leaves to probe. 3

Returns

Name Type Description
SearchResult with indices, scores, and fields. Supports backward-compatible unpacking: indices, scores = idx.search_ivf(…)
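
The two-stage idea behind IVF-style search (rank leaves by centroid similarity, then score only the items in the top nprobe leaves) can be sketched in pure Python. This is an illustration, not the library's implementation: search_ivf_sketch is hypothetical, leaves are lists of (item_index, vector) pairs, and dot product is assumed as the similarity measure because the metric is not stated here.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_ivf_sketch(query, centroids, leaves, k=10, nprobe=3):
    """Rank leaves by centroid similarity, then score items in the top leaves."""
    # Stage 1: order leaf numbers by similarity between the query and each centroid.
    ranked = sorted(range(len(centroids)),
                    key=lambda i: dot(query, centroids[i]),
                    reverse=True)

    # Stage 2: score every item in the nprobe best leaves only.
    candidates = []
    for leaf_no in ranked[:nprobe]:
        for item_index, vector in leaves[leaf_no]:
            candidates.append((dot(query, vector), item_index))

    # Keep the k highest-scoring items.
    candidates.sort(reverse=True)
    top = candidates[:k]
    return [item for _, item in top], [score for score, _ in top]


centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
leaves = [
    [(0, [0.9, 0.1]), (1, [0.8, 0.3])],
    [(2, [0.1, 0.9])],
    [(3, [-1.0, 0.0])],
]
indices, scores = search_ivf_sketch([1.0, 0.0], centroids, leaves, k=2, nprobe=1)
```

Raising nprobe widens the candidate set at the cost of scoring more items, which is the usual recall versus latency trade-off in IVF search.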

to_faiss

LazyIndex.to_faiss(pq_subquantizers=None, pq_bits=8)

Export dyf index as a FAISS IVF index.

Uses dyf’s leaf centroids as the coarse quantizer and populates FAISS inverted lists from dyf’s leaf embeddings. This gives dyf’s PCA-LSH partitioning with FAISS’s optimized search.

Parameters

Name Type Description Default
pq_subquantizers If set, use IndexIVFPQ with this many subquantizers for compression (e.g., 8 or 16). If None, use IndexIVFFlat (no compression). None
pq_bits Bits per subquantizer for PQ (default: 8). 8

Returns

Name Type Description
faiss.IndexIVFFlat or faiss.IndexIVFPQ, ready to search.