build_unified_ontology

build_unified_ontology(
    embeddings,
    main_similarity_threshold=0.55,
    outlier_similarity_threshold=0.45,
    diversity_gap_threshold=0.02,
    outlier_diversity_gap=0.015,
    k_neighbors=30,
    verbose=False,
)

Build a unified ontology that covers both dense and sparse embedding regions.

Uses a two-tier approach:

1. Build main ontology with high similarity threshold (dense regions)
2. Build outlier ontology with lower threshold (sparse regions)
3. Add bridge edges to connect them
4. Exclude “double outliers” that don’t fit either

This achieves ~96% coverage, versus ~89% for a single-threshold approach.
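To make the two-tier idea and the notion of coverage concrete, here is a minimal NumPy sketch. It is not the library's implementation. It assumes a node counts as "covered" at a given threshold when its most similar other embedding has cosine similarity at or above that threshold, and it ignores the k-NN graph, the diversity gap, and the bridge edges entirely. The helper names `assign_tiers` and `coverage` are hypothetical, and the toy numbers have no relation to the ~96% and ~89% figures above.

```python
import numpy as np


def assign_tiers(embeddings, main_threshold=0.55, outlier_threshold=0.45):
    """Label each row 'main', 'outlier', or 'double_outlier'.

    Assumption (illustration only): a row is covered at a threshold if
    its best cosine similarity to any other row meets that threshold.
    """
    x = np.asarray(embeddings, dtype=float)
    # L2-normalize so that dot products are cosine similarities.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = x @ x.T
    np.fill_diagonal(sims, -np.inf)  # ignore self-similarity
    best = sims.max(axis=1)

    tiers = np.full(len(x), "double_outlier", dtype=object)
    tiers[best >= outlier_threshold] = "outlier"  # lower-threshold tier
    tiers[best >= main_threshold] = "main"        # overrides where dense
    return tiers


def coverage(tiers, included=("main", "outlier")):
    """Fraction of rows whose tier is in `included`."""
    return float(np.mean([t in included for t in tiers]))


# Toy data: unit vectors at 0, 10, 70, and 180 degrees.
angles = np.deg2rad([0.0, 10.0, 70.0, 180.0])
toy = np.column_stack([np.cos(angles), np.sin(angles)])

tiers = assign_tiers(toy)
print(list(tiers))
print("single threshold:", coverage(tiers, included=("main",)))
print("two tiers:", coverage(tiers))
```

In this toy set, the point at 70 degrees is too far from its nearest neighbor to pass the main threshold but passes the lower one, so adding the second tier raises coverage. The point at 180 degrees fits neither tier and is left out, which mirrors the "double outlier" exclusion.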

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| embeddings | np.ndarray | Embeddings to analyze (n, d). Will be L2-normalized. | required |
| main_similarity_threshold | float | Similarity threshold for main ontology (default 0.55). | 0.55 |
| outlier_similarity_threshold | float | Similarity threshold for outlier ontology (default 0.45). | 0.45 |
| diversity_gap_threshold | float | Diversity gap for main ontology edges (default 0.02). | 0.02 |
| outlier_diversity_gap | float | Diversity gap for outlier ontology edges (default 0.015). | 0.015 |
| k_neighbors | int | Number of neighbors for k-NN graph. | 30 |
| verbose | bool | Print progress information. | False |
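The note that embeddings "will be L2-normalized" helps when reading the threshold parameters: after normalization, the dot product of two rows equals their cosine similarity. The sketch below shows that normalization, assuming the similarity thresholds are on a cosine scale (the documentation above does not state this explicitly). `l2_normalize` is a hypothetical helper, not part of `dyf`.

```python
import numpy as np


def l2_normalize(embeddings, eps=1e-12):
    """Scale each row to unit length; eps guards against zero rows."""
    x = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


raw = np.array([[3.0, 4.0], [6.0, 8.0], [4.0, -3.0]])
unit = l2_normalize(raw)

# On unit-length rows, the Gram matrix holds pairwise cosine similarities.
cosine = unit @ unit.T
print(np.round(cosine, 3))
```

Rows 0 and 1 point in the same direction, so their similarity is 1 despite different magnitudes. Row 2 is orthogonal to both, so its similarity to them is 0 and it would fall below either default threshold.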

Returns

| Name | Type | Description |
|---|---|---|
| | UnifiedOntologyResult | UnifiedOntologyResult with combined ontology and metadata. |

Example

from dyf import build_unified_ontology

# embeddings is an (n, d) np.ndarray; node_idx is the index of a node of interest
result = build_unified_ontology(embeddings, verbose=True)
print(result.summary())

# Access the unified ontology
ontology = result.ontology
ancestors = ontology.get_ancestors(node_idx)

# Check which tier a node came from
if node_idx in result.main_nodes:
    print("From main ontology")
elif node_idx in result.outlier_nodes:
    print("From outlier ontology")