DAGTaxonomy

DAGTaxonomy(n_nodes, diversity, children, parents, roots, leaves)

A navigable DAG taxonomy extracted from embedding space.

Provides lattice operations for exploring hierarchical relationships: - Find ancestors/descendants of a concept - Find common ancestors between concepts - Find paths through the taxonomy - Identify convergence points (abstract attractors)

The taxonomy is a true lattice (not a tree) - nodes can have multiple parents, enabling multiple inheritance and diamond patterns.

Attributes

Name Type Description
n_nodes int Total number of nodes in the taxonomy
diversity np.ndarray Neighbor diversity score for each node
children Dict[int, List[Tuple[int, float, float]]] Dict mapping node → list of (child, similarity, gap)
parents Dict[int, List[Tuple[int, float, float]]] Dict mapping node → list of (parent, similarity, gap)
roots List[int] Nodes with no parents (most general concepts)
leaves List[int] Nodes with no children (most specific concepts)

Example

from dyf import build_dag_taxonomy

taxonomy = build_dag_taxonomy(embeddings, verbose=True) print(taxonomy.summary())

Find ancestors of a concept

ancestors = taxonomy.get_ancestors(node_idx, max_depth=5)

Find common ancestors between two concepts

common = taxonomy.get_common_ancestors(idx_a, idx_b)

Get convergence points (highly connected abstract concepts)

hubs = taxonomy.get_convergence_points(min_parents=10)

Methods

Name Description
get_all_paths Find all paths from start to end.
get_ancestors Get all ancestors of a node (transitive closure upward).
get_children Get direct children of a node.
get_common_ancestors Find common ancestors of two nodes.
get_convergence_points Find convergence points (nodes with many incoming edges).
get_descendants Get all descendants of a node (transitive closure downward).
get_diamond_patterns Find diamond patterns (A → B, A → C, B → D, C → D).
get_divergence_points Find divergence points (nodes with many outgoing edges).
get_lowest_common_ancestors Find lowest common ancestors (LCAs) of two nodes.
get_node_depth Get the depth of a node (distance from nearest root).
get_parents Get direct parents of a node.
get_path Find a path from start to end (following parent→child direction).
summary Return summary statistics.

get_all_paths

DAGTaxonomy.get_all_paths(start, end, max_depth=10)

Find all paths from start to end.

In a lattice, there can be multiple paths between nodes.

Parameters

Name Type Description Default
start int Starting node (more general) required
end int Ending node (more specific) required
max_depth int Maximum path length 10

Returns

Name Type Description
List[List[int]] List of paths, where each path is a list of node indices

get_ancestors

DAGTaxonomy.get_ancestors(node, max_depth=10)

Get all ancestors of a node (transitive closure upward).

Parameters

Name Type Description Default
node int Starting node index required
max_depth int Maximum depth to traverse 10

Returns

Name Type Description
set Set of all ancestor node indices

get_children

DAGTaxonomy.get_children(node)

Get direct children of a node.

get_common_ancestors

DAGTaxonomy.get_common_ancestors(node_a, node_b, max_depth=10)

Find common ancestors of two nodes.

In a lattice (vs tree), there can be multiple common ancestors, not just a single LCA.

Parameters

Name Type Description Default
node_a int First node index required
node_b int Second node index required
max_depth int Maximum depth to search 10

Returns

Name Type Description
set Set of common ancestor indices

get_convergence_points

DAGTaxonomy.get_convergence_points(min_parents=5)

Find convergence points (nodes with many incoming edges).

These are abstract concepts where many paths converge.

Parameters

Name Type Description Default
min_parents int Minimum number of parents to qualify 5

Returns

Name Type Description
List[Tuple[int, int]] List of (node_idx, parent_count) tuples, sorted by count descending

get_descendants

DAGTaxonomy.get_descendants(node, max_depth=10)

Get all descendants of a node (transitive closure downward).

Parameters

Name Type Description Default
node int Starting node index required
max_depth int Maximum depth to traverse 10

Returns

Name Type Description
set Set of all descendant node indices

get_diamond_patterns

DAGTaxonomy.get_diamond_patterns(limit=100)

Find diamond patterns (A → B, A → C, B → D, C → D).

Diamonds indicate multiple inheritance / non-tree structure.

Parameters

Name Type Description Default
limit int Maximum number of diamonds to return 100

Returns

Name Type Description
List[Tuple[int, int, int, int]] List of (top, left, right, bottom) tuples

get_divergence_points

DAGTaxonomy.get_divergence_points(min_children=5)

Find divergence points (nodes with many outgoing edges).

These are branching hubs where paths diverge.

Parameters

Name Type Description Default
min_children int Minimum number of children to qualify 5

Returns

Name Type Description
List[Tuple[int, int]] List of (node_idx, child_count) tuples, sorted by count descending

get_lowest_common_ancestors

DAGTaxonomy.get_lowest_common_ancestors(node_a, node_b, max_depth=10)

Find lowest common ancestors (LCAs) of two nodes.

Returns ancestors that are common to both nodes but have no descendants that are also common ancestors.

Parameters

Name Type Description Default
node_a int First node index required
node_b int Second node index required
max_depth int Maximum depth to search 10

Returns

Name Type Description
List[int] List of LCA indices (may be multiple in a lattice)

get_node_depth

DAGTaxonomy.get_node_depth(node)

Get the depth of a node (distance from nearest root).

Parameters

Name Type Description Default
node int Node index required

Returns

Name Type Description
int Minimum distance to any root, or -1 if not connected

get_parents

DAGTaxonomy.get_parents(node)

Get direct parents of a node.

get_path

DAGTaxonomy.get_path(start, end, max_depth=10)

Find a path from start to end (following parent→child direction).

Parameters

Name Type Description Default
start int Starting node (more general) required
end int Ending node (more specific) required
max_depth int Maximum path length 10

Returns

Name Type Description
Optional[List[int]] List of node indices from start to end, or None if no path

summary

DAGTaxonomy.summary()

Return summary statistics.