CategoryGraph
CategoryGraph(children=dict(), parents=dict(), roots=list(), leaves=list())
A DAG of string-labeled category nodes with weighted edges.
A strict hierarchy is a DAG in which every node has at most one parent (a tree, or a forest when there are several roots). This class supports general DAGs, where a node may have multiple parents, but most domain taxonomies will be trees.
Attributes
children : dict mapping parent → list of (child, weight)
parents : dict mapping child → list of (parent, weight)
roots : nodes with no parents
leaves : nodes with no children
Methods
| Name | Description |
|---|---|
| all_nodes | Return all nodes in the graph. |
| from_dict | Reconstruct from to_dict() output. |
| from_edges | Build from explicit edge list [(parent, child, weight)]. |
| from_json | Reconstruct from JSON string. |
| from_levels | Build DAG from co-occurring level arrays [L0, L1, L2, …]. |
| from_single_level | Depth-1 graph: _root_ → unique labels. |
| get_ancestors | All ancestors via BFS upward. |
| get_children | Direct children of node. |
| get_depth | Shortest path length from any root to node (0 for roots). |
| get_descendants | All descendants via BFS downward. |
| get_parents | Direct parents of node. |
| is_tree | True if every node has at most one parent. |
| items_at_depth | Resolve each item’s label to the requested depth. |
| lca_depth | Depth of the lowest common ancestor of two nodes. |
| max_depth | Maximum depth across all nodes. |
| nodes_at_depth | All nodes whose shortest root distance equals depth. |
| summary | Human-readable stats. |
| to_dict | JSON-serializable dict (edge list format). |
| to_json | Serialize to JSON string. |
all_nodes
CategoryGraph.all_nodes()
Return all nodes in the graph.
from_dict
CategoryGraph.from_dict(data)
Reconstruct from to_dict() output.
from_edges
CategoryGraph.from_edges(edges)
Build from explicit edge list [(parent, child, weight)].
If a tuple has only 2 elements, (parent, child), its weight defaults to 1.0.
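The sketch below illustrates how an edge list could be turned into the four attributes described above. It is a standalone illustration rather than the library's code: `build_adjacency` is a hypothetical helper, and the insertion-ordered `roots` and `leaves` lists are an assumption.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Hypothetical helper: derive children, parents, roots, leaves from edges."""
    children = defaultdict(list)
    parents = defaultdict(list)
    nodes = []          # nodes in first-seen order (an assumption)
    seen = set()
    for edge in edges:
        if len(edge) == 2:
            parent, child = edge
            weight = 1.0  # 2-tuples get the default weight
        else:
            parent, child, weight = edge
        children[parent].append((child, weight))
        parents[child].append((parent, weight))
        for node in (parent, child):
            if node not in seen:
                seen.add(node)
                nodes.append(node)
    roots = [n for n in nodes if n not in parents]     # no incoming edges
    leaves = [n for n in nodes if n not in children]   # no outgoing edges
    return dict(children), dict(parents), roots, leaves


edges = [("animal", "dog", 0.6), ("animal", "cat", 0.4), ("dog", "beagle")]
children, parents, roots, leaves = build_adjacency(edges)
print(roots)            # ['animal']
print(leaves)           # ['cat', 'beagle']
print(children["dog"])  # [('beagle', 1.0)]
```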
from_json
CategoryGraph.from_json(s)
Reconstruct from JSON string.
from_levels
CategoryGraph.from_levels(level_columns)
Build DAG from co-occurring level arrays [L0, L1, L2, …].
L0 is the coarsest (root-adjacent) level and the last array is the finest. An edge is created between two labels in adjacent levels whenever they co-occur on the same item. Each edge weight reflects the fraction of the parent's items that belong to that child.
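One way to read the co-occurrence rule is sketched below. It assumes the weight of an edge is exactly count(parent, child) / count(parent), which matches the description above but is not confirmed as the library's formula; `level_edges` is a hypothetical name.

```python
from collections import Counter


def level_edges(level_columns):
    """Hypothetical sketch: weighted edges between adjacent level arrays."""
    edges = []
    # Pair each level with the next finer one: (L0, L1), (L1, L2), ...
    for coarse, fine in zip(level_columns, level_columns[1:]):
        pair_counts = Counter(zip(coarse, fine))  # items per (parent, child)
        parent_counts = Counter(coarse)           # items per parent
        for (parent, child), n in pair_counts.items():
            # Assumed weight: share of the parent's items found in this child.
            edges.append((parent, child, n / parent_counts[parent]))
    return edges


l0 = ["fruit", "fruit", "fruit", "veg"]
l1 = ["apple", "apple", "pear", "kale"]
weights = {(p, c): w for p, c, w in level_edges([l0, l1])}
print(weights[("veg", "kale")])  # 1.0
```

Under this reading, the weights of the edges leaving any one parent sum to 1.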
from_single_level
CategoryGraph.from_single_level(labels)
Depth-1 graph: _root_ → unique labels.
This is the degenerate case for flat categorical columns.
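As a rough illustration, the flat case amounts to one edge from a synthetic root to each distinct label. The helper name `single_level_edges` and the first-seen ordering are assumptions, and the sketch leaves edge weights unspecified because the source does not state them.

```python
def single_level_edges(labels, root="_root_"):
    """Hypothetical sketch: one (root, label) edge per distinct label."""
    unique = list(dict.fromkeys(labels))  # de-duplicate, keep first-seen order
    return [(root, label) for label in unique]


flat_edges = single_level_edges(["red", "blue", "red", "green"])
print(flat_edges)
# [('_root_', 'red'), ('_root_', 'blue'), ('_root_', 'green')]
```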
get_ancestors
CategoryGraph.get_ancestors(node, max_depth=100)
All ancestors via BFS upward.
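A breadth-first walk over the `parents` mapping could look like the sketch below. It assumes `max_depth` caps the number of hops away from the starting node, which the source does not spell out, and `ancestors` is a hypothetical standalone function.

```python
from collections import deque


def ancestors(parents, node, max_depth=100):
    """Hypothetical sketch: ancestors of node in BFS order, nearest first."""
    found = []
    seen = {node}
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist >= max_depth:  # assumed meaning: stop after max_depth hops
            continue
        for parent, _weight in parents.get(current, []):
            if parent not in seen:  # a DAG can reach a node by several paths
                seen.add(parent)
                found.append(parent)
                queue.append((parent, dist + 1))
    return found


parents = {
    "beagle": [("dog", 1.0)],
    "dog": [("animal", 1.0), ("pet", 1.0)],
}
print(ancestors(parents, "beagle"))               # ['dog', 'animal', 'pet']
print(ancestors(parents, "beagle", max_depth=1))  # ['dog']
```

The downward traversal for descendants is symmetric: run the same loop over the `children` mapping.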
get_children
CategoryGraph.get_children(node)
Direct children of node.
get_depth
CategoryGraph.get_depth(node)
Shortest path length from any root to node (0 for roots).
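Shortest root distance matters once a node has several parents. The sketch below computes it with a multi-source BFS from all roots; `shortest_depths` is a hypothetical helper, not necessarily how the library computes depth.

```python
from collections import deque


def shortest_depths(children, roots):
    """Hypothetical sketch: shortest distance from any root to each node."""
    depth = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child, _weight in children.get(node, []):
            if child not in depth:  # first visit in BFS is the shortest path
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


# "d" is reachable as a -> d (1 hop) and as a -> b -> d (2 hops).
children = {"a": [("b", 1.0), ("d", 1.0)], "b": [("d", 1.0)]}
depths = shortest_depths(children, ["a"])
print(depths)  # {'a': 0, 'b': 1, 'd': 1}
```

With such a map, the maximum depth is `max(depths.values())` and the nodes at a given depth are those whose value equals it.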
get_descendants
CategoryGraph.get_descendants(node, max_depth=100)
All descendants via BFS downward.
get_parents
CategoryGraph.get_parents(node)
Direct parents of node.
is_tree
CategoryGraph.is_tree()
True if every node has at most one parent.
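Given a `parents` mapping shaped like the attribute of the same name, the check reduces to one pass over its values. This is a minimal sketch with a hypothetical function name.

```python
def has_single_parents(parents):
    """Hypothetical sketch: True if no node lists more than one parent."""
    return all(len(parent_list) <= 1 for parent_list in parents.values())


tree_parents = {"dog": [("animal", 1.0)], "cat": [("animal", 1.0)]}
dag_parents = {"tomato": [("fruit", 0.5), ("vegetable", 0.5)]}
print(has_single_parents(tree_parents))  # True
print(has_single_parents(dag_parents))   # False
```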
items_at_depth
CategoryGraph.items_at_depth(depth, item_labels)
Resolve each item’s label to the requested depth.
For items whose label sits deeper than depth, walk up through parents until a node at the target depth is found. Items whose label is already at the target depth or shallower keep their label unchanged.
Parameters
depth : int
    Target depth (0 = roots).
item_labels : (n,) str array
    Per-item labels (typically the finest/leaf labels).
Returns
resolved : (n,) str array
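The resolution rule can be sketched for the tree case as follows. The sketch assumes each node has a single parent and uses plain lists and precomputed `parent_of` / `depth_of` dicts instead of string arrays; all three names are hypothetical, and the source does not say which parent is followed when a node has several.

```python
def resolve_to_depth(parent_of, depth_of, labels, depth):
    """Hypothetical sketch: roll each label up to the target depth (tree only)."""
    resolved = []
    for label in labels:
        node = label
        # Climb only while the node is deeper than the target depth.
        while depth_of[node] > depth:
            node = parent_of[node]
        resolved.append(node)  # shallower labels pass through unchanged
    return resolved


parent_of = {"dog": "animal", "cat": "animal", "beagle": "dog"}
depth_of = {"animal": 0, "dog": 1, "cat": 1, "beagle": 2}
labels = ["beagle", "cat", "animal"]
print(resolve_to_depth(parent_of, depth_of, labels, 1))
# ['dog', 'cat', 'animal']
```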
lca_depth
CategoryGraph.lca_depth(node_a, node_b)
Depth of the lowest common ancestor of two nodes.
Returns the depth of the deepest node that is an ancestor of both node_a and node_b (or one of them, if one is an ancestor of the other). Returns -1 if either node is absent or they share no common ancestor.
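The behaviour described above can be sketched by intersecting the two ancestor sets, each including the node itself, and taking the deepest member. The function names and the precomputed `depth_of` map are assumptions made for this illustration.

```python
def ancestors_or_self(parents, node):
    """Hypothetical helper: the node plus everything reachable upward."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent, _weight in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def lca_depth(parents, depth_of, node_a, node_b):
    """Hypothetical sketch: depth of the deepest shared ancestor, or -1."""
    if node_a not in depth_of or node_b not in depth_of:
        return -1  # at least one node is absent from the graph
    common = ancestors_or_self(parents, node_a) & ancestors_or_self(parents, node_b)
    return max((depth_of[n] for n in common), default=-1)


parents = {
    "dog": [("animal", 1.0)],
    "cat": [("animal", 1.0)],
    "beagle": [("dog", 1.0)],
    "poodle": [("dog", 1.0)],
}
depth_of = {"animal": 0, "plant": 0, "dog": 1, "cat": 1, "beagle": 2, "poodle": 2}
print(lca_depth(parents, depth_of, "beagle", "poodle"))  # 1
print(lca_depth(parents, depth_of, "beagle", "plant"))   # -1
```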
max_depth
CategoryGraph.max_depth()
Maximum depth across all nodes.
nodes_at_depth
CategoryGraph.nodes_at_depth(depth)
All nodes whose shortest root distance equals depth.
summary
CategoryGraph.summary()
Human-readable stats.
to_dict
CategoryGraph.to_dict()
JSON-serializable dict (edge list format).
to_json
CategoryGraph.to_json()
Serialize to JSON string.
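An edge-list payload round-trips through JSON cleanly, as the sketch below shows. The `"edges"` key and both function names are assumptions made for illustration; the actual layout produced by to_dict() is not documented here.

```python
import json


def edges_to_json(edges):
    """Hypothetical sketch: serialize weighted edges under an assumed 'edges' key."""
    payload = {"edges": [[parent, child, weight] for parent, child, weight in edges]}
    return json.dumps(payload)


def edges_from_json(s):
    """Hypothetical sketch: rebuild (parent, child, weight) tuples from JSON."""
    payload = json.loads(s)
    # JSON has no tuple type, so lists are converted back explicitly.
    return [(parent, child, float(weight)) for parent, child, weight in payload["edges"]]


original = [("animal", "dog", 0.6), ("animal", "cat", 0.4)]
restored = edges_from_json(edges_to_json(original))
print(restored == original)  # True
```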