CategoryGraph

CategoryGraph(children=dict(), parents=dict(), roots=list(), leaves=list())

DAG of string-labeled category nodes with weighted edges.

A strict hierarchy is a DAG in which each node has at most one parent (a tree, or a forest when there are several roots). This structure also supports general DAGs, where a node may have multiple parents, but most domain taxonomies will be trees.

Attributes

children : dict mapping parent → list of (child, weight)
parents : dict mapping child → list of (parent, weight)
roots : nodes with no parents
leaves : nodes with no children

Methods

Name Description
all_nodes Return all nodes in the graph.
from_dict Reconstruct from to_dict() output.
from_edges Build from explicit edge list [(parent, child, weight)].
from_json Reconstruct from JSON string.
from_levels Build DAG from co-occurring level arrays [L0, L1, L2, …].
from_single_level Depth-1 graph: _root_ → unique labels.
get_ancestors All ancestors via BFS upward.
get_children Direct children of node.
get_depth Shortest path length from any root to node (0 for roots).
get_descendants All descendants via BFS downward.
get_parents Direct parents of node.
is_tree True if every node has at most one parent.
items_at_depth Resolve each item’s label to the requested depth.
lca_depth Depth of the lowest common ancestor of two nodes.
max_depth Maximum depth across all nodes.
nodes_at_depth All nodes whose shortest root distance equals depth.
summary Human-readable stats.
to_dict JSON-serializable dict (edge list format).
to_json Serialize to JSON string.

all_nodes

CategoryGraph.all_nodes()

Return all nodes in the graph.

from_dict

CategoryGraph.from_dict(data)

Reconstruct from to_dict() output.

from_edges

CategoryGraph.from_edges(edges)

Build from explicit edge list [(parent, child, weight)].

If tuples have 2 elements, weight defaults to 1.0.
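
As an illustration of how an edge list maps onto the four attributes above, here is a minimal standalone sketch. It does not call CategoryGraph; `build_adjacency` is a hypothetical helper, and the sorted order of `roots` and `leaves` is an assumption made only to keep the output deterministic.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Hypothetical helper: derive children/parents/roots/leaves from edges."""
    children = defaultdict(list)
    parents = defaultdict(list)
    for edge in edges:
        if len(edge) == 2:
            # Two-element tuples carry no weight, so fall back to 1.0.
            parent, child = edge
            weight = 1.0
        else:
            parent, child, weight = edge
        children[parent].append((child, weight))
        parents[child].append((parent, weight))
    nodes = set(children) | set(parents)
    # Roots never appear as a child; leaves never appear as a parent.
    roots = sorted(n for n in nodes if n not in parents)
    leaves = sorted(n for n in nodes if n not in children)
    return dict(children), dict(parents), roots, leaves


edges = [("animal", "mammal", 0.6), ("animal", "bird", 0.4), ("mammal", "dog")]
children, parents, roots, leaves = build_adjacency(edges)
print(roots, leaves, children["mammal"])
```

The last edge omits its weight, so `dog` is stored under `mammal` with weight 1.0.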

from_json

CategoryGraph.from_json(s)

Reconstruct from JSON string.

from_levels

CategoryGraph.from_levels(level_columns)

Build DAG from co-occurring level arrays [L0, L1, L2, …].

L0 is the coarsest (root-adjacent) level, L_last is finest. Edges are created between adjacent levels based on co-occurrence. Edge weights reflect the fraction of items in the parent that belong to each child.
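
The co-occurrence weighting described above can be sketched as follows. This is a standalone approximation under stated assumptions, not the library's code: `cooccurrence_edges` is a hypothetical name, the level columns are plain Python lists of equal length, and any synthetic root node above L0 is left out.

```python
from collections import Counter


def cooccurrence_edges(level_columns):
    """Hypothetical helper: weighted edges between adjacent level columns."""
    edges = []
    for coarse, fine in zip(level_columns, level_columns[1:]):
        pair_counts = Counter(zip(coarse, fine))  # items per (parent, child)
        parent_counts = Counter(coarse)           # items per parent
        for (parent, child), n in pair_counts.items():
            # Weight = fraction of the parent's items that fall in this child.
            edges.append((parent, child, n / parent_counts[parent]))
    return edges


l0 = ["animal", "animal", "animal", "plant"]
l1 = ["dog", "dog", "cat", "fern"]
edges = cooccurrence_edges([l0, l1])
weights = {(p, c): w for p, c, w in edges}
print(weights)
```

With this definition the outgoing weights of each parent sum to 1.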

from_single_level

CategoryGraph.from_single_level(labels)

Depth-1 graph: _root_ → unique labels.

This is the degenerate case for flat categorical columns.
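
A flat column can be thought of as an edge list from a single synthetic root. The sketch below is standalone and assumes the root is literally named `_root_`; `single_level_edges` is a hypothetical helper, and it emits two-element tuples so that no weight is invented here.

```python
def single_level_edges(labels, root="_root_"):
    """Hypothetical helper: one edge from the root to each unique label."""
    unique = list(dict.fromkeys(labels))  # de-duplicate, keep first-seen order
    return [(root, label) for label in unique]


flat_edges = single_level_edges(["red", "blue", "red", "green"])
print(flat_edges)
```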

get_ancestors

CategoryGraph.get_ancestors(node, max_depth=100)

All ancestors via BFS upward.
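
A breadth-first walk over the `parents` mapping might look like the sketch below. It is standalone and makes two assumptions that the reference does not confirm: `max_depth` is taken to limit the number of upward hops, and the result is returned as a set. `ancestors_bfs` is a hypothetical name.

```python
from collections import deque


def ancestors_bfs(parents, node, max_depth=100):
    """Hypothetical helper: collect ancestors within max_depth upward hops."""
    seen = set()
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist >= max_depth:
            continue  # assumption: do not expand beyond max_depth hops
        for parent, _weight in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return seen


parents = {
    "dog": [("mammal", 1.0), ("pet", 1.0)],
    "mammal": [("animal", 1.0)],
    "pet": [("animal", 1.0)],
}
all_ancestors = ancestors_bfs(parents, "dog")
direct_only = ancestors_bfs(parents, "dog", max_depth=1)
print(all_ancestors, direct_only)
```

The same traversal over the `children` mapping would yield descendants.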

get_children

CategoryGraph.get_children(node)

Direct children of node.

get_depth

CategoryGraph.get_depth(node)

Shortest path length from any root to node (0 for roots).
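
Because a DAG node can be reached along several paths, "shortest path from any root" matters. One way to compute it is a multi-source BFS started from all roots at once, sketched below as a standalone function (`shortest_depths` is a hypothetical name, not part of the documented API).

```python
from collections import deque


def shortest_depths(children, roots):
    """Hypothetical helper: shortest root distance for every reachable node."""
    depth = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for child, _weight in children.get(current, []):
            if child not in depth:  # first visit in BFS is the shortest path
                depth[child] = depth[current] + 1
                queue.append(child)
    return depth


# "dog" is reachable both directly from "animal" and via "mammal".
children = {
    "animal": [("mammal", 1.0), ("dog", 1.0)],
    "mammal": [("dog", 1.0)],
}
depths = shortest_depths(children, ["animal"])
deepest = max(depths.values())
at_depth_1 = sorted(n for n, d in depths.items() if d == 1)
print(depths, deepest, at_depth_1)
```

Here `dog` gets depth 1 rather than 2, since the direct edge is the shorter route. The same depth map also gives the maximum depth and the set of nodes at a given depth.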

get_descendants

CategoryGraph.get_descendants(node, max_depth=100)

All descendants via BFS downward.

get_parents

CategoryGraph.get_parents(node)

Direct parents of node.

is_tree

CategoryGraph.is_tree()

True if every node has at most one parent.

items_at_depth

CategoryGraph.items_at_depth(depth, item_labels)

Resolve each item’s label to the requested depth.

For items whose finest label is deeper than depth, walk up through parents until a node at the target depth is found. For items whose label is already at the target depth or shallower, use the label as-is.

Parameters

depth : int
    Target depth (0 = roots).
item_labels : (n,) str array
    Per-item labels (typically the finest/leaf labels).

Returns

resolved : (n,) str array
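
The resolution rule can be illustrated with a standalone sketch on a small tree. Assumptions: plain lists stand in for the str arrays, a precomputed `depth_of` map supplies node depths, and when a node has several parents the first listed one is followed (the reference does not say which parent is chosen). `resolve_to_depth` is a hypothetical name.

```python
def resolve_to_depth(item_labels, parents, depth_of, target):
    """Hypothetical helper: map each label to its ancestor at depth `target`."""
    resolved = []
    for label in item_labels:
        current = label
        # Walk upward only while the current node is deeper than the target.
        while depth_of[current] > target:
            current = parents[current][0][0]  # assumption: first parent
        resolved.append(current)
    return resolved


parents = {
    "dog": [("mammal", 1.0)],
    "cat": [("mammal", 1.0)],
    "mammal": [("animal", 1.0)],
    "fern": [("plant", 1.0)],
}
depth_of = {"animal": 0, "plant": 0, "mammal": 1, "fern": 1, "dog": 2, "cat": 2}
resolved = resolve_to_depth(["dog", "fern", "animal"], parents, depth_of, target=1)
print(resolved)
```

`dog` is lifted to `mammal`, `fern` already sits at depth 1, and `animal` is shallower than the target, so it is kept unchanged.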

lca_depth

CategoryGraph.lca_depth(node_a, node_b)

Depth of the lowest common ancestor of two nodes.

Returns the depth of the deepest node that is an ancestor of both node_a and node_b (or one of them, if one is an ancestor of the other). Returns -1 if either node is absent or they share no common ancestor.
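
The rule above can be sketched by intersecting the two nodes' ancestor sets, with each node counted as its own ancestor so that the "one is an ancestor of the other" case works. This is a standalone illustration with hypothetical names (`lca_depth_sketch`, `depth_of`), not the library's implementation.

```python
def lca_depth_sketch(parents, depth_of, node_a, node_b):
    """Hypothetical helper: depth of the deepest shared ancestor, or -1."""
    if node_a not in depth_of or node_b not in depth_of:
        return -1  # either node is absent from the graph

    def self_and_ancestors(node):
        seen = {node}
        stack = [node]
        while stack:
            for parent, _weight in parents.get(stack.pop(), []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    common = self_and_ancestors(node_a) & self_and_ancestors(node_b)
    # default=-1 covers nodes that live under different roots.
    return max((depth_of[n] for n in common), default=-1)


parents = {
    "dog": [("mammal", 1.0)],
    "cat": [("mammal", 1.0)],
    "mammal": [("animal", 1.0)],
    "fern": [("plant", 1.0)],
}
depth_of = {"animal": 0, "plant": 0, "mammal": 1, "fern": 1, "dog": 2, "cat": 2}
print(lca_depth_sketch(parents, depth_of, "dog", "cat"))
print(lca_depth_sketch(parents, depth_of, "dog", "fern"))
```

Siblings `dog` and `cat` meet at `mammal` (depth 1), while `dog` and `fern` sit under different roots and yield -1.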

max_depth

CategoryGraph.max_depth()

Maximum depth across all nodes.

nodes_at_depth

CategoryGraph.nodes_at_depth(depth)

All nodes whose shortest root distance equals depth.

summary

CategoryGraph.summary()

Human-readable stats.

to_dict

CategoryGraph.to_dict()

JSON-serializable dict (edge list format).

to_json

CategoryGraph.to_json()

Serialize to JSON string.