refine_dyf_tree
refine_dyf_tree(
tree,
embeddings,
min_coherence=None,
num_bits=3,
min_leaf_size=4,
max_retries=3,
seed_offset=1000,
)

Re-split incoherent tree leaves with rotated LSH seeds.
Walks the tree and finds leaves whose coherence (the mean cosine similarity of a leaf's members to the leaf centroid) is below a threshold. For each such leaf, it tries re-splitting with up to max_retries different LSH seeds and accepts a split only if the weighted coherence of the resulting children improves.
Modifies the tree in place.
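The coherence measure and the accept-if-improved rule described above can be sketched as follows. This is a minimal, dependency-free illustration and not the library's implementation: the helper names (`coherence`, `lsh_split`, `weighted_coherence`, `try_resplit`) are hypothetical, the LSH is assumed to use random hyperplanes through the origin, child coherence is assumed to be weighted by child size, and the seed for each retry is assumed to be `seed_offset + attempt`.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coherence(vectors):
    """Mean cosine similarity of each vector to the centroid of the group."""
    dim = len(vectors[0])
    center = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return sum(cosine(v, center) for v in vectors) / len(vectors)


def lsh_split(vectors, num_bits, seed):
    """Bucket vectors by the sign pattern of num_bits random projections.

    Assumption: random-hyperplane LSH; the real tree may hash differently.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]
    buckets = {}
    for v in vectors:
        key = tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0) for plane in planes)
        buckets.setdefault(key, []).append(v)
    return list(buckets.values())


def weighted_coherence(groups):
    """Coherence of each group, weighted by group size (an assumption)."""
    total = sum(len(g) for g in groups)
    return sum(len(g) * coherence(g) for g in groups) / total


def try_resplit(vectors, num_bits=3, max_retries=3, seed_offset=1000):
    """Try several seeds; keep the best split that beats the unsplit leaf."""
    base = coherence(vectors)
    best = None
    for attempt in range(max_retries):
        groups = lsh_split(vectors, num_bits, seed_offset + attempt)
        if len(groups) < 2:
            continue  # every vector hashed to one bucket: not a split
        score = weighted_coherence(groups)
        if score > base and (best is None or score > best[0]):
            best = (score, groups)
    return base, best  # best is None when no seed improved coherence
```

In this sketch `try_resplit` returns the leaf's original coherence together with either `None` or the best `(score, groups)` pair, so a caller can decide whether to replace the leaf with new children. With real `(n, d)` arrays, NumPy would be the more natural choice for the vector arithmetic.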
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| tree | | DYF tree dict from build_dyf_tree(). | required |
| embeddings | | (n, d) array of embedding vectors. | required |
| min_coherence | | Coherence threshold. Leaves below this are candidates for re-splitting. If None, uses 25th percentile of leaf coherences. | None |
| num_bits | | LSH bits for re-splitting (default 3). | 3 |
| min_leaf_size | | Don’t try to split leaves smaller than 2x this. | 4 |
| max_retries | | Number of different seeds to try per leaf. | 3 |
| seed_offset | | Base seed for re-splitting (offset from original tree seed). | 1000 |
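How min_coherence and min_leaf_size interact when choosing which leaves to re-split can be sketched as below. This is an illustration under stated assumptions, not the library's code: `select_candidates` and the `leaf_stats` mapping are hypothetical, "below" is taken to mean strictly less than the threshold, and the 25th percentile is computed with linear interpolation (the `"inclusive"` method of `statistics.quantiles`).

```python
import statistics


def select_candidates(leaf_stats, min_coherence=None, min_leaf_size=4):
    """Return ids of leaves that would be considered for re-splitting.

    leaf_stats maps a leaf id to a (size, coherence) pair; this layout is
    hypothetical and chosen only for the example.
    """
    coherences = [coh for _, coh in leaf_stats.values()]
    if min_coherence is None:
        if len(coherences) < 2:
            return []  # no meaningful percentile with fewer than two leaves
        # Default threshold: 25th percentile of all leaf coherences.
        min_coherence = statistics.quantiles(coherences, n=4, method="inclusive")[0]
    return [
        leaf_id
        for leaf_id, (size, coh) in leaf_stats.items()
        # A leaf must be incoherent AND have at least 2 * min_leaf_size members.
        if coh < min_coherence and size >= 2 * min_leaf_size
    ]


leaves = {
    "a": (20, 0.9),
    "b": (20, 0.8),
    "c": (20, 0.2),
    "d": (5, 0.3),
    "e": (20, 0.85),
}
candidates = select_candidates(leaves, min_coherence=0.5)
```

Here leaf `d` is below the explicit threshold of 0.5 but has only 5 members, fewer than 2 × 4, so only `c` is selected.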
Returns
| Name | Type | Description |
|---|---|---|
| | dict | Stats with keys n_refined, n_leaves_before, n_leaves_after, coherence_before, coherence_after. |