label_clusters_frequency
label_clusters_frequency(titles, labels)
Label clusters using TF-IDF token frequency (no LLM).
Scoring is per-cluster TF-IDF: tokens that are frequent within one cluster but rare globally score highest and become the label. If tokenization yields nothing for a cluster (e.g. numeric-only titles like “Digit 7”), the function falls back to raw title frequency. Identical labels across clusters are disambiguated with (2), (3) suffixes.
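The scoring and fallback described above can be illustrated with a minimal sketch. This is not the library's implementation: `sketch_label_clusters`, its `top_k` parameter, the alphabetic-only tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration, and the real tokenizer (including any stop-word handling) may differ. Suffix disambiguation is left out here.

```python
import math
import re
from collections import Counter, defaultdict


def sketch_label_clusters(titles, labels, top_k=2):
    """Hypothetical sketch: label each cluster by its top TF-IDF tokens."""
    # Group titles by cluster id.
    by_cluster = defaultdict(list)
    for title, cid in zip(titles, labels):
        by_cluster[int(cid)].append(title)

    # Assumed tokenizer: lowercase alphabetic runs only, so a purely
    # numeric title such as "42" yields no tokens.
    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    # Term frequency, treating each cluster as a single document.
    tf = {
        cid: Counter(tok for t in cluster_titles for tok in tokenize(t))
        for cid, cluster_titles in by_cluster.items()
    }

    # Document frequency: number of clusters in which each token appears.
    df = Counter(tok for counts in tf.values() for tok in counts)
    n_clusters = len(tf)

    result = {}
    for cid in sorted(tf):
        counts = tf[cid]
        if counts:
            # Smoothed IDF (an assumption): tokens rare across clusters
            # get a larger weight than tokens that appear everywhere.
            scores = {
                tok: c * (math.log((1 + n_clusters) / (1 + df[tok])) + 1)
                for tok, c in counts.items()
            }
            # Highest score first; ties broken alphabetically.
            top = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
            result[cid] = " ".join(top)
        else:
            # Fallback: the most frequent raw title in the cluster.
            result[cid] = Counter(by_cluster[cid]).most_common(1)[0][0]
    return result


example = sketch_label_clusters(
    ["red apple pie", "apple tart", "blue car", "fast car", "42", "42"],
    [0, 0, 1, 1, 2, 2],
)
```

In this sketch, cluster 2 contains only numeric titles, so it takes the fallback path and is labeled with its most common raw title.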
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| titles | list[str] | List of title strings, one per item. | required |
| labels | np.ndarray | Integer cluster assignments, same length as titles. | required |
Returns
| Name | Type | Description |
|---|---|---|
|  | dict[int, str] | dict mapping cluster_id -> label string. |
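As a rough illustration of how labels in the returned dict might be kept unique, the sketch below appends (2), (3) suffixes to repeated labels. The helper name `disambiguate_labels` is hypothetical, and the choices to process clusters in ascending id order and to leave the first occurrence unsuffixed are assumptions, not confirmed behavior of `label_clusters_frequency`.

```python
def disambiguate_labels(raw_labels):
    """Hypothetical helper: suffix repeated labels with (2), (3), ..."""
    seen = {}    # label -> number of times encountered so far
    unique = {}  # cluster_id -> disambiguated label
    for cid in sorted(raw_labels):
        label = raw_labels[cid]
        seen[label] = seen.get(label, 0) + 1
        # Assumption: the first occurrence keeps the bare label.
        if seen[label] == 1:
            unique[cid] = label
        else:
            unique[cid] = f"{label} ({seen[label]})"
    return unique


deduped = disambiguate_labels({0: "apple", 1: "car", 2: "apple", 3: "apple"})
```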