label_clusters_frequency

label_clusters_frequency(titles, labels)

Label clusters using TF-IDF token frequency (no LLM).

Tokens are scored with per-cluster TF-IDF, so a token that is frequent within one cluster but rare globally wins. If tokenization yields nothing (e.g. numeric-only titles like “Digit 7”), the label falls back to raw title frequency. Identical labels across clusters are disambiguated with (2), (3) suffixes.
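The behaviour described above can be sketched roughly as follows. This is not the library's implementation: `sketch_label_clusters`, its `top_k` parameter, the alphabetic-only tokenizer, and the unsmoothed `log(n_clusters / df)` IDF are all assumptions chosen for illustration. Under these assumptions a token that appears in every cluster scores zero, which is one way a title set like “Digit 7” could end up on the raw-title fallback.

```python
import math
import re
from collections import Counter, defaultdict


def sketch_label_clusters(titles, labels, top_k=1):
    """Hypothetical sketch of per-cluster TF-IDF labeling (illustration only)."""
    # Group titles by cluster id.
    by_cluster = defaultdict(list)
    for title, cid in zip(titles, labels):
        by_cluster[int(cid)].append(title)

    # Assumed tokenizer: lowercase alphabetic runs of 2+ letters (digits dropped).
    tf = {
        cid: Counter(tok for t in ts for tok in re.findall(r"[a-z]{2,}", t.lower()))
        for cid, ts in by_cluster.items()
    }
    # Document frequency: in how many clusters does each token occur?
    df = Counter(tok for counts in tf.values() for tok in counts)
    n_clusters = len(by_cluster)

    result, seen = {}, Counter()
    for cid in sorted(by_cluster):
        # Assumed scoring: term count times unsmoothed IDF.
        scores = {tok: c * math.log(n_clusters / df[tok]) for tok, c in tf[cid].items()}
        ranked = sorted(
            (tok for tok, s in scores.items() if s > 0),
            key=lambda tok: (-scores[tok], tok),  # ties broken alphabetically
        )
        if ranked:
            label = " ".join(ranked[:top_k])
        else:
            # Fallback: the most frequent raw title in the cluster.
            label = Counter(by_cluster[cid]).most_common(1)[0][0]
        # Disambiguate repeated labels with (2), (3), ... suffixes.
        seen[label] += 1
        result[cid] = label if seen[label] == 1 else f"{label} ({seen[label]})"
    return result


print(sketch_label_clusters(["Digit 7", "Digit 7", "Digit 1"], [0, 0, 1]))
# {0: 'Digit 7', 1: 'Digit 1'}
```

In the usage line, "digit" occurs in both clusters, so every token scores zero and each cluster falls back to its most common raw title.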

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| titles | list[str] | List of title strings, one per item. | required |
| labels | np.ndarray | Integer cluster assignments, same length as titles. | required |

Returns

| Name | Type | Description |
|------|------|-------------|
| | dict[int, str] | dict mapping cluster_id -> label string. |
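Because the return value is keyed by cluster id, it can be applied back to the per-item assignments with a plain lookup. The sketch below uses a hand-written dict as a stand-in for the function's output; the label values are made up for illustration.

```python
# Hypothetical stand-in for the dict returned by label_clusters_frequency.
cluster_names = {0: "apple", 1: "car"}

# Per-item cluster assignments; an integer np.ndarray iterates the same way.
labels = [0, 0, 1, 0]

# int(...) normalizes NumPy integer scalars to plain Python ints for the lookup.
item_names = [cluster_names[int(c)] for c in labels]
```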