cluster_quality

cluster_quality(coherence, cluster_labels, threshold_pct=75)

Identify meta/structural clusters from neighbor coherence.

Clusters whose members have high mean coherence are structural echo chambers — collections of documents that reference each other heavily (date pages, year pages, event lists) rather than covering a genuine topic.

Parameters

Name Type Description Default
coherence (n,) array of per-point neighbor coherence values (from neighbor_coherence). required
cluster_labels (n,) array of cluster assignments. required
threshold_pct Percentile of per-cluster mean coherence above which a cluster is flagged as meta. Default 75. 75

Returns

Name Type Description
dict with: cluster_mean_coherence: (n_clusters,) array of per-cluster mean coherence. meta_clusters: set of cluster IDs above the threshold. threshold: the computed coherence threshold value.