neighbor_coherence

neighbor_coherence(embeddings, knn_indices, sample_k=None)

Measure how coherent each point’s neighborhood is.

For each point, computes the mean pairwise cosine similarity among its k nearest neighbors. Topical documents have coherent neighborhoods (their neighbors are similar to each other). Meta/structural documents, such as date pages, event lists, and disambiguation pages, pull in diverse neighbors that have little in common, producing low coherence.
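A minimal NumPy sketch of this computation follows. It illustrates the idea under stated assumptions and is not the library's implementation. `neighbor_coherence_sketch` is a hypothetical name. The average is taken over all ordered pairs of distinct neighbors, which equals the average over unordered pairs because cosine similarity is symmetric. A small epsilon guards against zero-length vectors.

```python
import numpy as np


def neighbor_coherence_sketch(embeddings, knn_indices, sample_k=None):
    """Mean pairwise cosine similarity among each point's neighbors (illustrative sketch)."""
    idx = np.asarray(knn_indices)
    if sample_k is not None:
        # Keep only the first sample_k neighbor columns.
        idx = idx[:, :sample_k]
    k = idx.shape[1]
    if k < 2:
        raise ValueError("need at least two neighbors per point")

    # L2-normalize rows so that dot products are cosine similarities.
    emb = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.maximum(norms, 1e-12)

    # Gather each point's neighbor vectors: shape (n, k, d).
    nbrs = unit[idx]
    # Per-point Gram matrix of neighbor cosines: shape (n, k, k).
    gram = np.einsum("nid,njd->nij", nbrs, nbrs)
    # Remove the diagonal (self-similarities) and average the k*(k-1) off-diagonal entries.
    off_diag_sum = gram.sum(axis=(1, 2)) - np.trace(gram, axis1=1, axis2=2)
    return off_diag_sum / (k * (k - 1))


# Toy example: points 1-3 have two neighbors that point the same way;
# point 0 has two orthogonal neighbors.
emb = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [0.0, 1.0]])
knn = np.array([[1, 3], [0, 2], [0, 1], [0, 1]])
print(neighbor_coherence_sketch(emb, knn))  # [0. 1. 1. 1.]
```

Building the per-point k × k Gram matrix costs O(k²d) time per point and O(nk²) memory overall. Restricting the calculation to the first sample_k neighbors is meant to reduce exactly this cost.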

This is a general embedding-space signal that does not depend on titles, metadata, or domain-specific heuristics.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| embeddings | (n, d) array | Embedding vectors, one row per point. | required |
| knn_indices | (n, k) array | Neighbor indices per point, as returned by sklearn NearestNeighbors, with each point itself excluded. | required |
| sample_k | int or None | If set, use only the first sample_k neighbors for the pairwise calculation (faster for large k). | None |
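With scikit-learn, calling `kneighbors()` on a fitted `NearestNeighbors` without a query argument returns neighbors of the indexed points and does not count a point as its own neighbor. The results are sorted by increasing distance, so the first sample_k columns are the closest neighbors. The dependency-free sketch below builds an array of the same shape by brute-force cosine similarity. The helper name `cosine_knn_indices` is hypothetical. It matches scikit-learn's `metric="cosine"` ordering up to tie-breaking, and because it materializes the full n × n similarity matrix it is only practical for small n. It also assumes k ≤ n - 1.

```python
import numpy as np


def cosine_knn_indices(embeddings, k):
    """Brute-force (n, k) neighbor indices by cosine similarity, self excluded (hypothetical helper)."""
    emb = np.asarray(embeddings, dtype=float)
    unit = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    sims = unit @ unit.T
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(sims, -np.inf)
    # Sort by descending similarity (ascending cosine distance) and keep the top k.
    order = np.argsort(-sims, axis=1, kind="stable")
    return order[:, :k]


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
knn = cosine_knn_indices(X, k=10)
print(knn.shape)  # (100, 10)

# sample_k keeps the leading (closest) columns. For k = 10 and sample_k = 5,
# the unordered pairs per point drop from 10*9/2 = 45 to 5*4/2 = 10.
print(knn[:, :5].shape)  # (100, 5)
```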

Returns

| Type | Description |
|---|---|
| np.ndarray | Float array of length n. Each value is the mean pairwise cosine similarity among that point's neighbors. In practice the range is roughly [0, 1]; because cosine similarity lies in [-1, 1], slightly negative values are possible. Higher values indicate meta/structural content (echo chambers of structurally similar documents). |
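The bounds on this value can be stated exactly. Let u_1, ..., u_k be a point's L2-normalized neighbor vectors, and let C be its coherence, taken here as the mean of u_i · u_j over the k(k - 1) ordered pairs i ≠ j. Expanding the squared norm of their sum gives:

```latex
\Bigl\lVert \sum_{i=1}^{k} \hat{u}_i \Bigr\rVert^2
  = \sum_{i=1}^{k} \lVert \hat{u}_i \rVert^2 + \sum_{i \neq j} \hat{u}_i^\top \hat{u}_j
  = k + k(k-1)\,C
\quad\Longrightarrow\quad
C = \frac{\bigl\lVert \sum_{i} \hat{u}_i \bigr\rVert^2 - k}{k(k-1)},
\qquad
-\frac{1}{k-1} \le C \le 1 .
```

The bounds follow from 0 ≤ ‖Σ u_i‖² ≤ k². For large k, the value therefore can never fall far below 0. The same identity also gives an O(kd) per-point computation that avoids forming the k × k Gram matrix.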