neighbor_coherence

neighbor_coherence(embeddings, knn_indices, sample_k=None)

Measure how coherent each point’s neighborhood is.

For each point, computes the mean pairwise cosine similarity among its k nearest neighbors. Topical documents have coherent neighborhoods: their neighbors are similar to each other. Meta/structural documents (date pages, event lists, disambiguation pages) pull in diverse neighbors that have little in common, producing low coherence.

This is a general embedding-space signal that does not depend on titles, metadata, or domain-specific heuristics.
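
The computation described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the library's actual implementation: the name `neighbor_coherence_sketch` is hypothetical, and the sketch assumes that each pair of distinct neighbors is averaged with equal weight and that self-similarity is excluded from the mean.

```python
import numpy as np

def neighbor_coherence_sketch(embeddings, knn_indices, sample_k=None):
    """Hypothetical re-implementation for illustration only."""
    emb = np.asarray(embeddings, dtype=np.float64)
    idx = np.asarray(knn_indices)
    if sample_k is not None:
        idx = idx[:, :sample_k]          # keep only the first sample_k neighbors
    k = idx.shape[1]
    if k < 2:
        raise ValueError("need at least two neighbors per point")

    # Normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)

    scores = np.empty(idx.shape[0])
    for i, nbrs in enumerate(idx):
        v = unit[nbrs]                   # (k, d) neighbor vectors of point i
        sims = v @ v.T                   # (k, k) pairwise cosine similarities
        # Average the off-diagonal entries (assumption: self-pairs excluded).
        scores[i] = (sims.sum() - np.trace(sims)) / (k * (k - 1))
    return scores

# Tiny example: point 2's neighbors (points 0 and 1) are orthogonal.
demo_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
demo_knn = np.array([[1, 2], [0, 2], [0, 1]])
demo_scores = neighbor_coherence_sketch(demo_emb, demo_knn)
```

The per-point loop keeps memory at O(k²) per step; a fully vectorized version would materialize an (n, k, k) array, which can be costly for large n and k.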

Parameters

Name	Description	Default
embeddings	(n, d) array of embedding vectors.	required
knn_indices	(n, k) array of neighbor indices per point (as returned by sklearn NearestNeighbors, self excluded).	required
sample_k	If set, use only the first sample_k neighbors for the pairwise calculation (faster for large k).	`None`
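
One way to build a `knn_indices` array in the expected shape is with scikit-learn's `NearestNeighbors`. The snippet below is a sketch under assumptions: the random data, the choice of `k = 10`, and the cosine metric are illustrative, and the final call is left as a comment because the import path of `neighbor_coherence` is not stated here.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))   # illustrative (n, d) data

k = 10                                     # illustrative neighbor count
nn = NearestNeighbors(n_neighbors=k, metric="cosine").fit(embeddings)

# Calling kneighbors() with no query returns neighbors of the fitted points,
# and each point is not counted as its own neighbor.
knn_indices = nn.kneighbors(return_distance=False)   # shape (n, k)

# scores = neighbor_coherence(embeddings, knn_indices, sample_k=5)
```

If the neighbors are instead queried with `nn.kneighbors(embeddings)`, each point typically appears as its own nearest neighbor and would need to be removed before passing the indices on.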

Returns

Name	Type	Description
	np.ndarray	Float array of length n. Each value is the mean pairwise cosine similarity among that point’s neighbors. Range roughly [0, 1]; higher values indicate meta/structural content (echo chambers of structurally similar documents).