chunk_redundancy

chunk_redundancy(bucket_ids, doc_ids)

Count same-doc siblings sharing each point’s bucket.

For each point, returns how many other chunks from the same document landed in the same bucket. A value of 0 means the point is the only chunk from its document in that bucket.

Parameters

Name Type Description Default
bucket_ids Array-like of bucket assignments per point. required
doc_ids Array-like of document identifiers per point (same length). required

Returns

Name Type Description
np.ndarray Integer array of length len(bucket_ids). Each value is the number of same-doc siblings in the same bucket (excluding self).
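
Example

The counting rule described above can be illustrated with a minimal sketch. The function name chunk_redundancy_sketch is hypothetical and this is not the library's actual implementation; it only assumes that the result for each point is the number of points sharing its (bucket, document) pair, minus one for the point itself.

```python
from collections import Counter

import numpy as np


def chunk_redundancy_sketch(bucket_ids, doc_ids):
    """Illustrative re-implementation (hypothetical, not the library code)."""
    bucket_ids = list(bucket_ids)
    doc_ids = list(doc_ids)
    if len(bucket_ids) != len(doc_ids):
        raise ValueError("bucket_ids and doc_ids must have the same length")

    # Each point is identified by the (bucket, document) pair it belongs to.
    pairs = list(zip(bucket_ids, doc_ids))
    # Count how many points share each pair.
    counts = Counter(pairs)
    # Subtract 1 so a point does not count itself as its own sibling.
    return np.array([counts[p] - 1 for p in pairs], dtype=int)


bucket_ids = [0, 0, 0, 1, 1, 2]
doc_ids = ["a", "a", "b", "a", "b", "b"]
print(chunk_redundancy_sketch(bucket_ids, doc_ids))  # [1 1 0 0 0 0]
```

Here the first two points are both chunks of document "a" in bucket 0, so each has one sibling. Every other point is the only chunk of its document in its bucket and gets 0.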