deduplicate_chunks
deduplicate_chunks(bucket_ids, doc_ids)

Boolean mask keeping one representative per (bucket, doc) pair.
Multi-topic documents retain multiple representatives, one per bucket they touch. Single-topic documents, whose points all fall in one bucket, collapse to a single representative.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| bucket_ids | | Array-like of bucket assignments per point. | required |
| doc_ids | | Array-like of document identifiers per point (same length). | required |
Returns
| Name | Type | Description |
|---|---|---|
| | np.ndarray | Boolean array of length len(bucket_ids). True for representative points. |
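
The behavior described above can be illustrated with a minimal sketch. This is not the library's implementation: `deduplicate_chunks_sketch` is a hypothetical stand-in, and it assumes the representative of each (bucket, doc) pair is the first point encountered, which the reference does not specify.

```python
import numpy as np


def deduplicate_chunks_sketch(bucket_ids, doc_ids):
    """Hypothetical stand-in: keep the first point of each (bucket, doc) pair.

    Assumption: "representative" means first occurrence; the real function
    may choose representatives differently.
    """
    bucket_ids = np.asarray(bucket_ids)
    doc_ids = np.asarray(doc_ids)
    if bucket_ids.shape != doc_ids.shape:
        raise ValueError("bucket_ids and doc_ids must have the same length")

    mask = np.zeros(len(bucket_ids), dtype=bool)
    seen = set()
    for i, pair in enumerate(zip(bucket_ids.tolist(), doc_ids.tolist())):
        if pair not in seen:
            # First time this (bucket, doc) pair appears: mark it as kept.
            seen.add(pair)
            mask[i] = True
    return mask


# Document "a" touches buckets 0 and 1 (multi-topic); "b" only bucket 1.
bucket_ids = [0, 0, 1, 1, 1]
doc_ids = ["a", "a", "a", "b", "b"]
mask = deduplicate_chunks_sketch(bucket_ids, doc_ids)
# mask.tolist() -> [True, False, True, True, False]
```

Here document "a" keeps two representatives, one per bucket it touches, while document "b" collapses to one. The mask can then be used for Boolean indexing, for example `np.asarray(doc_ids)[mask]`.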