compute_domain_stopwords
compute_domain_stopwords(titles, threshold=0.1)

Returns the words that appear in more than a threshold fraction of the titles. These act as corpus-level (domain-specific) stop words.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| titles | list[str] | List of title strings. | required |
| threshold | float | Fraction threshold (0.0-1.0). Words appearing in more than this fraction of titles are considered domain stop words. | 0.1 |
Returns
| Type | Description |
|---|---|
| set[str] | Set of domain-specific stop word strings. |
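
Example

The following is a minimal sketch of the idea described above, not the library's actual implementation. The helper name sketch_domain_stopwords is hypothetical, and the tokenization (lowercasing, then splitting on whitespace) and the choice to count each word at most once per title are assumptions; the real function may normalize text differently.

```python
from collections import Counter


def sketch_domain_stopwords(titles, threshold=0.1):
    """Hypothetical re-implementation for illustration only."""
    if not titles:
        return set()

    # Document frequency: in how many titles does each word occur?
    doc_freq = Counter()
    for title in titles:
        # Assumed tokenization: lowercase + whitespace split.
        # set() ensures a word counts once per title.
        doc_freq.update(set(title.lower().split()))

    # A word qualifies when it occurs in MORE than threshold * N titles.
    cutoff = threshold * len(titles)
    return {word for word, count in doc_freq.items() if count > cutoff}


titles = [
    "Deep learning for image classification",
    "Deep learning for speech recognition",
    "A survey of graph algorithms",
    "Learning to rank with gradient boosting",
]

# "learning" occurs in 3 of 4 titles (0.75 > 0.5); "deep" and "for"
# occur in exactly 2 of 4 (0.5 is not > 0.5), so they are excluded.
stopwords = sketch_domain_stopwords(titles, threshold=0.5)
print(stopwords)
```

Because the comparison is strict, a word that appears in exactly the threshold fraction of titles is not returned. Lowering the threshold makes the filter more aggressive, so small corpora usually need a higher value than the default of 0.1.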