compute_domain_stopwords

compute_domain_stopwords(titles, threshold=0.1)

Returns the set of words that appear in more than the given threshold fraction of titles. These act as corpus-level (domain-specific) stop words.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| titles | list[str] | List of title strings. | required |
| threshold | float | Fraction threshold (0.0-1.0). Words appearing in more than this fraction of titles are considered domain stop words. | 0.1 |

Returns

| Name | Type | Description |
|------|------|-------------|
| | set[str] | Set of domain-specific stop word strings. |
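
Example

The behavior described above can be illustrated with a minimal sketch. This is not the library's actual implementation: the name compute_domain_stopwords_sketch is hypothetical, and the tokenization (lowercasing and splitting on whitespace) and the empty-input handling are assumptions, since this page does not specify them. The sketch counts each word at most once per title and keeps words whose title fraction is strictly greater than threshold.

```python
from collections import Counter


def compute_domain_stopwords_sketch(titles: list[str], threshold: float = 0.1) -> set[str]:
    """Hypothetical re-implementation for illustration only."""
    if not titles:
        # Assumption: an empty corpus yields no stop words.
        return set()

    doc_freq: Counter[str] = Counter()
    for title in titles:
        # Assumption: lowercase + whitespace tokenization.
        # set() ensures a word is counted once per title (document frequency).
        doc_freq.update(set(title.lower().split()))

    n_titles = len(titles)
    # Keep words appearing in strictly more than `threshold` of the titles.
    return {word for word, count in doc_freq.items() if count / n_titles > threshold}


titles = [
    "Deep learning for image segmentation",
    "Learning to rank with deep networks",
    "A survey of graph learning",
    "Image retrieval at scale",
]

stopwords = compute_domain_stopwords_sketch(titles, threshold=0.5)
print(sorted(stopwords))  # ['learning']
```

Here "learning" appears in 3 of 4 titles (0.75), which exceeds the threshold of 0.5. "deep" and "image" each appear in exactly 2 of 4 titles (0.5), so they are excluded because the comparison is strict.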