discover_categorical_columns

discover_categorical_columns(
    df,
    text_col='text',
    max_cardinality=500,
    min_cardinality=2,
)

Auto-detect categorical columns from a polars DataFrame.

String columns whose number of unique values lies between min_cardinality and max_cardinality are treated as categorical axes. List[str] columns are coarsened via coarsen(strategy='first_term'). High-cardinality string columns (likely free text) and the text column used for embedding (text_col) are skipped.
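The selection rule described above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: it works on a dict of Python lists rather than a polars DataFrame so that it stays self-contained, and it returns lists rather than np.ndarray. The name sketch_discover is hypothetical, and both the reading of 'first_term' as "keep the first element of each list" and the handling of nulls are assumptions.

```python
def sketch_discover(columns, text_col="text", max_cardinality=500, min_cardinality=2):
    """Illustrative restatement of the selection rule (hypothetical helper)."""
    label_columns = {}
    for name, values in columns.items():
        if name == text_col:
            continue  # the text column used for embedding is never an axis

        non_null = [v for v in values if v is not None]
        if not non_null:
            continue  # nothing to inspect

        if all(isinstance(v, str) for v in non_null):
            labels = list(values)
        elif all(
            isinstance(v, list) and all(isinstance(t, str) for t in v)
            for v in non_null
        ):
            # ASSUMPTION: 'first_term' keeps the first element of each list;
            # empty lists and nulls become None here.
            labels = [v[0] if v else None for v in values]
        else:
            continue  # non-string columns are ignored in this sketch

        # ASSUMPTION: nulls do not count toward cardinality.
        n_unique = len({label for label in labels if label is not None})

        # Skip columns with more than max_cardinality or fewer than
        # min_cardinality unique values; both bounds are inclusive.
        if min_cardinality <= n_unique <= max_cardinality:
            label_columns[name] = labels
    return label_columns


rows = {
    "text": ["a b c", "d e f", "g h i", "j k l"],           # embedding text
    "lang": ["en", "en", "fr", "de"],                       # 3 unique values
    "tags": [["ml", "nlp"], ["ml"], ["db", "sql"], ["db"]],  # List[str]
    "id": ["r1", "r2", "r3", "r4"],                         # 4 unique values
    "const": ["x", "x", "x", "x"],                          # 1 unique value
    "score": [0.1, 0.2, 0.3, 0.4],                          # not a string column
}
result = sketch_discover(rows, max_cardinality=3)
```

With max_cardinality=3, only lang and tags survive: id exceeds the upper bound, const falls below the lower bound, score is not a string column, and text is excluded by name.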

Parameters

df : polars.DataFrame
    Input dataframe.
text_col : str
    Name of the text column used for embedding (excluded from axes).
max_cardinality : int
    Columns with more unique values than this are skipped.
min_cardinality : int
    Columns with fewer unique values than this are skipped.

Returns

label_columns : dict[str, np.ndarray]
    Mapping of column name → per-row string labels.