Encoding a column

See encoding for further details.

StringEncoder

Generate a lightweight string encoding of a given column using tf-idf vectorization and truncated singular value decomposition (SVD).

TextEncoder

Encode string features by applying a pretrained language model downloaded from the HuggingFace Hub.

MinHashEncoder

Encode string categorical features by applying the MinHash method to n-gram decompositions of strings.
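The idea behind MinHash on n-grams can be sketched with the standard library alone. This is an illustration of the general technique, not the encoder's actual implementation: the helper names (`char_ngrams`, `seeded_hash`, `minhash_signature`, `shared_components`), the space padding, the choice of `blake2b` as hash function, and the default of 64 components are all assumptions made for the example.

```python
import hashlib


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a string padded with spaces."""
    padded = f" {text} "
    if len(padded) < n:
        # Strings too short to hold one n-gram are kept as a single token.
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def seeded_hash(gram, seed):
    """Hash an n-gram to a 64-bit integer; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(
        gram.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, n_components=64, n=3):
    """For each hash function, keep the smallest hash value over the string's n-grams."""
    grams = char_ngrams(text, n)
    return [min(seeded_hash(g, seed) for g in grams) for seed in range(n_components)]


def shared_components(a, b):
    """Count the signature positions on which two strings agree."""
    return sum(x == y for x, y in zip(minhash_signature(a), minhash_signature(b)))
```

Strings that share many n-grams tend to agree on more signature positions, so `shared_components("london", "londan")` is expected to be larger than `shared_components("london", "paris")`.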

GapEncoder

Encode string columns by constructing latent topics.

SimilarityEncoder

Encode string categories to a similarity matrix, to capture fuzziness across a few categories.
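As a rough illustration of what a similarity matrix over string categories looks like, the sketch below compares each value with a few reference categories. The Jaccard similarity on character 3-grams used here is an assumption chosen for simplicity and may differ from the measure the encoder uses; `char_ngrams`, `ngram_similarity`, and `similarity_encode` are hypothetical helper names.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a string padded with spaces."""
    padded = f" {text} "
    if len(padded) < n:
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two strings (1.0 means identical sets)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def similarity_encode(values, categories, n=3):
    """Return one row per value and one column per reference category."""
    return [[ngram_similarity(v, c, n) for c in categories] for v in values]


matrix = similarity_encode(["london", "londan"], ["london", "paris"])
```

Each row of `matrix` describes one input value: an exact match scores 1.0 against its own category, while a misspelling such as "londan" still gets a non-zero score against "london".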

ToCategorical

Convert a string column to Categorical dtype.

DatetimeEncoder

Extract temporal features such as month, day of the week, … from a datetime column.
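The kind of features involved can be illustrated with the standard library. This is a minimal sketch of the idea rather than the encoder's output format: the function name `datetime_features`, the dictionary keys, and the inclusion of year and hour are assumptions made for the example.

```python
from datetime import datetime


def datetime_features(moment):
    """Split one datetime into separate numeric features."""
    return {
        "year": moment.year,
        "month": moment.month,
        "hour": moment.hour,
        # ISO convention: 1 is Monday, 7 is Sunday.
        "weekday": moment.isoweekday(),
    }


features = datetime_features(datetime(2024, 3, 15, 14, 30))
```

Splitting a datetime this way lets a downstream estimator pick up, for instance, seasonal or weekly patterns that a single timestamp value would hide.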

ToDatetime

Parse datetimes represented as strings and return Datetime columns.
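Conceptually, parsing a string column into datetimes resembles the standard-library sketch below. It is not the transformer's implementation: the function name `parse_datetime_column`, the fixed `fmt` argument, and the choice to map unparsable entries to `None` are assumptions made for the example.

```python
from datetime import datetime


def parse_datetime_column(values, fmt="%Y-%m-%d"):
    """Parse each string with one format; entries that cannot be parsed become None."""
    parsed = []
    for value in values:
        try:
            parsed.append(datetime.strptime(value, fmt))
        except (TypeError, ValueError):
            # TypeError covers missing values (None), ValueError covers malformed strings.
            parsed.append(None)
    return parsed


column = parse_datetime_column(["2024-03-15", "not a date", None])
```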

to_datetime

Convert DataFrame or column to Datetime dtype.