Cleaning a dataframe

deduplicate

Deduplicate categorical data by hierarchically clustering similar strings.

Cleaner

Column-wise consistency checks and sanitization of dtypes, null values and dates.

DropUninformative

Drop a column if it is found to be uninformative according to various criteria.
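To make the three entries above more concrete, here is a minimal standard-library sketch of the ideas they describe. It is not the library's implementation: the function names (`dedup_strings`, `clean_column`, `drop_uninformative`), the `NULL_STRINGS` set, the thresholds, and the dict-of-lists table layout are all hypothetical, and the clustering uses single linkage over `difflib` similarity as a stand-in for whatever string metric the real function uses.

```python
from collections import Counter
from datetime import date
from difflib import SequenceMatcher

# Assumed markers for missing values; purely illustrative.
NULL_STRINGS = {"", "nan", "n/a", "null", "none"}


def dedup_strings(values, max_distance=0.3):
    """Replace each string with the most frequent member of its cluster.

    Clusters are built bottom-up (single linkage): the two closest
    clusters are merged until the closest pair is farther apart than
    `max_distance`.
    """
    counts = Counter(values)
    clusters = [[s] for s in counts]

    def dist(a, b):
        return 1.0 - SequenceMatcher(None, a, b).ratio()

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        clusters[i].extend(clusters.pop(j))

    mapping = {}
    for cluster in clusters:
        representative = max(cluster, key=lambda s: counts[s])
        for s in cluster:
            mapping[s] = representative
    return [mapping[v] for v in values]


def clean_column(values):
    """Turn null-like strings into None, then parse ISO dates if possible."""
    cleaned = [
        None if isinstance(v, str) and v.strip().lower() in NULL_STRINGS else v
        for v in values
    ]
    non_null = [v for v in cleaned if v is not None]
    try:
        parsed = {v: date.fromisoformat(v) for v in non_null}
    except (TypeError, ValueError):
        return cleaned  # not a date column; keep the sanitized values
    return [None if v is None else parsed[v] for v in cleaned]


def drop_uninformative(table, max_null_fraction=0.9):
    """Keep only columns that are neither constant nor mostly null."""
    kept = {}
    for name, values in table.items():
        null_fraction = sum(v is None for v in values) / len(values)
        if len(set(values)) <= 1 or null_fraction > max_null_fraction:
            continue
        kept[name] = values
    return kept


raw = {
    "city": ["london", "london", "londn", "paris", "pariss", "paris"],
    "joined": ["2024-01-05", "N/A", "2024-02-01", "", "2024-03-09", "2024-03-10"],
    "source": ["web"] * 6,
}
table = {name: clean_column(col) for name, col in raw.items()}
table["city"] = dedup_strings(table["city"])
table = drop_uninformative(table)
```

After this runs, `table` keeps `city` (with the misspellings mapped to `london` and `paris`) and `joined` (as `date` objects and `None`), while the constant `source` column is dropped. The real classes operate on dataframes and expose their own parameters, so treat this only as an outline of the behaviour described above.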