Wrangling data with good defaults

This section covers how to build a predictive pipeline starting from a dataframe. The skrub objects described here can be used as strong defaults for building baseline pipelines and can be customized for specific use cases.

Cleaner: sanitizing a dataframe
- Parsing numeric-looking strings with the Cleaner
- Downcasting float dtypes to float32 with the Cleaner
Transforming a table into an array of numeric features: TableVectorizer
- Numeric strings and categorical encoding
Building robust ML baselines with tabular_pipeline()
The logic used by the tabular pipeline is quite simple
Extending the pipeline with the .steps attribute
Using a pipeline as the estimator
Transforming selected columns with ApplyToCols
- Dealing with columns that cannot be handled by a transformer
- Advanced usage of ApplyToCols