tabular_learner

skrub.tabular_learner(estimator, *, n_jobs=None)

Get a simple machine-learning pipeline for tabular data.

Deprecated since version 0.6.0: The functionality provided by this function is now implemented in tabular_pipeline().

Given either a scikit-learn estimator or one of the special-cased strings 'regressor', 'regression', 'classifier', 'classification', this function creates a scikit-learn pipeline that extracts numeric features, imputes missing values and scales the data if necessary, then applies the estimator.

Note

The heuristics that tabular_pipeline() uses to choose preprocessing appropriate for the estimator may change in future releases.

Changed in version 0.6.0: The high cardinality encoder has been changed from MinHashEncoder to StringEncoder.

Parameters:
estimator : {“regressor”, “regression”, “classifier”, “classification”} or sklearn.base.BaseEstimator

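The estimator to use as the final step in the pipeline. Based on the type of estimator, the previous preprocessing steps and their respective parameters are chosen. The possible values are:

- 'regressor' or 'regression': a HistGradientBoostingRegressor is used as the final step;
- 'classifier' or 'classification': a HistGradientBoostingClassifier is used as the final step;
- a scikit-learn estimator: the provided estimator is used as the final step.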

n_jobs : int, default=None

Number of jobs to run in parallel in the TableVectorizer step. None means 1 unless in a joblib parallel_backend context. -1 means using all processors.

Returns:
Pipeline

A scikit-learn Pipeline chaining some preprocessing and the provided estimator.