tabular_learner
- skrub.tabular_learner(estimator, *, n_jobs=None)
Get a simple machine-learning pipeline for tabular data.
Given a scikit-learn estimator, this function creates a machine-learning pipeline that preprocesses tabular data to extract numeric features, impute missing values and scale the data if necessary, then applies the estimator.

Instead of an actual estimator, estimator can also be one of the special-cased strings 'regressor', 'regression', 'classifier' or 'classification' to use a HistGradientBoostingRegressor or a HistGradientBoostingClassifier with default parameters.

tabular_learner returns a scikit-learn Pipeline with several steps:

- A TableVectorizer transforms the tabular data into numeric features. Its parameters are chosen depending on the provided estimator.
- An optional SimpleImputer imputes missing values by their mean and adds binary columns that indicate which values were missing. This step is only added if the estimator cannot handle missing values itself.
- An optional StandardScaler centers and rescales the data. This step is not added (because it is unnecessary) when the estimator is a tree ensemble such as a random forest or gradient boosting.
- The last step is the provided estimator.
Read more in the User Guide.
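For illustration, the step names of the returned pipeline reflect this structure; a minimal sketch based on the LogisticRegression example detailed further below:

>>> from sklearn.linear_model import LogisticRegression
>>> from skrub import tabular_learner
>>> list(tabular_learner(LogisticRegression()).named_steps)
['tablevectorizer', 'simpleimputer', 'standardscaler', 'logisticregression']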
Note
tabular_learner is a recent addition and the heuristics used to define an appropriate preprocessing based on the estimator may change in future releases.
- Parameters:
- estimator : {'regressor', 'regression', 'classifier', 'classification'} or scikit-learn estimator
The estimator to use as the final step in the pipeline. Based on the type of estimator, the previous preprocessing steps and their respective parameters are chosen. The possible values are:
- 'regressor' or 'regression': a HistGradientBoostingRegressor is used as the final step;
- 'classifier' or 'classification': a HistGradientBoostingClassifier is used as the final step;
- a scikit-learn estimator: the provided estimator is used as the final step.
- n_jobs : int, default=None
  Number of jobs to run in parallel in the TableVectorizer step. None means 1 unless in a joblib parallel_backend context. -1 means using all processors.
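For example, to dispatch the vectorization work across all available cores (a minimal sketch; whether this helps depends on the data and the joblib backend):

>>> model = tabular_learner('classifier', n_jobs=-1)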
- Returns:
  - Pipeline
    A scikit-learn Pipeline chaining the preprocessing steps described above and the provided estimator.
Notes
The parameter values for the TableVectorizer might differ depending on the version of scikit-learn:

- support for categorical features in HistGradientBoostingClassifier and HistGradientBoostingRegressor was added in scikit-learn 1.4. Therefore, before this version, an OrdinalEncoder is used for low-cardinality features.
- support for missing values in RandomForestClassifier and RandomForestRegressor was added in scikit-learn 1.4. Therefore, before this version, a SimpleImputer is used to impute missing values.
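One way to check which configuration was selected for the installed versions is to inspect the TableVectorizer step of the returned pipeline; a minimal sketch (the output shown corresponds to scikit-learn >= 1.4 and may differ on older versions):

>>> tabular_learner('regressor').named_steps['tablevectorizer']
TableVectorizer(high_cardinality=MinHashEncoder(), low_cardinality=ToCategorical())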
Examples
>>> from skrub import tabular_learner
We can easily get a default pipeline for regression or classification:
>>> tabular_learner('regression')
Pipeline(steps=[('tablevectorizer', TableVectorizer(high_cardinality=MinHashEncoder(), low_cardinality=ToCategorical())), ('histgradientboostingregressor', HistGradientBoostingRegressor(categorical_features='from_dtype'))])
When requesting a 'regression', the last step of the pipeline is set to a HistGradientBoostingRegressor.

>>> tabular_learner('classification')
Pipeline(steps=[('tablevectorizer', TableVectorizer(high_cardinality=MinHashEncoder(), low_cardinality=ToCategorical())), ('histgradientboostingclassifier', HistGradientBoostingClassifier(categorical_features='from_dtype'))])
When requesting a 'classification', the last step of the pipeline is set to a HistGradientBoostingClassifier.

This pipeline can be applied to rich tabular data:
>>> import pandas as pd
>>> X = pd.DataFrame(
...     {
...         "last_visit": ["2020-01-02", "2021-04-01", "2024-12-05", "2023-08-10"],
...         "medication": [None, "metformin", "paracetamol", "gliclazide"],
...         "insulin_prescriptions": ["N/A", 13, 0, 17],
...         "fasting_glucose": [35, 140, 44, 137],
...     }
... )
>>> y = [0, 1, 0, 1]
>>> X
   last_visit   medication insulin_prescriptions  fasting_glucose
0  2020-01-02         None                   N/A               35
1  2021-04-01    metformin                    13              140
2  2024-12-05  paracetamol                     0               44
3  2023-08-10   gliclazide                    17              137
>>> model = tabular_learner('classifier').fit(X, y)
>>> model.predict(X)
array([0, 0, 0, 0])
Rather than using the default estimator, we can provide our own scikit-learn estimator:
>>> from sklearn.linear_model import LogisticRegression
>>> model = tabular_learner(LogisticRegression())
>>> model.fit(X, y)
Pipeline(steps=[('tablevectorizer', TableVectorizer()), ('simpleimputer', SimpleImputer(add_indicator=True)), ('standardscaler', StandardScaler()), ('logisticregression', LogisticRegression())])
By applying only the first pipeline step, we can see the transformed data that is sent to the supervised estimator (see the TableVectorizer documentation for details):

>>> model.named_steps['tablevectorizer'].transform(X)
   last_visit_year  last_visit_month  ...  insulin_prescriptions  fasting_glucose
0           2020.0               1.0  ...                    NaN             35.0
1           2021.0               4.0  ...                   13.0            140.0
2           2024.0              12.0  ...                    0.0             44.0
3           2023.0               8.0  ...                   17.0            137.0
The parameters of the TableVectorizer depend on the provided estimator.

>>> tabular_learner(LogisticRegression())
Pipeline(steps=[('tablevectorizer', TableVectorizer()), ('simpleimputer', SimpleImputer(add_indicator=True)), ('standardscaler', StandardScaler()), ('logisticregression', LogisticRegression())])
We see that for the LogisticRegression we get the default configuration of the TableVectorizer, which is intended to work well for a wide variety of downstream estimators. Moreover, as the LogisticRegression cannot handle missing values, an imputation step is added. Finally, as many models require the inputs to be centered and on the same scale, centering and standard scaling are added.

On the other hand, for the HistGradientBoostingClassifier (generated with the string 'classifier'):

>>> tabular_learner('classifier')
Pipeline(steps=[('tablevectorizer', TableVectorizer(high_cardinality=MinHashEncoder(), low_cardinality=ToCategorical())), ('histgradientboostingclassifier', HistGradientBoostingClassifier(categorical_features='from_dtype'))])
A MinHashEncoder is used as the high_cardinality encoder. This encoder provides good performance when the supervised estimator is based on a decision tree or an ensemble of trees, as is the case for the HistGradientBoostingClassifier. Unlike the default GapEncoder, the MinHashEncoder does not produce interpretable features. However, it is much faster and uses less memory.

The low_cardinality encoder does not one-hot encode features. The HistGradientBoostingClassifier has built-in support for categorical data which is more efficient than one-hot encoding. Therefore the selected encoder, ToCategorical, simply makes sure that those features have a categorical dtype so that the HistGradientBoostingClassifier recognizes them as such.

There is no missing-value imputation because the classifier has its own (better) mechanism for dealing with missing values.

There is no standard scaling, which is unnecessary for trees and ensembles of trees.
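The resulting pipeline is therefore roughly equivalent to assembling the same steps by hand; a minimal sketch, not the exact construction used internally, assuming MinHashEncoder and ToCategorical are importable from skrub's top level and reusing the X, y defined above:

>>> from sklearn.ensemble import HistGradientBoostingClassifier
>>> from sklearn.pipeline import make_pipeline
>>> from skrub import MinHashEncoder, TableVectorizer, ToCategorical
>>> manual = make_pipeline(
...     TableVectorizer(high_cardinality=MinHashEncoder(), low_cardinality=ToCategorical()),
...     HistGradientBoostingClassifier(categorical_features='from_dtype'),
... )
>>> manual.fit(X, y).predict(X)
array([0, 0, 0, 0])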
Gallery examples
Encoding: from a dataframe to a numerical matrix for machine learning