cross_validate

skrub.cross_validate(pipeline, environment, **kwargs)

Cross-validate a pipeline built from an expression.

This runs cross-validation on a pipeline that was built from a skrub expression with .skb.get_pipeline(), .skb.get_grid_search() or .skb.get_randomized_search().

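It is useful for running nested cross-validation of a grid search or randomized search: the search tunes hyperparameters on each training split, and the outer folds score the tuned pipeline on data the search has not seen.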

Parameters:
pipeline : skrub pipeline

A pipeline generated from a skrub expression.

environment : dict

Bindings for variables contained in the expression.

kwargs : dict

All other named arguments are forwarded to sklearn.model_selection.cross_validate(), except that scikit-learn’s return_estimator parameter is named return_pipeline here.
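
As a rough illustration of the kind of arguments that get forwarded, the sketch below calls sklearn.model_selection.cross_validate() directly on a plain scikit-learn estimator. It does not use skrub; the choice of cv=3 and of the two scoring names is an assumption made for the example. With skrub.cross_validate, the same request for fitted models would be written return_pipeline=True instead of return_estimator=True.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(random_state=0)

# Ordinary scikit-learn arguments: number of folds, metrics to compute,
# and whether to keep the estimator fitted on each training split.
results = cross_validate(
    LogisticRegression(),
    X,
    y,
    cv=3,
    scoring=["accuracy", "roc_auc"],
    return_estimator=True,
)

# One array per metric, plus timings and the fitted estimators.
print(sorted(results))
# One fitted estimator per fold.
print(len(results["estimator"]))
```

With several scoring names, scikit-learn reports one test_<name> entry per metric rather than a single test_score entry.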

Returns:
dict

Cross-validation results, with one value per fold under each key (for example 'test_score').

See also

sklearn.model_selection.cross_validate()

Evaluate metric(s) by cross-validation and also record fit/score times.

skrub.Expr.skb.get_pipeline()

Get a skrub pipeline for this expression.

skrub.Expr.skb.get_grid_search()

Find the best parameters with grid search.

skrub.Expr.skb.get_randomized_search()

Find the best parameters with randomized search.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> import skrub
>>> X_a, y_a = make_classification(random_state=0)
>>> X, y = skrub.X(X_a), skrub.y(y_a)
>>> log_reg = LogisticRegression(
...     **skrub.choose_float(0.01, 1.0, log=True, name="C")
... )
>>> pred = X.skb.apply(log_reg, y=y)
>>> search = pred.skb.get_randomized_search(random_state=0)
>>> skrub.cross_validate(search, pred.skb.get_data())['test_score']
0    0.75
1    0.90
2    0.95
3    0.75
4    0.85
Name: test_score, dtype: float64
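
For comparison, the nested cross-validation pattern that the example above relies on can be sketched with scikit-learn alone. This is not how skrub implements cross_validate; it is a minimal stand-in that assumes a RandomizedSearchCV over C as the inner loop, with n_iter=5, an inner cv=3 and an outer cv=5 chosen only for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, cross_validate

X, y = make_classification(random_state=0)

# Inner loop: on each training split, sample values of C and keep the best.
search = RandomizedSearchCV(
    LogisticRegression(),
    param_distributions={"C": loguniform(0.01, 1.0)},
    n_iter=5,
    cv=3,
    random_state=0,
)

# Outer loop: refit the whole search on each outer training split and
# score the tuned model on the held-out fold.
outer = cross_validate(search, X, y, cv=5, return_estimator=True)

print(outer["test_score"])
for fitted_search in outer["estimator"]:
    # Each outer fold may select a different value of C.
    print(fitted_search.best_params_)
```

Because the search is refit inside every outer fold, the reported scores estimate how well the tuning procedure as a whole generalizes, rather than the performance of one fixed hyperparameter setting.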