skrub.Expr.skb.cross_validate

Expr.skb.cross_validate(environment=None, *, keep_subsampling=False, **kwargs)

Cross-validate the expression.

This generates the pipeline with default hyperparameters and runs scikit-learn cross-validation.

Parameters:

environment (dict or None): Bindings for variables contained in the expression. If not provided, the values passed when initializing var() are used.
keep_subsampling (bool, default=False): If True, and if subsampling has been configured (see Expr.skb.subsample()), use a subsample of the data. By default, subsampling is not applied and all the data is used.
kwargs (dict): All other named arguments are forwarded to sklearn.model_selection.cross_validate, except that scikit-learn’s return_estimator parameter is named return_pipeline here.
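
To make the forwarded arguments concrete, the sketch below calls sklearn.model_selection.cross_validate directly with a few of its standard options. It is only an illustration of what those options do in scikit-learn itself, not skrub's implementation. Based on the description above, the same keywords would presumably be passed to Expr.skb.cross_validate, with return_estimator written as return_pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X_a, y_a = make_classification(random_state=0)

# Plain scikit-learn call. Per the parameter description, the skrub method
# forwards keywords like these, but expects return_pipeline in place of
# return_estimator.
results = cross_validate(
    LogisticRegression(),
    X_a,
    y_a,
    cv=3,                     # number of folds
    scoring="accuracy",       # metric reported as test_score / train_score
    return_train_score=True,  # also score each model on its training folds
    return_estimator=True,    # keep the estimator fitted on each fold
)

# One entry per fold for every returned key.
print(sorted(results))
print(len(results["test_score"]), len(results["estimator"]))
```

Each key in the result holds one value per fold, so cv=3 yields three test scores, three train scores, and three fitted estimators.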

Returns:

dict: Cross-validation results.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> import skrub

>>> X_a, y_a = make_classification(random_state=0)
>>> X, y = skrub.X(X_a), skrub.y(y_a)
>>> pred = X.skb.apply(LogisticRegression(), y=y)
>>> pred.skb.cross_validate(cv=2)['test_score']
0    0.84
1    0.78
Name: test_score, dtype: float64

Passing some data:

>>> data = {'X': X_a, 'y': y_a}
>>> pred.skb.cross_validate(data)['test_score']
0    0.75
1    0.90
2    0.85
3    0.65
4    0.90
Name: test_score, dtype: float64
