cross_validate
skrub.cross_validate(learner, environment, *, keep_subsampling=False, **kwargs)
Cross-validate a learner built from a DataOp.

This runs cross-validation on a learner that was built from a skrub DataOp with DataOp.skb.make_learner(), DataOp.skb.make_grid_search(), or DataOp.skb.make_randomized_search(). It is useful for running nested cross-validation of a grid search or randomized search.
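In the simplest case, the learner comes straight from make_learner(); a minimal sketch, where the DummyClassifier pipeline is only illustrative and not part of this function's API:

>>> import skrub
>>> from sklearn.datasets import make_classification
>>> from sklearn.dummy import DummyClassifier
>>> X_a, y_a = make_classification(random_state=0)
>>> X, y = skrub.X(X_a), skrub.y(y_a)
>>> pred = X.skb.apply(DummyClassifier(), y=y)
>>> learner = pred.skb.make_learner()
>>> results = skrub.cross_validate(learner, pred.skb.get_data())  # scikit-learn's 5-fold CV by default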
- Parameters:
  - learner : skrub learner
    A learner generated from a skrub DataOp.
  - environment : dict
    Bindings for variables contained in the DataOp.
  - keep_subsampling : bool, default=False
    If True, and if subsampling has been configured (see DataOp.skb.subsample()), use a subsample of the data. By default subsampling is not applied and all the data is used.
  - kwargs : dict
    All other named arguments are forwarded to sklearn.model_selection.cross_validate(), except that scikit-learn's return_estimator parameter is named return_learner here.
- Returns:
  - dict
    Cross-validation results.
See also

sklearn.model_selection.cross_validate()
    Evaluate metric(s) by cross-validation and also record fit/score times.
skrub.DataOp.skb.make_learner()
    Get a skrub learner for this DataOp.
skrub.DataOp.skb.make_grid_search()
    Find the best parameters with grid search.
skrub.DataOp.skb.make_randomized_search()
    Find the best parameters with randomized search.
Examples
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> import skrub

>>> X_a, y_a = make_classification(random_state=0)
>>> X, y = skrub.X(X_a), skrub.y(y_a)
>>> log_reg = LogisticRegression(
...     **skrub.choose_float(0.01, 1.0, log=True, name="C")
... )
>>> pred = X.skb.apply(log_reg, y=y)
>>> search = pred.skb.make_randomized_search(random_state=0)
>>> skrub.cross_validate(search, pred.skb.get_data())['test_score']
0    0.75
1    0.90
2    0.95
3    0.75
4    0.85
Name: test_score, dtype: float64
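Extra keyword arguments such as cv are forwarded to scikit-learn, and return_learner replaces return_estimator. A minimal sketch continuing the example above; the 'learner' results column is an assumption mirroring scikit-learn's 'estimator' key:

>>> results = skrub.cross_validate(
...     search, pred.skb.get_data(), cv=3, return_learner=True
... )
>>> fitted = results['learner']  # one fitted learner per fold (assumed column name)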