skrub.Expr.skb.get_randomized_search

Expr.skb.get_randomized_search(*, fitted=False, keep_subsampling=False, **kwargs)
Find the best parameters with randomized search.
This method returns a ParamSearch, an object similar to scikit-learn's RandomizedSearchCV. The main difference is that methods such as fit() and predict() accept a dictionary of inputs rather than X and y. Please refer to the examples gallery for an in-depth explanation.

Parameters:
- fitted : bool (default=False)
  If True, the randomized search is fitted on the data provided when initializing variables in this expression (the data returned by .skb.get_data()).
- keep_subsampling : bool (default=False)
  If True, and if subsampling has been configured (see Expr.skb.subsample()), fit on a subsample of the data. By default subsampling is not applied and all the data is used. Subsampling only applies when fitting the randomized search with fitted=True; subsequent use of the randomized search is not affected by it. It is therefore an error to pass keep_subsampling=True together with fitted=False (because keep_subsampling=True would have no effect).
- kwargs : dict
  All other named arguments are forwarded to RandomizedSearchCV.
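As a point of reference for which keyword arguments can be forwarded, here is a minimal sketch using plain scikit-learn's RandomizedSearchCV directly (not the skrub wrapper): options such as n_iter, cv, scoring, and random_state are the ones that get passed through unchanged.

```python
# Sketch of the RandomizedSearchCV options that **kwargs forwards.
# This uses scikit-learn directly, for illustration only.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(random_state=0)
search = RandomizedSearchCV(
    LogisticRegression(),
    {"C": loguniform(0.1, 10.0)},  # distribution to sample from
    n_iter=5,        # number of sampled parameter candidates
    cv=3,            # 3-fold cross-validation
    random_state=0,  # reproducible sampling
)
search.fit(X, y)
best = search.best_params_  # e.g. the sampled value of C
```

In the skrub call, these same keyword arguments are given to get_randomized_search() itself, e.g. pred.skb.get_randomized_search(fitted=True, n_iter=5, cv=3, random_state=0).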
Returns:
- ParamSearch
  An object implementing the hyperparameter search. Besides the usual fit and predict, attributes of interest are results_ and plot_results().
See also
skrub.Expr.skb.get_grid_search
Find the best parameters with grid search.
Examples
>>> import skrub
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.dummy import DummyClassifier
>>> X_a, y_a = make_classification(random_state=0)
>>> X, y = skrub.X(X_a), skrub.y(y_a)
>>> selector = SelectKBest(k=skrub.choose_int(4, 20, log=True, name='k'))
>>> logistic = LogisticRegression(C=skrub.choose_float(0.1, 10.0, log=True, name="C"))
>>> rf = RandomForestClassifier(
...     n_estimators=skrub.choose_int(3, 30, log=True, name="N 🌴"),
...     random_state=0,
... )
>>> classifier = skrub.choose_from(
...     {"logistic": logistic, "rf": rf, "dummy": DummyClassifier()}, name="classifier"
... )
>>> pred = X.skb.apply(selector, y=y).skb.apply(classifier, y=y)
>>> print(pred.skb.describe_param_grid())
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'logistic'
  C: choose_float(0.1, 10.0, log=True, name='C')
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'rf'
  N 🌴: choose_int(3, 30, log=True, name='N 🌴')
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'dummy'
>>> search = pred.skb.get_randomized_search(fitted=True, random_state=0)
>>> search.results_
    k         C  N 🌴 classifier  mean_test_score
0   4  4.626363  NaN   logistic             0.92
1  10       NaN  7.0         rf             0.89
2   7  3.832217  NaN   logistic             0.87
3  15       NaN  6.0         rf             0.86
4  10  4.881255  NaN   logistic             0.85
5  19  3.965675  NaN   logistic             0.80
6  14       NaN  3.0         rf             0.77
7   4       NaN  NaN      dummy             0.50
8   9       NaN  NaN      dummy             0.50
9   5       NaN  NaN      dummy             0.50