skrub.Expr.skb.get_randomized_search

Expr.skb.get_randomized_search(*, fitted=False, keep_subsampling=False, **kwargs)
Find the best parameters with randomized search.
This method returns a ParamSearch, an object similar to scikit-learn's RandomizedSearchCV. The main difference is that methods such as fit() and predict() accept a dictionary of inputs rather than X and y. Please refer to the examples gallery for an in-depth explanation.

Parameters:
- fitted : bool (default=False)
  If True, the randomized search is fitted on the data provided when initializing variables in this expression (the data returned by .skb.get_data()).
- keep_subsampling : bool (default=False)
  If True, and if subsampling has been configured (see Expr.skb.subsample()), fit on a subsample of the data. By default subsampling is not applied and all the data is used. Subsampling only applies when fitting the randomized search with fitted=True; subsequent use of the randomized search is not affected by it. It is therefore an error to pass keep_subsampling=True together with fitted=False (because keep_subsampling=True would have no effect).
- kwargs : dict
  All other named arguments are forwarded to RandomizedSearchCV.
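As a point of reference for which keyword arguments can be forwarded, here is a minimal sketch using plain scikit-learn's RandomizedSearchCV directly (not the skrub wrapper): options such as n_iter, cv, scoring, and random_state are the ones that get passed through unchanged.

```python
# Sketch of the RandomizedSearchCV options that **kwargs forwards.
# This uses scikit-learn directly, for illustration only.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(random_state=0)
search = RandomizedSearchCV(
    LogisticRegression(),
    {"C": loguniform(0.1, 10.0)},  # distribution to sample from
    n_iter=5,        # number of sampled parameter candidates
    cv=3,            # 3-fold cross-validation
    random_state=0,  # reproducible sampling
)
search.fit(X, y)
best = search.best_params_  # e.g. the sampled value of C
```

In the skrub call, these same keyword arguments are given to get_randomized_search() itself, e.g. pred.skb.get_randomized_search(fitted=True, n_iter=5, cv=3, random_state=0).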
Returns:
- ParamSearch
  An object implementing the hyperparameter search. Besides the usual fit and predict, attributes of interest are results_ and plot_results().
See also
skrub.Expr.skb.get_grid_search
Find the best parameters with grid search.
Examples
>>> import skrub
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.dummy import DummyClassifier
>>> X_a, y_a = make_classification(random_state=0)
>>> X, y = skrub.X(X_a), skrub.y(y_a)
>>> selector = SelectKBest(k=skrub.choose_int(4, 20, log=True, name='k'))
>>> logistic = LogisticRegression(C=skrub.choose_float(0.1, 10.0, log=True, name="C"))
>>> rf = RandomForestClassifier(
...     n_estimators=skrub.choose_int(3, 30, log=True, name="N 🌴"),
...     random_state=0,
... )
>>> classifier = skrub.choose_from(
...     {"logistic": logistic, "rf": rf, "dummy": DummyClassifier()}, name="classifier"
... )
>>> pred = X.skb.apply(selector, y=y).skb.apply(classifier, y=y)
>>> print(pred.skb.describe_param_grid())
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'logistic'
  C: choose_float(0.1, 10.0, log=True, name='C')
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'rf'
  N 🌴: choose_int(3, 30, log=True, name='N 🌴')
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'dummy'
>>> search = pred.skb.get_randomized_search(fitted=True, random_state=0)
>>> search.results_
    k         C  N 🌴 classifier  mean_test_score
0   4  4.626363  NaN   logistic             0.92
1  10       NaN  7.0         rf             0.89
2   7  3.832217  NaN   logistic             0.87
3  15       NaN  6.0         rf             0.86
4  10  4.881255  NaN   logistic             0.85
5  19  3.965675  NaN   logistic             0.80
6  14       NaN  3.0         rf             0.77
7   4       NaN  NaN      dummy             0.50
8   9       NaN  NaN      dummy             0.50
9   5       NaN  NaN      dummy             0.50