
skrub.Expr.skb.get_randomized_search

Expr.skb.get_randomized_search(*, fitted=False, keep_subsampling=False, **kwargs)

Find the best parameters with randomized search.

This function returns a ParamSearch, an object similar to scikit-learn’s RandomizedSearchCV. The main difference is that methods such as fit() and predict() accept a dictionary of inputs rather than X and y. Please refer to the examples gallery for an in-depth explanation.

Parameters:
fitted : bool (default=False)

If True, the randomized search is fitted on the data provided when initializing variables in this expression (the data returned by .skb.get_data()).

keep_subsampling : bool (default=False)

If True, and if subsampling has been configured (see Expr.skb.subsample()), fit on a subsample of the data. By default, subsampling is not applied and all the data is used. Subsampling applies only when fitting the randomized search with fitted=True; subsequent use of the randomized search is not affected by it. It is therefore an error to pass keep_subsampling=True together with fitted=False, because keep_subsampling=True would have no effect.

kwargs : dict

All other named arguments are forwarded to RandomizedSearchCV.

Returns:
ParamSearch

An object implementing the hyperparameter search. Besides the usual fit() and predict() methods, members of interest are the results_ attribute and the plot_results() method.

See also

skrub.Expr.skb.get_grid_search

Find the best parameters with grid search.

Examples

>>> import skrub
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.dummy import DummyClassifier
>>> X_a, y_a = make_classification(random_state=0)
>>> X, y = skrub.X(X_a), skrub.y(y_a)
>>> selector = SelectKBest(k=skrub.choose_int(4, 20, log=True, name='k'))
>>> logistic = LogisticRegression(C=skrub.choose_float(0.1, 10.0, log=True, name="C"))
>>> rf = RandomForestClassifier(
...     n_estimators=skrub.choose_int(3, 30, log=True, name="N 🌴"),
...     random_state=0,
... )
>>> classifier = skrub.choose_from(
...     {"logistic": logistic, "rf": rf, "dummy": DummyClassifier()}, name="classifier"
... )
>>> pred = X.skb.apply(selector, y=y).skb.apply(classifier, y=y)
>>> print(pred.skb.describe_param_grid())
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'logistic'
  C: choose_float(0.1, 10.0, log=True, name='C')
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'rf'
  N 🌴: choose_int(3, 30, log=True, name='N 🌴')
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'dummy'
>>> search = pred.skb.get_randomized_search(fitted=True, random_state=0)
>>> search.results_
    k         C  N 🌴 classifier mean_test_score
0   4  4.626363  NaN   logistic             0.92
1  10       NaN  7.0         rf             0.89
2   7  3.832217  NaN   logistic             0.87
3  15       NaN  6.0         rf             0.86
4  10  4.881255  NaN   logistic             0.85
5  19  3.965675  NaN   logistic             0.80
6  14       NaN  3.0         rf             0.77
7   4       NaN  NaN      dummy             0.50
8   9       NaN  NaN      dummy             0.50
9   5       NaN  NaN      dummy             0.50

© Copyright 2018-2023, the dirty_cat developers, 2023-2025, the skrub developers.
