skrub.Expr.skb.get_randomized_search

Expr.skb.get_randomized_search(*, fitted=False, keep_subsampling=False, **kwargs)

Find the best parameters with randomized search.

This function returns a ParamSearch, an object similar to scikit-learn’s RandomizedSearchCV. The main difference is that fit() and predict() accept a dictionary of inputs rather than X and y. The best pipeline is available as the best_pipeline_ attribute.

Parameters:
fitted : bool (default=False)

If True, the randomized search is fitted on the data provided when initializing variables in this expression (the data returned by .skb.get_data()).

keep_subsampling : bool (default=False)

If True, and if subsampling has been configured (see Expr.skb.subsample()), fit on a subsample of the data. By default subsampling is not applied and all the data is used. Subsampling is only applied when fitting the randomized search with fitted=True; subsequent use of the randomized search is not affected by it. It is therefore an error to pass keep_subsampling=True together with fitted=False, because keep_subsampling=True would have no effect.

kwargs : dict

All other named arguments are forwarded to RandomizedSearchCV.

Returns:
ParamSearch

An object implementing the hyperparameter search. Besides the usual fit and predict, attributes of interest are results_, plot_results(), and best_pipeline_.

See also

skrub.Expr.skb.get_grid_search

Find the best parameters with grid search.

Examples

>>> import skrub
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.dummy import DummyClassifier
>>> X_a, y_a = make_classification(random_state=0)
>>> X, y = skrub.X(X_a), skrub.y(y_a)
>>> selector = SelectKBest(k=skrub.choose_int(4, 20, log=True, name='k'))
>>> logistic = LogisticRegression(C=skrub.choose_float(0.1, 10.0, log=True, name="C"))
>>> rf = RandomForestClassifier(
...     n_estimators=skrub.choose_int(3, 30, log=True, name="N 🌴"),
...     random_state=0,
... )
>>> classifier = skrub.choose_from(
...     {"logistic": logistic, "rf": rf, "dummy": DummyClassifier()}, name="classifier"
... )
>>> pred = X.skb.apply(selector, y=y).skb.apply(classifier, y=y)
>>> print(pred.skb.describe_param_grid())
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'logistic'
  C: choose_float(0.1, 10.0, log=True, name='C')
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'rf'
  N 🌴: choose_int(3, 30, log=True, name='N 🌴')
- k: choose_int(4, 20, log=True, name='k')
  classifier: 'dummy'
>>> search = pred.skb.get_randomized_search(fitted=True, random_state=0)
>>> search.results_
    k         C  N 🌴 classifier mean_test_score
0   4  4.626363  NaN   logistic             0.92
1  10       NaN  7.0         rf             0.89
2   7  3.832217  NaN   logistic             0.87
3  15       NaN  6.0         rf             0.86
4  10  4.881255  NaN   logistic             0.85
5  19  3.965675  NaN   logistic             0.80
6  14       NaN  3.0         rf             0.77
7   4       NaN  NaN      dummy             0.50
8   9       NaN  NaN      dummy             0.50
9   5       NaN  NaN      dummy             0.50

Please refer to the examples gallery for an in-depth explanation.

© Copyright 2018-2023, the dirty_cat developers, 2023-2025, the skrub developers.