skrub.Expr.skb.get_pipeline

Expr.skb.get_pipeline(fitted=False)

Get a skrub pipeline for this expression.

Returns a SkrubPipeline.

Please see the examples gallery for full information about expressions and the pipelines they generate.

The returned pipeline has a fit() method, so we can fit it to some training data and then apply it to unseen data by calling transform() or predict().

An important difference between skrub pipelines and scikit-learn estimators is that fit(), transform() etc. accept a dictionary of inputs rather than X and y arguments (see examples below).

We can pass fitted=True to get a pipeline that is already fitted to the data supplied as values when the variables were created, as in skrub.var("name", value=...) and skrub.X(value).

Parameters:
fitted : bool (default=False)

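If True, the returned pipeline is fitted to the data provided when initializing variables in the expression. If False, the pipeline is returned unfitted and must be fitted by calling fit() before it can be used on new data.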

Returns:
pipeline

A skrub pipeline with an interface similar to scikit-learn’s, except that its methods accept a dictionary of named inputs rather than X and y arguments.

Examples

>>> import skrub
>>> from sklearn.dummy import DummyClassifier
>>> orders_df = skrub.toy_orders().orders
>>> orders = skrub.var('orders', orders_df)
>>> X = orders.drop(columns='delayed', errors='ignore').skb.mark_as_X()
>>> y = orders['delayed'].skb.mark_as_y()
>>> pred = X.skb.apply(skrub.TableVectorizer()).skb.apply(
...     DummyClassifier(), y=y
... )
>>> pred
<Apply DummyClassifier>
Result:
―――――――
   delayed
0    False
1    False
2    False
3    False
>>> pipeline = pred.skb.get_pipeline(fitted=True)
>>> new_orders_df = skrub.toy_orders(split='test').X
>>> new_orders_df
   ID product  quantity        date
4   5     cup         5  2020-04-11
5   6    fork         2  2020-04-12
>>> pipeline.predict({'orders': new_orders_df})
array([False, False])

Note that the 'orders' key in the dictionary passed to predict corresponds to the name 'orders' in skrub.var('orders', orders_df) above.