skrub.Expr.skb.get_pipeline
Expr.skb.get_pipeline(fitted=False)
Get a skrub pipeline for this expression.
Returns a SkrubPipeline. Please see the examples gallery for full information about expressions and the pipelines they generate.
Provides a skrub pipeline with a fit() method so we can fit it to some training data and then apply it to unseen data by calling transform() or predict().

An important difference between skrub pipelines and scikit-learn estimators is that fit(), transform(), etc. accept a dictionary of inputs rather than X and y arguments (see examples below).

We can pass fitted=True to get a pipeline fitted to the data provided as the values in skrub.var("name", value=...) and skrub.X(value).

Parameters:
fitted : bool, default=False
    If True, the returned pipeline is fitted to the data provided when initializing variables in the expression.
Returns:
pipeline : SkrubPipeline
    A skrub pipeline with an interface similar to scikit-learn's, except that its methods accept a dictionary of named inputs rather than X and y arguments.
Examples
>>> import skrub
>>> from sklearn.dummy import DummyClassifier
>>> orders_df = skrub.toy_orders().orders
>>> orders = skrub.var('orders', orders_df)
>>> X = orders.drop(columns='delayed', errors='ignore').skb.mark_as_X()
>>> y = orders['delayed'].skb.mark_as_y()
>>> pred = X.skb.apply(skrub.TableVectorizer()).skb.apply(
...     DummyClassifier(), y=y
... )
>>> pred
<Apply DummyClassifier>
Result:
―――――――
   delayed
0    False
1    False
2    False
3    False
>>> pipeline = pred.skb.get_pipeline(fitted=True)
>>> new_orders_df = skrub.toy_orders(split='test').X
>>> new_orders_df
   ID product  quantity        date
4   5     cup         5  2020-04-11
5   6    fork         2  2020-04-12
>>> pipeline.predict({'orders': new_orders_df})
array([False, False])
Note that the 'orders' key in the dictionary passed to predict corresponds to the name 'orders' in skrub.var('orders', orders_df) above.