skrub.DataOp.skb.find
- DataOp.skb.find(what)
Find a node (DataOp or choice) in the computational graph.
- Parameters:
- what: str or callable
If a string, it is the name (set with DataOp.skb.set_name()) of the node to search for. If a callable, it is the search predicate: it accepts a DataOp and returns a Boolean. The first node for which it returns True is returned.
- Returns:
- DataOp or None
The node named what, when what is a string, or the first node for which what returned True, if what is a callable. If nothing was found, None is returned.
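The lookup rules described above (a name or a predicate, the first match returned, None when nothing matches) can be illustrated with a small self-contained sketch. It does not use skrub: the `Node` class, the `find` function, and the traversal order are all assumptions made for illustration, and skrub's actual implementation and visiting order may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a graph node; not a skrub class."""

    name: Optional[str] = None
    parents: List["Node"] = field(default_factory=list)


def find(root: Node, what: Union[str, Callable[[Node], bool]]) -> Optional[Node]:
    """Return the first node matching `what`, or None if there is none."""
    # A string is turned into a predicate that compares node names.
    predicate = (lambda n: n.name == what) if isinstance(what, str) else what
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # shared ancestors are only visited once
        seen.add(id(node))
        if predicate(node):
            return node  # the first match wins
        stack.extend(node.parents)
    return None  # nothing matched


# A tiny chain: x -> vectorized -> pred (pred is left unnamed)
x = Node("X")
vectorized = Node("vectorized", [x])
pred = Node(None, [vectorized])

assert find(pred, "vectorized") is vectorized       # lookup by name
assert find(pred, lambda n: not n.parents) is x     # lookup by predicate
assert find(pred, "missing") is None                # no match
```

Treating the string case as a special predicate keeps a single search loop for both kinds of argument.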
See also
- DataOp.skb.find_X_y: Find the nodes that have been marked with DataOp.skb.mark_as_X() and DataOp.skb.mark_as_y().
- SkrubLearner.truncated_after: Truncate the (possibly fitted) SkrubLearner after the specified node.
Examples
>>> import skrub
>>> from sklearn.dummy import DummyClassifier

>>> data = skrub.datasets.toy_orders()
>>> x = skrub.X(data.X)
>>> x
<Var 'X'>
Result:
―――――――
   ID product  quantity        date
0   1     pen         2  2020-04-03
1   2     cup         3  2020-04-04
2   3     cup         5  2020-04-04
3   4   spoon         1  2020-04-05
>>> vectorized = x.skb.apply(skrub.TableVectorizer()).skb.set_name("vectorized")
>>> vectorized
<vectorized | Apply TableVectorizer>
Result:
―――――――
    ID  product_cup  product_pen  ...  date_month  date_day  date_total_seconds
0  1.0          0.0          1.0  ...         4.0       3.0        1.585872e+09
1  2.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
2  3.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
3  4.0          0.0          0.0  ...         4.0       5.0        1.586045e+09
[4 rows x 9 columns]
>>> y = skrub.y(data.y)
>>> y
<Var 'y'>
Result:
―――――――
0    False
1    False
2     True
3    False
Name: delayed, dtype: bool
>>> strategy = skrub.choose_from(["most_frequent", "prior"], name="strategy")
>>> pred = vectorized.skb.apply(DummyClassifier(strategy=strategy), y=y)
Find a node by its name:
>>> found = pred.skb.find("vectorized")
>>> found
<vectorized | Apply TableVectorizer>
Result:
―――――――
    ID  product_cup  product_pen  ...  date_month  date_day  date_total_seconds
0  1.0          0.0          1.0  ...         4.0       3.0        1.585872e+09
1  2.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
2  3.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
3  4.0          0.0          0.0  ...         4.0       5.0        1.586045e+09
[4 rows x 9 columns]
>>> found is vectorized
True
Note that choices are also considered:
>>> found = pred.skb.find("strategy")
>>> found
choose_from(['most_frequent', 'prior'], name='strategy')
>>> found is strategy
True
Find by a predicate function:
>>> def has_9_columns(data_op):
...     value = data_op.skb.preview()
...     if (shape := getattr(value, "shape", None)) is None:
...         return False
...     if len(shape) == 1:
...         return False
...     return shape[1] == 9
>>> found = pred.skb.find(has_9_columns)
>>> found
<vectorized | Apply TableVectorizer>
Result:
―――――――
    ID  product_cup  product_pen  ...  date_month  date_day  date_total_seconds
0  1.0          0.0          1.0  ...         4.0       3.0        1.585872e+09
1  2.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
2  3.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
3  4.0          0.0          0.0  ...         4.0       5.0        1.586045e+09
[4 rows x 9 columns]
>>> found is vectorized
True
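A predicate like the one above can also be built by a small factory, so that the column count is not hard-coded. The following is a minimal sketch: `has_n_columns` and `fake_node` are hypothetical helpers, and the stand-in objects only mimic the `.skb.preview()` call used in the example. They are not skrub objects.

```python
from types import SimpleNamespace


def has_n_columns(n):
    """Build a predicate that is True for nodes whose preview has n columns."""

    def predicate(data_op):
        value = data_op.skb.preview()
        # Fall back to None when the preview has no `shape` attribute.
        shape = getattr(value, "shape", None)
        if shape is None or len(shape) < 2:
            return False
        return shape[1] == n

    return predicate


def fake_node(value):
    # Hypothetical stand-in exposing only `.skb.preview()`; not a skrub object.
    return SimpleNamespace(skb=SimpleNamespace(preview=lambda: value))


table = fake_node(SimpleNamespace(shape=(4, 9)))   # 2-D, 9 columns
series = fake_node(SimpleNamespace(shape=(4,)))    # 1-D
scalar = fake_node(3.5)                            # no shape at all

check = has_n_columns(9)
print(check(table), check(series), check(scalar))  # True False False
```

Since find accepts any callable that takes a DataOp and returns a Boolean, passing `has_n_columns(9)` to `pred.skb.find` would be expected to behave like `has_9_columns` in the example above.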