skrub.DataOp.skb.find

DataOp.skb.find(what)

Find a node (DataOp or choice) in the computational graph.

Parameters:
what : str or callable
  • If a string, it is the name (set with DataOp.skb.set_name()) of the node to search for.

  • If a callable, it is the search predicate: it accepts a DataOp and returns a Boolean. The first node for which it returns True is returned.

Returns:
DataOp or None

The node named what, if what is a string, or the first node for which what returns True, if what is a callable. If no node matches, None is returned.
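
Because a failed lookup returns None instead of raising, a misspelled name can go unnoticed until the result is used. A minimal sketch of a guard is shown here. Note that find_or_raise is a hypothetical helper, not part of skrub; it relies only on the documented behavior that DataOp.skb.find returns None when nothing matches.

```python
def find_or_raise(data_op, what):
    """Look up a node and fail loudly if it is missing.

    Hypothetical helper (not a skrub function). ``data_op`` is expected to
    expose ``.skb.find(what)`` as documented on this page.
    """
    found = data_op.skb.find(what)
    if found is None:
        # ``what`` may be a name or a predicate; repr() covers both cases.
        raise LookupError(f"no node matching {what!r} in the graph")
    return found
```

With the objects defined in the Examples section, find_or_raise(pred, "vectorized") would be expected to return the same node as pred.skb.find("vectorized"), while an unknown name would raise LookupError.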

See also

DataOp.skb.find_X_y

Find the nodes that have been marked with DataOp.skb.mark_as_X() and DataOp.skb.mark_as_y().

SkrubLearner.truncated_after

Truncate the (possibly fitted) SkrubLearner after the specified node.

Examples

>>> import skrub
>>> from sklearn.dummy import DummyClassifier
>>> data = skrub.datasets.toy_orders()
>>> x = skrub.X(data.X)
>>> x
<Var 'X'>
Result:
―――――――
   ID product  quantity        date
0   1     pen         2  2020-04-03
1   2     cup         3  2020-04-04
2   3     cup         5  2020-04-04
3   4   spoon         1  2020-04-05
>>> vectorized = x.skb.apply(skrub.TableVectorizer()).skb.set_name("vectorized")
>>> vectorized
<vectorized | Apply TableVectorizer>
Result:
―――――――
    ID  product_cup  product_pen  ...  date_month  date_day  date_total_seconds
0  1.0          0.0          1.0  ...         4.0       3.0        1.585872e+09
1  2.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
2  3.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
3  4.0          0.0          0.0  ...         4.0       5.0        1.586045e+09
[4 rows x 9 columns]
>>> y = skrub.y(data.y)
>>> y
<Var 'y'>
Result:
―――――――
0    False
1    False
2     True
3    False
Name: delayed, dtype: bool
>>> strategy = skrub.choose_from(["most_frequent", "prior"], name="strategy")
>>> pred = vectorized.skb.apply(DummyClassifier(strategy=strategy), y=y)

Find a node by its name:

>>> found = pred.skb.find("vectorized")
>>> found
<vectorized | Apply TableVectorizer>
Result:
―――――――
    ID  product_cup  product_pen  ...  date_month  date_day  date_total_seconds
0  1.0          0.0          1.0  ...         4.0       3.0        1.585872e+09
1  2.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
2  3.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
3  4.0          0.0          0.0  ...         4.0       5.0        1.586045e+09
[4 rows x 9 columns]
>>> found is vectorized
True

Note that choices are also considered:

>>> found = pred.skb.find("strategy")
>>> found
choose_from(['most_frequent', 'prior'], name='strategy')
>>> found is strategy
True

Find by a predicate function:

>>> def has_9_columns(data_op):
...     value = data_op.skb.preview()
...     if (shape := getattr(value, "shape", None)) is None:
...         return False
...     if len(shape) == 1:
...         return False
...     return shape[1] == 9
>>> found = pred.skb.find(has_9_columns)
>>> found
<vectorized | Apply TableVectorizer>
Result:
―――――――
    ID  product_cup  product_pen  ...  date_month  date_day  date_total_seconds
0  1.0          0.0          1.0  ...         4.0       3.0        1.585872e+09
1  2.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
2  3.0          1.0          0.0  ...         4.0       4.0        1.585958e+09
3  4.0          0.0          0.0  ...         4.0       5.0        1.586045e+09
[4 rows x 9 columns]
>>> found is vectorized
True
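
The predicate above hard-codes the number of columns. One possible generalization is a small factory that builds a predicate for any column count. This is only a sketch: has_n_columns is a hypothetical name, not a skrub function, and it assumes that a preview can be computed for every node with data_op.skb.preview(), as in the example above.

```python
def has_n_columns(n):
    """Build a predicate matching nodes whose preview is 2-D with n columns.

    Hypothetical helper (not a skrub function).
    """

    def predicate(data_op):
        value = data_op.skb.preview()
        # Values without a ``shape`` attribute (e.g. plain Python objects)
        # yield None here instead of raising AttributeError.
        shape = getattr(value, "shape", None)
        return shape is not None and len(shape) == 2 and shape[1] == n

    return predicate
```

Under those assumptions, pred.skb.find(has_n_columns(9)) should select the same node as has_9_columns does.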