Using previews for easier development and debugging#
To make interactive development easier without having to call eval() after
each step, it is possible to preview the result of a DataOp by passing a value
along with its name when creating a variable.
>>> import skrub
>>> a = skrub.var("a", 10) # we pass the value 10 in addition to the name
>>> b = skrub.var("b", 6)
>>> c = a + b
>>> c # now the display of c includes a preview of the result
<BinOp: add>
Result:
―――――――
16
Previews are eager computations on the current data; because they are computed immediately, they can surface errors early:
>>> import pandas as pd
>>> df = pd.DataFrame({"col": [1, 2, 3]})
>>> a = skrub.var("a", df) # we pass the DataFrame as a value
Next, we call the pandas drop() method and try to drop a column without
specifying the axis:
>>> a.drop("col")
Traceback (most recent call last):
...
RuntimeError: Evaluation of '.drop()' failed.
You can see the full traceback above. The error message was:
KeyError: "['col'] not found in axis"
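The failure comes from pandas itself: drop() defaults to the row axis, so "col" is looked up among the row labels. For reference, a small plain-pandas sketch of the corrected call:

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3]})

# drop() defaults to axis=0 (the index), so df.drop("col") raises a
# KeyError. Naming the axis via the `columns=` keyword (or passing
# axis="columns") drops the column instead:
dropped = df.drop(columns="col")
print(dropped.shape)  # (3, 0)
```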
Note that seeing results for the values we provided does not change the fact that we are building a pipeline to reuse, not just computing the result for a fixed input. The displayed result is only a preview of the output on one example dataset.
>>> c.skb.eval({"a": 3, "b": 2})
5
It is not necessary to provide a value for every variable; however, it is advisable to do so when possible, as it allows errors to be caught early.
Disabling previews and eager checks#
By default, as soon as a DataOp is defined, some validity checks are performed
and the preview results are computed eagerly. In very complex DataOps plans
(100+ nodes), running checks after adding each node can cause a noticeable overhead.
To avoid this overhead, eager checks can be disabled with the "eager_data_ops"
configuration option.
>>> with skrub.config_context(eager_data_ops=False):
... # no checks are performed when b is defined so no error in the line below:
... b = skrub.var('a', 1) + skrub.var('a', 2)
... # checks are still performed (once) before the DataOp is actually used so
... # evaluating the DataOp, using .skb.make_learner() etc _would_ still raise:
... # b.skb.eval() ## raises ValueError: Choice and node names must be unique.
>>> b # Note there is no preview, even though we provided values for the variables
<BinOp: add>