var
- skrub.var(name, value=None)
Create a skrub variable.
Variables represent inputs to a DataOps plan and to the learner built from it. They can be combined with other variables, constants, operators, function calls, etc. to build up complex DataOps, which implicitly define the plan.
See the example gallery for more information about skrub DataOps.
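For instance, here is a minimal sketch (the 'orders' variable, its pandas DataFrame value, and the 'amount' column are made-up names for illustration): item access and method calls on a variable are recorded as steps of the plan and evaluated on the preview value:

>>> import skrub
>>> import pandas as pd
>>> orders = skrub.var('orders', pd.DataFrame({'amount': [10, 20, 30]}))
>>> total = orders['amount'].sum()
>>> print(total.skb.eval())
60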
- Parameters:
- name : str
  The name for this input. It corresponds to a key in the dictionary that is passed to the learner's fit() method (see Examples below). Names must be unique within a learner and must not start with "_skrub_".
- value : object, optional
  Optionally, an initial value can be given to the variable. When it is available, it is used to provide a preview of the learner's results, to detect errors in the learner early, and to provide better help and tab-completion in interactive Python shells.
- Returns:
- A skrub variable
- Raises:
- TypeError
If the provided value is a skrub DataOp or a skrub choose_* function.
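A small sketch of this constraint (the exact error message may differ):

>>> import skrub
>>> a = skrub.var('a', 1)
>>> skrub.var('b', a + 1)
Traceback (most recent call last):
    ...
TypeError: ...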
Examples
Variables without a value:
>>> import skrub
>>> a = skrub.var('a')
>>> a
<Var 'a'>
>>> b = skrub.var('b')
>>> c = a + b
>>> c
<BinOp: add>
>>> print(c.skb.describe_steps())
Var 'a'
Var 'b'
BinOp: add
The names of the variables correspond to keys in the dictionary of inputs:
>>> c.skb.eval({'a': 10, 'b': 6})
16
They also correspond to the keys of the inputs passed to the learner built from the DataOps plan:
>>> learner = c.skb.make_learner()
>>> learner.fit_transform({'a': 5, 'b': 4})
9
When a value is provided, we get a preview of what the learner produces for those values:
>>> a = skrub.var('a', 2)
>>> b = skrub.var('b', 3)
>>> b
<Var 'b'>
Result:
―――――――
3
>>> c = a + b
>>> c
<BinOp: add>
Result:
―――――――
5
The values are also used as defaults for eval():

>>> c.skb.eval()
5
But we can still override them, and inputs must be provided explicitly when using the learner returned by .skb.make_learner():

>>> c.skb.eval({'a': 10, 'b': 6})
16
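Continuing the example above, a learner built with .skb.make_learner() does not fall back to the preview values, so the inputs are passed explicitly:

>>> learner = c.skb.make_learner()
>>> learner.fit_transform({'a': 10, 'b': 6})
16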
Much more information about skrub variables is provided in the examples gallery.
Gallery examples
Introduction to machine-learning pipelines with skrub DataOps
Multiple tables: building machine learning pipelines with DataOps