.. currentmodule:: skrub .. _user_guide_data_ops_plan: Building a simple DataOps plan ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Let's build a simple DataOps plan that adds two variables together. We start by declaring the variables: >>> import skrub >>> a = skrub.var("a") >>> b = skrub.var("b") We then apply transformations (in this case, an addition) composing more complex DataOps. >>> c = a + b >>> c Finally, we can evaluate the plan by passing the **environment** in which the plan should be evaluated. The environment is a dictionary that maps variable names to their values. >>> c.skb.eval({"a": 10, "b": 6}) 16 As shown above, the special ``.skb`` attribute allows to interact with the DataOp object itself, and :meth:`.skb.eval() ` evaluates the DataOp plan. By default, :meth:`.skb.eval() ` uses the values passed in the variable definitions, but it can also take an explicit environment dictionary as an argument. Finally, we can export the plan as a ``Learner`` that can be fitted and applied to new data: >>> learner = c.skb.make_learner() >>> learner.fit_transform({"a": 10, "b": 7}) 17 When using Data Ops, it is important to ensure that all operations are being tracked by acting on the Data Ops, rather than (for example) the starting dataframe. Consider the following example: >>> import pandas as pd >>> df = pd.DataFrame({"col": [1, 2, 3]}) >>> df col 0 1 1 2 2 3 >>> df_do = skrub.var("df", df) >>> df_do Result: ――――――― col 0 1 1 2 2 3 ``df_do`` is a Data Op that wraps ``df``, so its preview shows the content of ``df``. Then, if we now modify ``df_do`` by doubling the column, we can see that both steps (the creation of the variable, and the doubling) are now tracked by the final Data Op. >>> df_doubled = df_do.assign(col=df_do["col"]*2) >>> df_doubled Result: ――――――― col 0 2 1 4 2 6 >>> print(df_doubled.skb.describe_steps()) Var 'df' ( Var 'df' )* GetItem 'col' BinOp: mul CallMethod 'assign' * Cached, not recomputed On the other hand, working directly on ``df`` leads us to the same result, but the actual operations are not being tracked. By working only on Data Ops we ensure that all the operations done on the data are added correctly to the computational graph, which then allows the resulting learner to execute all steps as intended. See :ref:`sphx_glr_auto_examples_data_ops_11_data_ops_intro.py` for an introductory example on how to use skrub DataOps on a single dataframe.