DataOps

Generalizing the scikit-learn pipeline. See skrub DataOps for further details.

var

Create a skrub variable.

X

Create a skrub variable and mark it as being X.

y

Create a skrub variable and mark it as being y.

as_data_op

Create a DataOp that evaluates to the given value.

deferred

Wrap function calls in a DataOp.

The DataOp object.

DataOp

Representation of a computation that can be used to build DataOps plans and learners.

Inline hyperparameter selection in your DataOps plan.

choose_bool

A choice between True and False.

choose_float

A choice of floating-point numbers from a numeric range.

choose_int

A choice of integers from a numeric range.

choose_from

A choice among several possible outcomes.

optional

A choice between value and None.

Evaluate your DataOps plan.

cross_validate

Cross-validate a learner built from a DataOp.

eval_mode

Return the mode in which the DataOp is currently being evaluated.

The skb accessor exposes all DataOps methods and attributes.

DataOp.skb.apply

Apply a scikit-learn estimator to a dataframe or numpy array.

DataOp.skb.apply_func

Apply the given function.

DataOp.skb.clone

Get an independent clone of the DataOp.

DataOp.skb.concat

Concatenate dataframes vertically or horizontally.

DataOp.skb.cross_validate

Cross-validate the DataOp plan.

DataOp.skb.describe_defaults

Describe the hyper-parameters used by the default learner.

DataOp.skb.describe_param_grid

Describe the hyper-parameters extracted from choices in the DataOp.

DataOp.skb.describe_steps

Get a text representation of the computation graph.

DataOp.skb.draw_graph

Get an SVG string representing the computation graph.

DataOp.skb.drop

Drop some columns.

DataOp.skb.eval

Evaluate the DataOp.

DataOp.skb.freeze_after_fit

Freeze the result during learner fitting.

DataOp.skb.full_report

Generate a full report of the DataOp's evaluation.

DataOp.skb.get_data

Collect the values of the variables contained in the DataOp.

DataOp.skb.make_learner

Get a skrub learner for this DataOp.

DataOp.skb.make_grid_search

Find the best parameters with grid search.

DataOp.skb.make_randomized_search

Find the best parameters with randomized search.

DataOp.skb.if_else

Create a conditional DataOp.

DataOp.skb.iter_learners_grid

Get learners with different parameter combinations.

DataOp.skb.iter_learners_randomized

Get learners with different parameter combinations.

DataOp.skb.mark_as_X

Mark this DataOp as being the X table.

DataOp.skb.mark_as_y

Mark this DataOp as being the y table.

DataOp.skb.match

Select based on the value of a DataOp.

DataOp.skb.preview

Get the value computed for previews (shown when printing the DataOp).

DataOp.skb.select

Select a subset of columns.

DataOp.skb.set_description

Give a description to this DataOp.

DataOp.skb.set_name

Give a name to this DataOp.

DataOp.skb.subsample

Configure subsampling of a dataframe or numpy array.

DataOp.skb.train_test_split

Split an environment into training and testing environments.

Accessor attributes.

DataOp.skb.description

A user-defined description or comment about the DataOp.

DataOp.skb.is_X

Whether this DataOp has been marked with skb.mark_as_X().

DataOp.skb.is_y

Whether this DataOp has been marked with skb.mark_as_y().

DataOp.skb.name

A user-chosen name for the DataOp.

DataOp.skb.applied_estimator

Retrieve the estimator applied in the previous step, as a DataOp.

Objects generated by the DataOps.

SkrubLearner

Learner that evaluates a skrub DataOp.

ParamSearch

Learner that evaluates a skrub DataOp with hyperparameter tuning.