SelectCols
- class skrub.SelectCols(cols)
Select a subset of a DataFrame’s columns.
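A ValueError is raised if any of the provided column names are not in the dataframe.

Accepts pandas.DataFrame and polars.DataFrame inputs.

- Parameters:
- cols
The names of the columns to select. The output keeps the columns in the order in which they are listed.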
Examples
>>> import pandas as pd
>>> from skrub import SelectCols
>>> df = pd.DataFrame({"A": [1, 2], "B": [10, 20], "C": ["x", "y"]})
>>> df
   A   B  C
0  1  10  x
1  2  20  y
>>> SelectCols(["C", "A"]).fit_transform(df)
   C  A
0  x  1
1  y  2
>>> SelectCols(["X", "A"]).fit_transform(df)
Traceback (most recent call last):
    ...
ValueError: The following columns are requested for selection but missing from dataframe: ['X']
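To make the documented behavior concrete (keep the listed columns in the listed order, raise ValueError for any missing name), here is a minimal plain-pandas sketch. The helper `select_cols` is hypothetical and is not skrub's implementation; it only mirrors the behavior shown in the examples above, including the error message text.

```python
import pandas as pd


def select_cols(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Hypothetical helper mimicking the documented SelectCols behavior.

    This is an illustrative sketch, not skrub's actual implementation.
    """
    # Collect every requested name that is not a column of the dataframe.
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise ValueError(
            "The following columns are requested for selection "
            f"but missing from dataframe: {missing}"
        )
    # Indexing with a list returns the columns in the requested order.
    return df[list(cols)]


df = pd.DataFrame({"A": [1, 2], "B": [10, 20], "C": ["x", "y"]})
print(select_cols(df, ["C", "A"]))
```

Checking all names before selecting means the error lists every missing column at once rather than stopping at the first one.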
Methods
- fit(X[, y]): Fit the transformer.
- fit_transform(X[, y]): Fit to data, then transform it.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- set_output(*[, transform]): Set output container.
- set_params(**params): Set the parameters of this estimator.
- transform(X): Transform a dataframe by selecting columns.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array_like of shape (n_samples, n_features)
Input samples.
- y : array_like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : dataframe
The transformed version of X, containing only the selected columns.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
  - “default”: Default output format of a transformer
  - “pandas”: DataFrame output
  - “polars”: Polars output
  - None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
Gallery examples
Fuzzy joining dirty tables with the Joiner