ApplyToFrame
- class skrub.ApplyToFrame(transformer, cols=all(), keep_original=False, rename_columns='{}')
Apply a transformer to part of a dataframe.
A subset of the dataframe is selected and passed to the transformer (as a single input). This is different from ApplyToCols, which fits a separate clone of the transformer to each selected column independently.
Note
The transform and fit_transform methods of transformer must return dataframes of the same type (polars or pandas) as the input, either by default or by supporting the scikit-learn set_output API.
- Parameters:
- transformer : scikit-learn Transformer
The transformer to apply to the selected columns. fit_transform and transform must return a DataFrame. The resulting dataframe will appear as the last columns of the output dataframe. Unselected columns will appear unchanged in the output.
- cols : str, sequence of str, or skrub selector, optional
The columns to attempt to transform. Columns outside of this selection will be passed through unchanged, without calling fit_transform on them. The default is to transform all columns.
- keep_original : bool, default=False
If True, the original columns are preserved in the output. If the transformer produces a column with the same name, the transformation result is renamed so that both columns can appear in the output. If False, only the transformer's output is included in the result, not the original columns. In all cases, columns not selected by cols are passed through.
- rename_columns : str, default='{}'
Format string applied to all transformation output column names. For example, pass 'transformed_{}' to prepend 'transformed_' to all output column names. The default value does not modify the names. Renaming is not applied to columns not selected by cols.
- Attributes:
- all_inputs_ : list of str
All column names in the input dataframe.
- used_inputs_ : list of str
The names of columns that were transformed.
- all_outputs_ : list of str
All column names in the output dataframe.
- created_outputs_ : list of str
The names of columns in the output dataframe that were created by the fitted transformer.
- transformer_ : Transformer
The fitted transformer.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.eye(4) * np.logspace(0, 3, 4), columns=list("abcd"))
>>> df
     a     b      c       d
0  1.0   0.0    0.0     0.0
1  0.0  10.0    0.0     0.0
2  0.0   0.0  100.0     0.0
3  0.0   0.0    0.0  1000.0
>>> from sklearn.decomposition import PCA
>>> from skrub import ApplyToFrame
>>> ApplyToFrame(PCA(n_components=2)).fit_transform(df).round(2)
     pca0   pca1
0 -249.01 -33.18
1 -249.04 -33.68
2 -252.37  66.64
3  750.42   0.22
We can restrict the transformer to a subset of columns:
>>> pca = ApplyToFrame(PCA(n_components=2), cols=["a", "b"])
>>> pca.fit_transform(df).round(2)
       c       d  pca0  pca1
0    0.0     0.0 -2.52  0.67
1    0.0     0.0  7.50  0.00
2  100.0     0.0 -2.49 -0.33
3    0.0  1000.0 -2.49 -0.33
>>> pca.used_inputs_
['a', 'b']
>>> pca.created_outputs_
['pca0', 'pca1']
>>> pca.transformer_
PCA(n_components=2)
It is possible to rename the output columns:
>>> pca = ApplyToFrame(
...     PCA(n_components=2), cols=["a", "b"], rename_columns='my_tag-{}'
... )
>>> pca.fit_transform(df).round(2)
       c       d  my_tag-pca0  my_tag-pca1
0    0.0     0.0        -2.52         0.67
1    0.0     0.0         7.50         0.00
2  100.0     0.0        -2.49        -0.33
3    0.0  1000.0        -2.49        -0.33
We can also force preserving the original columns in the output:
>>> pca = ApplyToFrame(PCA(n_components=2), cols=["a", "b"], keep_original=True)
>>> pca.fit_transform(df).round(2)
     a     b      c       d  pca0  pca1
0  1.0   0.0    0.0     0.0 -2.52  0.67
1  0.0  10.0    0.0     0.0  7.50  0.00
2  0.0   0.0  100.0     0.0 -2.49 -0.33
3  0.0   0.0    0.0  1000.0 -2.49 -0.33
Methods
fit(X[, y])                 Fit the transformer on all columns jointly.
fit_transform(X[, y])       Fit the transformer on all columns jointly and transform X.
get_feature_names_out       Get output feature names for transformation.
get_params([deep])          Get parameters for this estimator.
set_output(*[, transform])  Set output container.
set_params(**params)        Set the parameters of this estimator.
transform(X)                Transform a dataframe.
- fit(X, y=None)
Fit the transformer on all columns jointly.
- Parameters:
- X : Pandas or Polars DataFrame
The data to transform.
- y : Pandas or Polars Series or DataFrame, default=None
The target data.
- Returns:
- ApplyToFrame
The transformer itself.
- fit_transform(X, y=None)
Fit the transformer on all columns jointly and transform X.
- Parameters:
- X : Pandas or Polars DataFrame
The data to transform.
- y : Pandas or Polars Series or DataFrame, default=None
The target data.
- Returns:
- result : Pandas or Polars DataFrame
The transformed data.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {"default", "pandas", "polars"}, default=None
Configure output of transform and fit_transform.
- "default": Default output format of a transformer
- "pandas": DataFrame output
- "polars": Polars output
- None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
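The <component>__<parameter> convention can be sketched with a scikit-learn Pipeline. The step names "scale" and "pca" below are arbitrary labels chosen for this example. Presumably the same convention reaches the wrapped transformer of an ApplyToFrame through its transformer parameter, but that is an assumption not demonstrated on this page.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=2))])

# Update parameters of nested steps using <component>__<parameter>.
pipe.set_params(pca__n_components=1, scale__with_mean=False)

params = pipe.get_params()
print(params["pca__n_components"])  # 1
print(params["scale__with_mean"])   # False
```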