ApplyToFrame
- class skrub.ApplyToFrame(transformer, cols=all(), keep_original=False, rename_columns='{}')
Apply a transformer to part of a dataframe.
A subset of the dataframe is selected and passed to the transformer (as a single input). This is different from ApplyToCols, which fits a separate clone of the transformer to each selected column independently.
Note
The transform and fit_transform methods of transformer must return dataframes of the same type (polars or pandas) as the input, either by default or by supporting the scikit-learn set_output API.
- Parameters:
- transformer : scikit-learn Transformer
The transformer to apply to the selected columns. fit_transform and transform must return a DataFrame. The columns of the resulting dataframe appear last in the output dataframe. Unselected columns appear unchanged in the output.
- cols : str, sequence of str, or skrub selector, optional
The columns to attempt to transform. Columns outside of this selection are passed through unchanged, without calling fit_transform on them. The default is to transform all columns.
- keep_original : bool, default=False
If True, the original columns are preserved in the output. If the transformer produces a column with the same name, the transformation result is renamed so that both columns can appear in the output. If False, only the transformer's output is included in the result, not the original columns. In all cases, columns not selected by cols are passed through.
- rename_columns : str, default='{}'
Format string applied to all transformation output column names. For example, pass 'transformed_{}' to prepend 'transformed_' to all output column names. The default value does not modify the names. Renaming is not applied to columns not selected by cols.
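As a rough illustration of the rename_columns format string, the plain-Python sketch below applies str.format to a list of output names. It assumes the renaming behaves like str.format with the column name as the only positional argument, which matches the 'transformed_{}' example above. The helper rename_outputs is hypothetical and is not part of skrub.

```python
def rename_outputs(names, rename_columns="{}"):
    """Apply a format string to each output column name (illustrative only)."""
    return [rename_columns.format(name) for name in names]


# Hypothetical names of the columns created by the transformer.
created = ["pca0", "pca1"]

# The default '{}' leaves the names untouched.
default_names = rename_outputs(created)

# A prefix is added by placing text before the placeholder.
tagged_names = rename_outputs(created, "my_tag-{}")

print(default_names)
print(tagged_names)
```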
- Attributes:
- all_inputs_ : list of str
All column names in the input dataframe.
- used_inputs_ : list of str
The names of columns that were transformed.
- all_outputs_ : list of str
All column names in the output dataframe.
- created_outputs_ : list of str
The names of columns in the output dataframe that were created by the fitted transformer.
- transformer_ : Transformer
The fitted transformer.
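The relationship between these attributes can be modelled in plain Python. The sketch below is not skrub's implementation: describe_columns is a hypothetical helper, cols is simplified to a plain list of names (selectors are not handled), and the renaming that happens when a created column clashes with a kept original is not modelled. The column order follows the parameter descriptions: passed-through columns first, created columns last.

```python
def describe_columns(all_inputs, cols, created, keep_original=False):
    """Model the column bookkeeping of the fitted attributes (illustrative only)."""
    # Columns handed to the transformer, in their original order.
    used_inputs = [c for c in all_inputs if c in cols]
    if keep_original:
        # Every input column stays in the output.
        passed_through = list(all_inputs)
    else:
        # Only the unselected columns are passed through.
        passed_through = [c for c in all_inputs if c not in used_inputs]
    return {
        "all_inputs_": list(all_inputs),
        "used_inputs_": used_inputs,
        "created_outputs_": list(created),
        "all_outputs_": passed_through + list(created),
    }


info = describe_columns(["a", "b", "c", "d"], cols=["a", "b"], created=["pca0", "pca1"])
kept = describe_columns(
    ["a", "b", "c", "d"], cols=["a", "b"], created=["pca0", "pca1"], keep_original=True
)
print(info["all_outputs_"])
print(kept["all_outputs_"])
```

With cols=["a", "b"], this model gives the same column layout as the PCA examples in the Examples section, with and without keep_original.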
Examples
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.eye(4) * np.logspace(0, 3, 4), columns=list("abcd"))
>>> df
     a     b      c       d
0  1.0   0.0    0.0     0.0
1  0.0  10.0    0.0     0.0
2  0.0   0.0  100.0     0.0
3  0.0   0.0    0.0  1000.0
>>> from sklearn.decomposition import PCA
>>> from skrub import ApplyToFrame
>>> ApplyToFrame(PCA(n_components=2)).fit_transform(df).round(2)
     pca0   pca1
0 -249.01 -33.18
1 -249.04 -33.68
2 -252.37  66.64
3  750.42   0.22
We can restrict the transformer to a subset of columns:
>>> pca = ApplyToFrame(PCA(n_components=2), cols=["a", "b"])
>>> pca.fit_transform(df).round(2)
       c       d  pca0  pca1
0    0.0     0.0 -2.52  0.67
1    0.0     0.0  7.50  0.00
2  100.0     0.0 -2.49 -0.33
3    0.0  1000.0 -2.49 -0.33
>>> pca.used_inputs_
['a', 'b']
>>> pca.created_outputs_
['pca0', 'pca1']
>>> pca.transformer_
PCA(n_components=2)
It is possible to rename the output columns:
>>> pca = ApplyToFrame(
...     PCA(n_components=2), cols=["a", "b"], rename_columns='my_tag-{}'
... )
>>> pca.fit_transform(df).round(2)
       c       d  my_tag-pca0  my_tag-pca1
0    0.0     0.0        -2.52         0.67
1    0.0     0.0         7.50         0.00
2  100.0     0.0        -2.49        -0.33
3    0.0  1000.0        -2.49        -0.33
We can also force preserving the original columns in the output:
>>> pca = ApplyToFrame(PCA(n_components=2), cols=["a", "b"], keep_original=True)
>>> pca.fit_transform(df).round(2)
     a     b      c       d  pca0  pca1
0  1.0   0.0    0.0     0.0 -2.52  0.67
1  0.0  10.0    0.0     0.0  7.50  0.00
2  0.0   0.0  100.0     0.0 -2.49 -0.33
3  0.0   0.0    0.0  1000.0 -2.49 -0.33
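The class description contrasts fitting one transformer on all selected columns jointly (ApplyToFrame) with fitting a separate clone per column (ApplyToCols). The sketch below illustrates that difference with a toy stand-in: MeanCenterer is a hypothetical plain-Python transformer, and a dict of lists stands in for a dataframe. It does not use or reproduce skrub's code.

```python
from statistics import mean


class MeanCenterer:
    """Toy transformer: subtracts the mean of every value seen during fit."""

    def fit(self, columns):
        values = [v for col in columns.values() for v in col]
        self.mean_ = mean(values)
        return self

    def transform(self, columns):
        return {name: [v - self.mean_ for v in col] for name, col in columns.items()}


data = {"a": [0.0, 2.0], "b": [10.0, 12.0]}

# Joint fitting: a single transformer sees both columns at once,
# so one mean (6.0) is shared by all of them.
joint = MeanCenterer().fit(data).transform(data)

# Per-column fitting: a fresh transformer is fitted on each column,
# so each column is centred on its own mean.
per_column = {}
for name, col in data.items():
    single = {name: col}
    per_column.update(MeanCenterer().fit(single).transform(single))

print(joint)
print(per_column)
```

Joint fitting matters for transformers such as PCA, whose output depends on how the selected columns relate to each other.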
Methods
fit(X[, y])                 Fit the transformer on all columns jointly.
fit_transform(X[, y])       Fit the transformer on all columns jointly and transform X.
get_feature_names_out()     Get output feature names for transformation.
get_params([deep])          Get parameters for this estimator.
set_output(*[, transform])  Set output container.
set_params(**params)        Set the parameters of this estimator.
transform(X)                Transform a dataframe.
- fit(X, y=None)
Fit the transformer on all columns jointly.
- Parameters:
- X : Pandas or Polars DataFrame
The data to transform.
- y : Pandas or Polars Series or DataFrame, default=None
The target data.
- Returns:
- ApplyToFrame
The transformer itself.
- fit_transform(X, y=None)
Fit the transformer on all columns jointly and transform X.
- Parameters:
- X : Pandas or Polars DataFrame
The data to transform.
- y : Pandas or Polars Series or DataFrame, default=None
The target data.
- Returns:
- result : Pandas or Polars DataFrame
The transformed data.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example of how to use the API.
- Parameters:
- transform : {"default", "pandas", "polars"}, default=None
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
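To make the <component>__<parameter> convention concrete, here is a minimal plain-Python sketch of how such a key can be routed to a nested object. ToyEstimator is a hypothetical stand-in written for this illustration; it is not scikit-learn's or skrub's code and skips the validation a real estimator performs.

```python
class ToyEstimator:
    """Minimal stand-in for an estimator with set_params (illustrative only)."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for key, value in params.items():
            name, delim, sub_key = key.partition("__")
            if delim:
                # "<component>__<parameter>": forward to the nested component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                # Plain parameter: set it on this object.
                setattr(self, name, value)
        return self


inner = ToyEstimator(n_components=2)
outer = ToyEstimator(transformer=inner, keep_original=False)

# Update a top-level parameter and a nested one in a single call.
outer.set_params(keep_original=True, transformer__n_components=3)

print(outer.keep_original, inner.n_components)
```

Since transformer is a constructor parameter of ApplyToFrame, the analogous key for the wrapped transformer would presumably be transformer__n_components.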