AggTarget
- class skrub.AggTarget(main_key, operations, *, suffix='_target')
Aggregate a target y before joining its aggregation on a base dataframe.
Accepts pandas.DataFrame or polars.DataFrame inputs.
- Parameters:
- main_key : str or iterable of str
Select the columns from the main table to use as keys during the aggregation of the target and during the join operation.
If main_key refers to a single column, a single aggregation for this key will be generated and a single join will be performed.
If main_key is a list of keys, a multi-column aggregation will be performed on the target.
- operations : str or iterable of str
Aggregation operations to perform on the target.
Supported operations are "count", "mode", "min", "max", "sum", "median", "mean" and "std". The operations "sum", "median", "mean" and "std" are reserved for numeric targets.
- suffix : str, default="_target"
The suffix to append to the columns of the target table if the join results in duplicate columns.
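To make the roles of main_key and operations concrete, the group-by-then-left-join idea can be approximated in plain pandas. This is a sketch, not skrub's implementation: it assumes a single 1-D target, uses a two-column key, leaves suffix handling out, and the output names y_mean and y_max are hypothetical.

```python
import pandas as pd

# Toy base table; "from_airport" and "company" together act as a two-column main_key.
X = pd.DataFrame({
    "from_airport": [1, 1, 1, 2, 2, 2],
    "company": ["DL", "AF", "AF", "DL", "DL", "TR"],
})
y = pd.Series([1, 1, 0, 0, 1, 1], name="y")

main_key = ["from_airport", "company"]
operations = ["mean", "max"]

# 1. Aggregate the target once per distinct (from_airport, company) pair.
agg = (
    y.groupby([X[k] for k in main_key])
    .agg(operations)
    .add_prefix("y_")  # hypothetical naming: y_mean, y_max
    .reset_index()
)

# 2. Left-join the aggregates back so every row of X is kept, in order.
augmented = X.merge(agg, on=main_key, how="left")
print(augmented)
```

With the two-column key, the DL rows at airport 1 and airport 2 are aggregated separately (mean 1.0 versus 0.5), whereas keying on company alone, as in the Examples section, pools all three DL rows (mean 0.666667).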
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from skrub import AggTarget
>>> X = pd.DataFrame({
...     "flightId": range(1, 7),
...     "from_airport": [1, 1, 1, 2, 2, 2],
...     "total_passengers": [90, 120, 100, 70, 80, 90],
...     "company": ["DL", "AF", "AF", "DL", "DL", "TR"],
... })
>>> y = np.array([1, 1, 0, 0, 1, 1])
>>> agg_target = AggTarget(
...     main_key="company",
...     operations=["mean", "max"],
... )
>>> agg_target.fit_transform(X, y)
   flightId  from_airport  ...  y_0_mean_target  y_0_max_target
0         1             1  ...         0.666667               1
1         2             1  ...         0.500000               1
2         3             1  ...         0.500000               1
3         4             2  ...         0.666667               1
4         5             2  ...         0.666667               1
5         6             2  ...         1.000000               1
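The aggregated values in this example can be checked by hand: each row receives the mean and max of y over all rows that share its company (DL: 1, 0, 1; AF: 1, 0; TR: 1). The cross-check below uses only pandas and NumPy and does not call skrub; groupby(...).transform broadcasts each group statistic back to its rows, which mirrors the effect of the left join.

```python
import numpy as np
import pandas as pd

company = pd.Series(["DL", "AF", "AF", "DL", "DL", "TR"])
y = pd.Series(np.array([1, 1, 0, 0, 1, 1]))

# Broadcast each company's statistic back onto every row of that company.
mean_per_row = y.groupby(company).transform("mean")
max_per_row = y.groupby(company).transform("max")

print(mean_per_row.round(6).tolist())  # [0.666667, 0.5, 0.5, 0.666667, 0.666667, 1.0]
print(max_per_row.tolist())            # [1, 1, 1, 1, 1, 1]
```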
Methods
- fit(X, y): Aggregate the target y based on keys from X.
- fit_transform(X, y): Aggregate the target y based on keys from X, then left-join the aggregation on X.
- get_feature_names_out(): Get output feature names for transformation.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- set_output(*[, transform]): Set output container.
- set_params(**params): Set the parameters of this estimator.
- transform(X): Left-join pre-aggregated target on X.
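The split between fit and transform follows the usual scikit-learn pattern: fit stores one aggregated row per key value, and transform left-joins those stored rows onto whatever X it receives, including data not seen during fit. ToyAggTarget below is a hypothetical, stripped-down stand-in written with pandas (it is not part of skrub and supports only a single key, a 1-D target and a list of operations). In this sketch, because the join is a left join, rows whose key was absent at fit time get NaN aggregates.

```python
import pandas as pd


class ToyAggTarget:
    """Hypothetical, simplified stand-in for AggTarget (single key, 1-D target)."""

    def __init__(self, main_key, operations):
        self.main_key = main_key
        self.operations = operations  # must be a list of aggregation names here

    def fit(self, X, y):
        # Aggregate y once per key value and keep the table for later joins.
        y = pd.Series(list(y), index=X.index, name="y")
        self.agg_ = (
            y.groupby(X[self.main_key])
            .agg(self.operations)
            .add_prefix("y_")
            .reset_index()
        )
        return self

    def transform(self, X):
        # Left join: every row of X is kept; unseen keys get NaN aggregates.
        return X.merge(self.agg_, on=self.main_key, how="left")

    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X)


X_train = pd.DataFrame({"company": ["DL", "AF", "AF", "DL"]})
y_train = [1, 1, 0, 0]
X_new = pd.DataFrame({"company": ["AF", "TR"]})

enc = ToyAggTarget(main_key="company", operations=["mean", "max"]).fit(X_train, y_train)
out = enc.transform(X_new)
print(out)
```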
- fit(X, y)
Aggregate the target y based on keys from X.
- Parameters:
- X : DataFrameLike
Must contain the columns named in main_key.
- y : DataFrameLike or SeriesLike or ArrayLike
The length of y must match the length of X. The target can be continuous or discrete, and may have multiple columns.
- Returns:
- AggTarget
Fitted AggTarget instance (self).
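The preconditions listed for fit (key columns present in X, y as long as X, possibly multi-column y) can be expressed as a small validation step. check_fit_inputs below is a hypothetical helper, not a skrub function; naming the target columns y_0, y_1, ... is an assumption inferred from the column names in the example output.

```python
import numpy as np
import pandas as pd


def check_fit_inputs(X, y, main_key):
    """Hypothetical helper: enforce the documented preconditions of fit."""
    # Normalize main_key to a list so single and multi-column keys share one path.
    keys = [main_key] if isinstance(main_key, str) else list(main_key)
    missing = [k for k in keys if k not in X.columns]
    if missing:
        raise ValueError(f"main_key columns not found in X: {missing}")

    # Accept 1-D or 2-D targets; a 1-D target is treated as a single column.
    y = np.asarray(y)
    if y.ndim == 1:
        y = y.reshape(-1, 1)
    if y.shape[0] != len(X):
        raise ValueError(f"y has {y.shape[0]} rows but X has {len(X)}")

    # Assumed naming y_0, y_1, ... for the target columns.
    return pd.DataFrame(y, index=X.index, columns=[f"y_{i}" for i in range(y.shape[1])])


X = pd.DataFrame({"company": ["DL", "AF", "AF"]})
Y = check_fit_inputs(X, [[1, 0.5], [0, 1.5], [1, 2.0]], "company")
print(Y)
```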
- fit_transform(X, y)
Aggregate the target y based on keys from X.
- Parameters:
- X : DataFrameLike
Must contain the columns named in main_key.
- y : DataFrameLike or SeriesLike or ArrayLike
The length of y must match the length of X. The target can be continuous or discrete, and may have multiple columns.
- Returns:
- DataFrame
The input X augmented with the aggregated target columns, left-joined on main_key.
- get_feature_names_out()
Get output feature names for transformation.
- Returns:
- list of str
Transformed feature names.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {"default", "pandas", "polars"}, default=None
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
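The <component>__<parameter> convention is easiest to see on a scikit-learn Pipeline. This sketch uses StandardScaler and LogisticRegression as stand-in steps and assumes scikit-learn is installed; any step, including an AggTarget, would be addressed the same way through its step name.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# "scale" and "clf" are step names; "with_mean" and "C" are parameters of those steps.
pipe.set_params(scale__with_mean=False, clf__C=0.5)

print(pipe.get_params()["scale__with_mean"])  # False
print(pipe.named_steps["clf"].C)              # 0.5
```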