RejectColumn#
- exception skrub.core.RejectColumn[source]#
Used by single-column transformers to indicate they do not apply to a column.
Examples
A
SingleColumnTransformercan raiseRejectColumnexceptions to indicate it cannot handle a given column.>>> from skrub import ToDatetime >>> import pandas as pd >>> df = pd.DataFrame(dict(birthday=["29/01/2024"], city=["London"])) >>> df birthday city 0 29/01/2024 London >>> df.dtypes birthday ... city ... dtype: object >>> ToDatetime().fit_transform(df["birthday"]) 0 2024-01-29 Name: birthday, dtype: datetime64[...] >>> ToDatetime().fit_transform(df["city"]) Traceback (most recent call last): ... skrub.core.RejectColumn: Could not find a datetime format...
RejectColumnexceptions can be used to indicate that a column cannot be handled by the current transformer: this may mean that the data is invalid, or that the transformer is simply not designed to handle that type of data. For example, aToDatetimetransformer might raiseRejectColumnwhen it is passed a column that does not contain strings, or that contains strings but none of them look like dates.ApplyToColsrelies on these exceptions to decide whether a column should be transformed or passed through unchanged.How these rejections are handled depends on the
allow_rejectparameter. By default, no special handling is performed and rejections are considered to be errors:>>> from skrub import ApplyToCols >>> to_datetime = ApplyToCols(ToDatetime()) >>> to_datetime.fit_transform(df) Traceback (most recent call last): ... ValueError: Transformer ToDatetime.fit_transform failed on column 'city'...
However, setting
allow_reject=Truegives the transformer itself some control over which columns it should be applied to. For example, whether a string column contains dates is only known once we try to parse them. Therefore it might be sensible to try to parse all string columns but allow the transformer to reject those that, upon inspection, do not contain dates.>>> to_datetime = ApplyToCols(ToDatetime(), allow_reject=True) >>> transformed = to_datetime.fit_transform(df) >>> transformed birthday city 0 2024-01-29 London
Now the column ‘city’ was rejected but this was not treated as an error; ‘city’ was passed through unchanged and only ‘birthday’ was converted to a datetime column.
>>> transformed.dtypes birthday datetime64[...] city ... dtype: object >>> to_datetime.transformers_ {'birthday': ToDatetime()}
>>> from skrub import ToDatetime >>> df = pd.DataFrame(dict(a=['2020-02-02'], b=[12.5])) >>> ToDatetime().fit_transform(df['a']) 0 2020-02-02 Name: a, dtype: datetime64[...] >>> ToDatetime().fit_transform(df['b']) Traceback (most recent call last): ... skrub.core.RejectColumn: Column 'b' does not contain strings.