Parsing and encoding datetimes#
Parsing Datetime Strings#
Skrub provides helpers to parse datetime string columns automatically:
The
to_datetime()
function converts all columns in a dataframe that can be parsed as datetimes. The format can be inferred or user-specified with the format argument.The
ToDatetime
transformer follows the same logic during training and learns a mapping between columns and their formats. It then applies this mapping during the transform step.
>>> from skrub import to_datetime, ToDatetime
>>> import pandas as pd
>>> s = pd.Series(["2024-05-05T13:17:52", None, "2024-05-07T13:17:52"], name="when")
>>> to_datetime(s)
0 2024-05-05 13:17:52
1 NaT
2 2024-05-07 13:17:52
Name: when, dtype: datetime64[ns]
>>> ToDatetime().fit_transform(s)
0 2024-05-05 13:17:52
1 NaT
2 2024-05-07 13:17:52
Name: when, dtype: datetime64[ns]
Encoding and Feature Engineering on Datetimes#
Once datetime columns have been parsed, they can be encoded as numerical features with
the DatetimeEncoder
, by extracting temporal features (year, month, day,
hour, etc.). No timezone conversion is done; the timezone
in the feature is retained. The DatetimeEncoder
rejects non-datetime columns,
so it should only be applied after conversion using ToDatetime
.
Additionally, DatetimeEncoder
can include the following features:
Number of seconds from epoch (
add_total_seconds
)Day of the week (
add_weekday
)Day of the year (
add_day_of_year
)
Periodic encoding is supported through trigonometric (circular) and spline
encoding: set the periodic_encoding
parameter to circular
or spline
.

Example of periodic encoding of datetime features using circular and spline methods.#