skrub.DatetimeEncoder
Usage examples at the bottom of this page.
- class skrub.DatetimeEncoder(*, resolution='hour', add_day_of_the_week=False, add_total_seconds=True, errors='coerce')
Transforms each datetime column into several numeric columns of temporal features (e.g., year, month, day, …).
If the dates are timezone aware, all the features extracted will correspond to the provided timezone.
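To make the timezone remark concrete, here is a minimal sketch using only the Python standard library, not skrub itself. It shows that one instant yields different calendar fields depending on the timezone attached to it. The fixed UTC+02:00 offset is an arbitrary choice for illustration.

```python
from datetime import datetime, timedelta, timezone

# One instant, expressed in UTC.
utc_stamp = datetime(2022, 10, 15, 23, 30, tzinfo=timezone.utc)

# The same instant viewed with a fixed UTC+02:00 offset (illustrative choice).
local_stamp = utc_stamp.astimezone(timezone(timedelta(hours=2)))

# The calendar fields differ even though both objects denote the same instant.
print(utc_stamp.day, utc_stamp.hour)      # 15 23
print(local_stamp.day, local_stamp.hour)  # 16 1

# The number of seconds since the Epoch does not depend on the timezone.
print(utc_stamp.timestamp() == local_stamp.timestamp())  # True
```

Features such as "day" and "hour" are therefore tied to the timezone carried by the data, whereas a total-seconds feature is not.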
- Parameters:
- resolution : {"year", "month", "day", "hour", "minute", "second", "microsecond", "nanosecond", None}, default="hour"
Extract up to this resolution. E.g., resolution="day" generates the features "year", "month", "day" only. If None, no such feature will be created (but day of the week and total seconds may still be extracted, see below).
- add_day_of_the_week : bool, default=False
Add day of the week feature as a numerical feature from 0 (Monday) to 6 (Sunday).
- add_total_seconds : bool, default=True
Add the total number of seconds since Epoch.
- errors : {"coerce", "raise"}, default="coerce"
During transform:
  - If "coerce", then invalid parsing will be set as pd.NaT.
  - If "raise", then invalid parsing will raise an exception.
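The effect of resolution, add_day_of_the_week and add_total_seconds can be sketched with the standard library alone. This is an illustration of the documented behaviour, not skrub's implementation: the helper extract_features and the key name "total_seconds" are hypothetical, and "nanosecond" is left out because datetime.datetime has no such field.

```python
from datetime import datetime, timezone

# Ordered from coarsest to finest, following the documented resolution options
# that datetime.datetime can represent (it has no nanosecond field).
FEATURES = ["year", "month", "day", "hour", "minute", "second", "microsecond"]


def extract_features(value, resolution="hour", add_day_of_the_week=False,
                     add_total_seconds=True):
    """Hypothetical helper: turn one datetime into a dict of numeric features."""
    out = {}
    if resolution is not None:
        # Keep every field up to and including the requested resolution.
        last = FEATURES.index(resolution)
        for name in FEATURES[: last + 1]:
            out[name] = float(getattr(value, name))
    if add_day_of_the_week:
        out["day_of_week"] = float(value.weekday())  # 0 = Monday, 6 = Sunday
    if add_total_seconds:
        # "total_seconds" is an assumed key name for seconds since the Epoch.
        out["total_seconds"] = value.timestamp()
    return out


stamp = datetime(2019, 10, 15, 12, 0, tzinfo=timezone.utc)
features = extract_features(stamp, resolution="day", add_day_of_the_week=True)
print(features)
```

With resolution="day" the "hour" field is not emitted, and with resolution=None only the optional day-of-week and total-seconds features remain.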
See also
GapEncoder
Encode dirty categories (strings) by constructing latent topics with continuous encoding.
MinHashEncoder
Encode string columns as a numeric array with the minhash method.
SimilarityEncoder
Encode string columns as a numeric array with n-gram string similarity.
Examples
>>> from skrub import DatetimeEncoder
>>> enc = DatetimeEncoder(add_total_seconds=False)
>>> X = [['2022-10-15'], ['2021-12-25'], ['2020-05-18'], ['2019-10-15 12:00:00']]
>>> enc.fit(X)
DatetimeEncoder(add_total_seconds=False)
The encoder will output a transformed array with four columns ("year", "month", "day", "hour"):
>>> enc.transform(X)
array([[2022.,   10.,   15.,    0.],
       [2021.,   12.,   25.,    0.],
       [2020.,    5.,   18.,    0.],
       [2019.,   10.,   15.,   12.]])
- Attributes:
- column_indices_ : list of int
Indices of the datetime-parsable columns.
- index_to_format_ : dict[int, str]
Mapping from column indices to their datetime formats.
- index_to_features_ : dict[int, list[str]]
Dictionary mapping the column indices to the list of datetime features extracted for each column.
- n_features_out_ : int
Number of features of the transformed data.
Methods
fit(X[, y]): Fit the instance to X.
fit_transform(X[, y]): Fit to data, then transform it.
get_feature_names_out([input_features]): Get output feature names for transformation.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
set_output(*[, transform]): Set output container.
set_params(**params): Set the parameters of this estimator.
transform(X[, y]): Transform X by replacing each datetime column with corresponding numerical features.
- fit(X, y=None)
Fit the instance to X.
Select datetime-parsable columns and generate the list of datetime features to extract.
- Parameters:
- X : array_like, shape (n_samples, n_features)
Input data. Columns that can't be converted into pandas.DatetimeIndex and numerical values will be dropped.
- y : None
Unused, only here for compatibility.
- Returns:
- DatetimeEncoder
Fitted DatetimeEncoder instance (self).
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array_like of shape (n_samples, n_features)
Input samples.
- y : array_like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- get_feature_names_out(input_features=None)
Get output feature names for transformation.
Feature names are formatted like: “<column_name>_<new_feature>” if the original data has column names, otherwise with format “<column_index>_<new_feature>” where <new_feature> is one of {“year”, “month”, “day”, “hour”, “minute”, “second”, “microsecond”, “nanosecond”, “day_of_week”}.
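The naming scheme described above can be sketched as follows. The helper feature_names and its arguments are hypothetical and only mirror the documented "<column_name>_<new_feature>" and "<column_index>_<new_feature>" patterns. This is not the library's code.

```python
def feature_names(columns, features_per_column):
    """Hypothetical helper that builds output names from the documented pattern.

    columns: list of column names, or None when the data has no column names.
    features_per_column: dict mapping a column index to its extracted features.
    """
    names = []
    for index, features in features_per_column.items():
        # Fall back to the column index when no column names are available.
        prefix = columns[index] if columns is not None else index
        names.extend(f"{prefix}_{feature}" for feature in features)
    return names


extracted = {0: ["year", "month", "day"]}
print(feature_names(["purchase_date"], extracted))
# ['purchase_date_year', 'purchase_date_month', 'purchase_date_day']
print(feature_names(None, extracted))
# ['0_year', '0_month', '0_day']
```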
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {"default", "pandas"}, default=None
Configure output of transform and fit_transform.
  - "default": Default output format of a transformer
  - "pandas": DataFrame output
  - None: Transform configuration is unchanged
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
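The <component>__<parameter> convention can be illustrated with a small standard-library sketch. The helper split_nested_params and the component name "encoder" are hypothetical. The sketch only shows how such flat keys separate into an object's own parameters and those addressed to a nested component, and it is not scikit-learn's implementation.

```python
def split_nested_params(params):
    """Hypothetical helper: group flat ``component__parameter`` keys by component."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only, so deeper nesting stays intact.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"resolution": "day", "encoder__add_total_seconds": False}
)
print(own)     # {'resolution': 'day'}
print(nested)  # {'encoder': {'add_total_seconds': False}}
```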
- transform(X, y=None)
Transform X by replacing each datetime column with corresponding numerical features.
- Parameters:
- X : array_like of shape (n_samples, n_features)
The data to transform, where each column is a datetime feature.
- y : None
Unused, only here for compatibility.
- Returns:
- X_out : ndarray of shape (n_samples, n_features_out_)
Transformed input.
Examples using skrub.DatetimeEncoder

Encoding: from a dataframe to a numerical matrix for machine learning

Handling datetime features with the DatetimeEncoder