.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/03_datetime_encoder.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_03_datetime_encoder.py:

.. _example_datetime_encoder :

===================================================
Handling datetime features with the DatetimeEncoder
===================================================

In this example, we illustrate how to better integrate datetime features in
machine learning models with the |DatetimeEncoder|.

This encoder breaks down the datetime features it receives into relevant
numerical features, such as the month, the day of the week, or the hour of
the day.

It is used by default in the |TableVectorizer|.

.. |DatetimeEncoder| replace::
    :class:`~skrub.DatetimeEncoder`

.. |TableVectorizer| replace::
    :class:`~skrub.TableVectorizer`

.. |OneHotEncoder| replace::
    :class:`~sklearn.preprocessing.OneHotEncoder`

.. |TimeSeriesSplit| replace::
    :class:`~sklearn.model_selection.TimeSeriesSplit`

.. |ColumnTransformer| replace::
    :class:`~sklearn.compose.ColumnTransformer`

.. |make_column_transformer| replace::
    :func:`~sklearn.compose.make_column_transformer`

.. |HGBR| replace::
    :class:`~sklearn.ensemble.HistGradientBoostingRegressor`

.. |ToDatetime| replace::
    :class:`~skrub.ToDatetime`

.. GENERATED FROM PYTHON SOURCE LINES 43-49

A problem with relevant datetime features
-----------------------------------------

We will use a dataset of bike-sharing demand in 2011 and 2012.
In this setting, we want to predict the number of bike rentals based
on the date, time and weather conditions.

.. GENERATED FROM PYTHON SOURCE LINES 49-64

.. code-block:: Python

    from pprint import pprint

    import pandas as pd

    data = pd.read_csv(
        "https://raw.githubusercontent.com/skrub-data/datasets/master"
        "/data/bike-sharing-dataset.csv"
    )
    # Extract our input data (X) and the target column (y)
    y = data["cnt"]
    X = data[["date", "holiday", "temp", "hum", "windspeed", "weathersit"]]

    X

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                          date  holiday  temp   hum  windspeed  weathersit
    0      2011-01-01 00:00:00        0  0.24  0.81     0.0000           1
    1      2011-01-01 01:00:00        0  0.22  0.80     0.0000           1
    2      2011-01-01 02:00:00        0  0.22  0.80     0.0000           1
    3      2011-01-01 03:00:00        0  0.24  0.75     0.0000           1
    4      2011-01-01 04:00:00        0  0.24  0.75     0.0000           1
    ...                    ...      ...   ...   ...        ...         ...
    17374  2012-12-31 19:00:00        0  0.26  0.60     0.1642           2
    17375  2012-12-31 20:00:00        0  0.26  0.60     0.1642           2
    17376  2012-12-31 21:00:00        0  0.26  0.60     0.1642           1
    17377  2012-12-31 22:00:00        0  0.26  0.56     0.1343           1
    17378  2012-12-31 23:00:00        0  0.26  0.65     0.1343           1

    [17379 rows x 6 columns]

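To make the decomposition described in the introduction concrete, here is a
minimal sketch that uses only pandas, not skrub. It takes the first and last
timestamps of the dataset and extracts the month, the day of the week and the
hour as separate numerical columns. The column names ``month``,
``day_of_week`` and ``hour`` are chosen for illustration only and are not the
names produced by the |DatetimeEncoder|.

```python
import pandas as pd

# Two timestamps taken from the first and last rows of the dataset.
stamps = pd.Series(pd.to_datetime(["2011-01-01 00:00:00", "2012-12-31 23:00:00"]))

# Each calendar component becomes its own numerical column.
# The column names are illustrative, not those used by skrub.
parts = pd.DataFrame(
    {
        "month": stamps.dt.month,
        "day_of_week": stamps.dt.dayofweek,  # pandas convention: Monday=0, Sunday=6
        "hour": stamps.dt.hour,
    }
)
print(parts)
```

A model can use such columns directly, for instance to learn that demand
differs between weekdays and weekends. The numbering of weekdays shown here is
the pandas convention and may differ from the one used by the encoder.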
.. GENERATED FROM PYTHON SOURCE LINES 65-67

.. code-block:: Python

    y

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0         16
    1         40
    2         32
    3         13
    4          1
            ...
    17374    119
    17375     89
    17376     90
    17377     61
    17378     49
    Name: cnt, Length: 17379, dtype: int64

.. GENERATED FROM PYTHON SOURCE LINES 68-69

We convert the dataframe's ``"date"`` column using |ToDatetime|.

.. GENERATED FROM PYTHON SOURCE LINES 69-76

.. code-block:: Python

    from skrub import ToDatetime

    date = ToDatetime().fit_transform(X["date"])

    print("original dtype:", X["date"].dtypes, "\n\nconverted dtype:", date.dtypes)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    original dtype: object

    converted dtype: datetime64[ns]

.. GENERATED FROM PYTHON SOURCE LINES 77-86

Encoding the features
.....................

We now encode this column with a |DatetimeEncoder|.

We instantiate the |DatetimeEncoder| with its default settings. By default,
it extracts nothing finer than the hour: minutes, seconds and smaller units
carry no information in this hourly dataset. The day of the week is not
extracted by default either. Both behaviors can be adjusted through the
``resolution`` and ``add_weekday`` parameters.

.. GENERATED FROM PYTHON SOURCE LINES 86-93

.. code-block:: Python

    from skrub import DatetimeEncoder

    date_enc = DatetimeEncoder().fit_transform(date)

    print(date, "\n\nHas been encoded as:\n\n", date_enc)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0       2011-01-01 00:00:00
    1       2011-01-01 01:00:00
    2       2011-01-01 02:00:00
    3       2011-01-01 03:00:00
    4       2011-01-01 04:00:00
                    ...
    17374   2012-12-31 19:00:00
    17375   2012-12-31 20:00:00
    17376   2012-12-31 21:00:00
    17377   2012-12-31 22:00:00
    17378   2012-12-31 23:00:00
    Name: date, Length: 17379, dtype: datetime64[ns]

    Has been encoded as:

            date_year  date_month  date_day  date_hour  date_total_seconds
    0         2011.0         1.0       1.0        0.0        1.293840e+09
    1         2011.0         1.0       1.0        1.0        1.293844e+09
    2         2011.0         1.0       1.0        2.0        1.293847e+09
    3         2011.0         1.0       1.0        3.0        1.293851e+09
    4         2011.0         1.0       1.0        4.0        1.293854e+09
    ...          ...         ...       ...        ...                 ...
    17374     2012.0        12.0      31.0       19.0        1.356980e+09
    17375     2012.0        12.0      31.0       20.0        1.356984e+09
    17376     2012.0        12.0      31.0       21.0        1.356988e+09
    17377     2012.0        12.0      31.0       22.0        1.356991e+09
    17378     2012.0        12.0      31.0       23.0        1.356995e+09

    [17379 rows x 5 columns]

.. GENERATED FROM PYTHON SOURCE LINES 94-97

We see that the encoder is working as expected: the column has been
replaced by features holding the year, month, day, hour, and total number of
seconds since the Unix epoch.

.. GENERATED FROM PYTHON SOURCE LINES 99-106

One-liner with the TableVectorizer
..................................

As mentioned earlier, the |TableVectorizer| makes use of the
|DatetimeEncoder| by default. Note that ``X["date"]`` is still
a string column, but it will be automatically converted to a datetime by the
|TableVectorizer|.

.. GENERATED FROM PYTHON SOURCE LINES 106-112

.. code-block:: Python

    from skrub import TableVectorizer

    table_vec = TableVectorizer().fit(X)
    pprint(table_vec.get_feature_names_out())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array(['date_year', 'date_month', 'date_day', 'date_hour',
           'date_total_seconds', 'holiday', 'temp', 'hum', 'windspeed',
           'weathersit'], dtype='
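The automatic handling of a string-valued ``date`` column by the
|TableVectorizer| can be imitated by hand to see what it involves. The sketch
below uses only pandas and is not skrub's implementation:
``expand_datetime_strings`` is a hypothetical helper written for this
illustration. It tries to parse every string column as datetimes, replaces the
columns that parse with year, month, day, hour and total-seconds features, and
leaves the other columns untouched.

```python
import pandas as pd


def expand_datetime_strings(df):
    """Hypothetical helper: expand string columns that parse as datetimes."""
    epoch = pd.Timestamp("1970-01-01")
    pieces = []
    for name in df.columns:
        col = df[name]
        parsed = None
        if pd.api.types.is_string_dtype(col):
            try:
                parsed = pd.to_datetime(col)
            except (ValueError, TypeError):
                parsed = None  # not a datetime column, keep it as is
        if parsed is None:
            pieces.append(col)
            continue
        # Replace the column by numerical features, in the original position.
        pieces.append(
            pd.DataFrame(
                {
                    f"{name}_year": parsed.dt.year,
                    f"{name}_month": parsed.dt.month,
                    f"{name}_day": parsed.dt.day,
                    f"{name}_hour": parsed.dt.hour,
                    f"{name}_total_seconds": (parsed - epoch) / pd.Timedelta(seconds=1),
                }
            )
        )
    return pd.concat(pieces, axis=1)


# A toy table with the same kind of columns as X (values from its first and last rows).
toy = pd.DataFrame(
    {
        "date": ["2011-01-01 00:00:00", "2012-12-31 23:00:00"],
        "holiday": [0, 0],
        "temp": [0.24, 0.26],
    }
)
expanded = expand_datetime_strings(toy)
print(list(expanded.columns))
```

The resulting column names follow the same ``date_*`` pattern as the feature
names listed above, while the numerical columns pass through unchanged.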