| | dates |
|---|---|
| 0 | 2023-01-03 |
| 1 | 2023-02-15 |
Datetimes come in many string formats, and the same date can be written in several ways.
Correct parsing is essential for feature extraction.
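As a small illustration of why the format matters, the sketch below parses the same two dates written in three ways with plain pandas. The string values and variable names are made up for the example.

```python
import pandas as pd

# The same two dates in three common string formats (illustrative values).
iso = pd.Series(["2023-01-03", "2023-02-15"])
day_first = pd.Series(["03/01/2023", "15/02/2023"])
month_first = pd.Series(["01/03/2023", "02/15/2023"])

# An explicit format removes the day/month ambiguity of a string like "03/01/2023".
parsed_iso = pd.to_datetime(iso, format="%Y-%m-%d")
parsed_day_first = pd.to_datetime(day_first, format="%d/%m/%Y")
parsed_month_first = pd.to_datetime(month_first, format="%m/%d/%Y")

# All three series now hold the same timestamps.
print(parsed_iso.equals(parsed_day_first), parsed_iso.equals(parsed_month_first))
```

Without the explicit `format`, "03/01/2023" could be read as either January 3 or March 1, which would silently corrupt every feature derived from it.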
ToDatetime: a single-column transformer that guesses the datetime format:
| | dates |
|---|---|
| 0 | 2023-01-03 |
| 1 | 2023-02-15 |
Cleaner: also parses datetimes across a whole dataframe, optionally with a custom format:
Datetimes must be converted to numerical features:
```python
df_dt["year"] = df_dt["dates"].dt.year
df_dt["month"] = df_dt["dates"].dt.month
df_dt["day"] = df_dt["dates"].dt.day
df_dt["weekday"] = df_dt["dates"].dt.weekday
df_dt["day_of_year"] = df_dt["dates"].dt.day_of_year
df_dt["total_seconds"] = (df_dt["dates"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
df_dt
```

| | dates | year | month | day | weekday | day_of_year | total_seconds |
|---|---|---|---|---|---|---|---|
| 0 | 2023-01-03 | 2023 | 1 | 3 | 1 | 3 | 1672704000 |
| 1 | 2023-02-15 | 2023 | 2 | 15 | 2 | 46 | 1676419200 |
| | dates_year | dates_month | dates_day | dates_total_seconds | dates_weekday | dates_day_of_year |
|---|---|---|---|---|---|---|
| 0 | 2023.0 | 1.0 | 3.0 | 1.672704e+09 | 2.0 | 3.0 |
| 1 | 2023.0 | 2.0 | 15.0 | 1.676419e+09 | 3.0 | 46.0 |
Cyclical patterns need special handling:
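One common way to handle this by hand is a sine/cosine encoding, so that December and January end up close together. This is a minimal sketch using a period of 12 for months; the column names `month_sin` and `month_cos` are made up for the example.

```python
import numpy as np
import pandas as pd

df_dt = pd.DataFrame({"dates": pd.to_datetime(["2023-01-03", "2023-02-15"])})

# Map the month onto a circle: month 12 and month 1 become neighbours.
month = df_dt["dates"].dt.month
df_dt["month_sin"] = np.sin(2 * np.pi * month / 12)
df_dt["month_cos"] = np.cos(2 * np.pi * month / 12)
df_dt
```

Each value is represented by a pair (sin, cos), so a model sees a continuous position on the cycle rather than an integer that jumps from 12 back to 1.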
Or with DatetimeEncoder:
| | dates_year | dates_total_seconds | dates_month_circular_0 | dates_month_circular_1 | dates_day_circular_0 | dates_day_circular_1 |
|---|---|---|---|---|---|---|
| 0 | 2023.0 | 1.672704e+09 | 0.500000 | 0.866025 | 5.877853e-01 | 0.809017 |
| 1 | 2023.0 | 1.676419e+09 | 0.866025 | 0.500000 | 1.224647e-16 | -1.000000 |
| | dates_year | dates_total_seconds | dates_month_spline_00 | dates_month_spline_01 | dates_month_spline_02 | dates_month_spline_03 | dates_month_spline_04 | dates_month_spline_05 | dates_month_spline_06 | dates_month_spline_07 | dates_month_spline_08 | dates_month_spline_09 | dates_month_spline_10 | dates_month_spline_11 | dates_day_spline_0 | dates_day_spline_1 | dates_day_spline_2 | dates_day_spline_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023.0 | 1.672704e+09 | 0.0 | 0.166667 | 0.666667 | 0.166667 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.036000 | 0.538667 | 0.414667 | 0.010667 |
| 1 | 2023.0 | 1.676419e+09 | 0.0 | 0.000000 | 0.166667 | 0.666667 | 0.166667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.166667 | 0.000000 | 0.166667 | 0.666667 |
Figure: example of periodic features generated with splines.
- ToDatetime or Cleaner to parse string dates
- DatetimeEncoder extracts useful features