skrub.to_datetime
- skrub.to_datetime(X, errors='coerce', **kwargs)
Convert the columns of a dataframe or 2d array into a datetime representation.
This function augments pandas.to_datetime() by supporting dataframes and 2d array inputs. It only attempts to convert columns whose dtype is object or string. Numeric columns are skipped and preserved in the output.
Use the ‘format’ keyword to force a specific datetime format. See the parameters section for more details.
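The column-selection rule described above can be sketched with pandas alone. This is a minimal illustration, not skrub's implementation: the helper name `convert_text_columns` is hypothetical, and unlike skrub.to_datetime it does not first check whether a column is datetime-parsable.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def convert_text_columns(df, **kwargs):
    """Hypothetical helper: convert only object/string columns to datetime.

    Numeric columns are left untouched, mirroring the behavior described
    above. Extra keyword arguments are forwarded to pandas.to_datetime.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if is_object_dtype(col) or is_string_dtype(col):
            out[name] = pd.to_datetime(col, **kwargs)
    return out


df = pd.DataFrame({"a": [1, 2], "b": ["2021-01-01", "2021-02-02"]})
converted = convert_text_columns(df)
print(converted.dtypes)
```

Here column `a` keeps its integer dtype, while column `b` becomes a datetime column.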
- Parameters:
- X : Pandas or Polars dataframe, 2d-array or any input accepted by pd.to_datetime
The object to convert to a datetime.
- errors : {‘coerce’, ‘raise’}, default ‘coerce’
When set to ‘raise’, errors will be raised only when the following conditions are satisfied, for each column X_col:
  - After converting to numpy, the column dtype is np.object_ or np.str_.
  - Each entry of the column is datetime-parsable, i.e. pd.to_datetime(X_col, format="mixed") doesn’t raise an error. This step is conservative, because e.g. ["2020-01-01", "hello", "2020-01-01"] is not considered datetime-parsable, so we won’t attempt to convert it.
  - The column as a whole is not datetime-parsable, due to a clash of datetime formats, e.g. ‘2020/01/01’ and ‘2020-01-01’.
When set to 'coerce', the entries of X_col that should have raised an error are set to NaT instead. You can choose which format to use with the keyword argument format, as with pd.to_datetime, e.g. to_datetime(X_col, format='%Y/%m/%d'). Combined with errors='coerce', this will convert all entries that don’t match this format to NaT.
Note that the 'ignore' option is not supported and will raise an error.
- **kwargs : key, value mappings
Other keyword arguments are passed down to pandas.to_datetime().
One notable argument is ‘format’. Setting a format overrides the datetime format guessing behavior of this function for all columns.
Note that we don’t encourage you to use the dayfirst or yearfirst arguments, since their behavior is ambiguous and might not be applied at all.
Moreover, this function raises an error if ‘unit’ is set to any value. This is because, in pandas.to_datetime, ‘unit’ is specific to timestamps, whereas in skrub.to_datetime we don’t attempt to parse numeric columns.
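To make the ‘unit’ restriction concrete, the sketch below shows what ‘unit’ does in pandas.to_datetime: it interprets numbers as offsets from the Unix epoch. The sample values are illustrative. Since the function documented here skips numeric columns, one option, offered only as a suggestion, is to convert such columns with pandas directly.

```python
import pandas as pd

# Numeric timestamps expressed in seconds since 1970-01-01 (illustrative values).
seconds = pd.Series([0, 86_400])

# 'unit' tells pandas how to interpret the numbers; it only applies to numeric input.
as_dates = pd.to_datetime(seconds, unit="s")
print(as_dates)
```

The two values map to 1970-01-01 and 1970-01-02, respectively.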
- Returns:
- datetime
Return type depends on input:
  - dataframes, series and 2d arrays return the same type;
  - otherwise, the output is the same as that of pandas.to_datetime().
See also
pandas.to_datetime()
Convert argument to datetime.
Examples
>>> import pandas as pd
>>> from skrub import to_datetime
>>> X = pd.DataFrame(dict(a=[1, 2], b=["2021-01-01", "2021-02-02"]))
>>> X
   a           b
0  1  2021-01-01
1  2  2021-02-02
>>> to_datetime(X)
   a          b
0  1 2021-01-01
1  2 2021-02-02
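The interaction between ‘format’ and ‘errors’ can be illustrated on a single column with pandas.to_datetime, to which these keyword arguments are passed down. This is a sketch on made-up data, using the format '%d/%m/%Y' as an arbitrary choice. It shows pandas' behavior for one column, not the full dataframe logic of this function.

```python
import pandas as pd

# One entry matches the format, one uses another format, one is not a date.
col = pd.Series(["01/02/2020", "2020-01-02", "hello"])

# With errors="coerce", entries that do not match the format become NaT.
coerced = pd.to_datetime(col, format="%d/%m/%Y", errors="coerce")
print(coerced)

# With errors="raise", the same mismatch raises a ValueError instead.
try:
    pd.to_datetime(col, format="%d/%m/%Y", errors="raise")
    raised = False
except ValueError:
    raised = True
print(raised)
```

Only the first entry is parsed (as 1 February 2020); the other two are set to NaT under 'coerce'.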
Examples using skrub.to_datetime

Handling datetime features with the DatetimeEncoder