skrub.to_datetime

skrub.to_datetime(X, errors='coerce', random_state=None, **kwargs)

Convert the columns of a dataframe or 2d array into a datetime representation.

This function augments pandas.to_datetime() by supporting dataframe and 2d array inputs. It only attempts to convert columns whose dtype is object or string. Numeric columns are skipped and preserved in the output.

Use the ‘format’ keyword to force a specific datetime format. See more details in the parameters section.
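The column selection described above can be sketched with plain pandas. This is only an illustration of the documented behavior, not skrub's implementation: convert_text_columns is a hypothetical helper, and it omits the parsability checks and format guessing that this function performs.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def convert_text_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: parse object/string columns, keep the others."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        # Only text-like columns are candidates; numeric columns are skipped.
        if is_object_dtype(col) or is_string_dtype(col):
            out[name] = pd.to_datetime(col)
    return out


X = pd.DataFrame({"a": [1, 2], "b": ["2021-01-01", "2021-02-02"]})
converted = convert_text_columns(X)
```

Here column a keeps its integer dtype, while column b becomes a datetime column.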

Parameters:
X : Pandas or Polars dataframe, 2d-array or any input accepted by pd.to_datetime

The object to convert to a datetime.

errors : {‘coerce’, ‘raise’}, default ‘coerce’

When set to ‘raise’, errors will be raised only when the following conditions are satisfied, for each column X_col:

  • After converting to numpy, the column dtype is np.object_ or np.str_.

  • Each entry of the column is datetime-parsable, i.e. pd.to_datetime(X_col, format="mixed") doesn’t raise an error. This step is conservative, because e.g. ["2020-01-01", "hello", "2020-01-01"] is not considered datetime-parsable, so we won’t attempt to convert it.

  • The column as a whole is not datetime-parsable, due to a clash of datetime format, e.g. ‘2020/01/01’ and ‘2020-01-01’.

When set to 'coerce', the entries of X_col that would have raised an error are set to NaT instead. You can choose which format to use with the keyword argument format, as with pd.to_datetime, e.g. to_datetime(X_col, format='%Y/%m/%d'). Combined with errors='coerce', this will convert all entries that don’t match this format to NaT.

Note that the 'ignore' option is not supported and will raise an error.
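As a rough illustration of combining a fixed format with coercion, the snippet below calls pandas.to_datetime directly on a single column rather than skrub.to_datetime. The sample values are made up for the example.

```python
import pandas as pd

# One entry matches the format, one uses a different separator,
# and one is not a date at all.
X_col = pd.Series(["2020/01/01", "2020-01-02", "hello"])

# Entries that do not match '%Y/%m/%d' are replaced by NaT.
parsed = pd.to_datetime(X_col, format="%Y/%m/%d", errors="coerce")
```

Only the first entry is parsed; the other two become NaT because they do not match the requested format.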

random_state : int, RandomState instance or None, default=None

Determines random number generation for the subsampling during datetime guessing. Use an int to make the randomness deterministic.

**kwargs : key, value mappings

Other keyword arguments are passed down to pandas.to_datetime().

One notable argument is ‘format’. Setting a format overrides the datetime format guessing behavior of this function for all columns.

Note that we don’t encourage you to use the dayfirst or yearfirst arguments, since their behavior is ambiguous and might not be applied at all.

Moreover, this function raises an error if ‘unit’ is set to any value. This is because, in pandas.to_datetime, ‘unit’ is specific to timestamps, whereas in skrub.to_datetime we don’t attempt to parse numeric columns.
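For context, this is how ‘unit’ behaves in pandas.to_datetime itself, where it interprets numbers as offsets from the Unix epoch. The snippet is a small pandas-only sketch and does not call skrub.to_datetime.

```python
import pandas as pd

# 'unit' tells pandas how to read a numeric timestamp:
# here, a number of seconds since 1970-01-01 (the Unix epoch).
ts = pd.to_datetime(1609459200, unit="s")

# The same applies element-wise to a numeric column.
days = pd.to_datetime(pd.Series([0, 86400]), unit="s")
```

Since skrub.to_datetime leaves numeric columns untouched, it has no numeric input for ‘unit’ to act on.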

Returns:
datetime

Return type depends on input.

  • Dataframes, series and 2d arrays return the same type.

  • Otherwise, return the same output as pandas.to_datetime().

See also

pandas.to_datetime()

Convert argument to datetime.

Examples

>>> import pandas as pd
>>> from skrub import to_datetime
>>> X = pd.DataFrame(dict(a=[1, 2], b=["2021-01-01", "2021-02-02"]))
>>> X
   a          b
0  1 2021-01-01
1  2 2021-02-02
>>> to_datetime(X)
   a          b
0  1 2021-01-01
1  2 2021-02-02

Examples using skrub.to_datetime

Handling datetime features with the DatetimeEncoder
