fetch_credit_fraud

skrub.datasets.fetch_credit_fraud(load_dataframe=True, data_directory=None)

Fetch the credit fraud dataset from Figshare.

This is an imbalanced binary classification use case. The dataset consists of two tables:

  • baskets, containing the binary fraud target label

  • products

Each basket contains at least one product, so the products must first be aggregated per basket and then joined to the baskets table to build a design matrix.
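The aggregate-then-join step can be sketched with pandas on toy data. This is only an illustration: the two small frames below stand in for the real tables, and the column names (`ID`, `fraud_flag`, `basket_ID`, `cash_price`) and the chosen aggregates are assumptions made for the example, not a description of the dataset's actual schema.

```python
import pandas as pd

# Toy stand-ins for the two tables. Column names are illustrative
# assumptions, not the documented schema of the real dataset.
baskets = pd.DataFrame({"ID": [1, 2, 3], "fraud_flag": [0, 1, 0]})
products = pd.DataFrame(
    {
        "basket_ID": [1, 1, 2, 3, 3, 3],
        "cash_price": [10.0, 5.0, 300.0, 20.0, 1.0, 4.0],
    }
)

# Step 1: aggregate the products so that there is one row per basket.
aggregated = (
    products.groupby("basket_ID")
    .agg(
        n_products=("cash_price", "size"),
        total_price=("cash_price", "sum"),
    )
    .reset_index()
)

# Step 2: join the per-basket features onto the baskets table.
design = baskets.merge(
    aggregated, left_on="ID", right_on="basket_ID", how="left"
).drop(columns="basket_ID")

# Features and target, one row per basket.
X = design[["n_products", "total_price"]]
y = design["fraud_flag"]
```

A left join keeps every basket, including any that would have no matching product rows; since each basket contains at least one product, no missing values are expected here.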

More details on Figshare

Parameters:
load_dataframe : bool, default=True

Whether or not to load the dataset in memory after download.

data_directory : str, default=None

The directory to which the dataset will be written during the download. If None, the directory is set to ~/skrub_data.

Returns:
bunch : sklearn.utils.Bunch

A dictionary-like object, whose fields are:

  • product : pd.DataFrame

  • baskets : pd.DataFrame

  • source_product : str

  • source_baskets : str

  • path_product : str

  • path_baskets : str