fetch_credit_fraud
- skrub.datasets.fetch_credit_fraud(data_home=None, split='train')
Fetch the credit fraud dataset (classification).
The data is available at https://github.com/skrub-data/skrub-data-files.
This is an imbalanced binary classification use case. The dataset consists of two tables:
- baskets, containing the binary fraud target label;
- products, containing features of the products in each basket.
Each basket contains at least one product, so the products table must first be aggregated per basket and then joined onto the baskets table to build a design matrix. Size on disk: 16 MB.
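The aggregate-then-join step can be sketched with pandas. This is a minimal illustration on tiny hypothetical tables, not the actual dataset: the column names `ID`, `fraud_flag`, `basket_ID` and `cash_price`, and the chosen aggregations (product count and total price per basket), are assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature tables mimicking the two-table layout.
# Column names (ID, fraud_flag, basket_ID, cash_price) are illustrative assumptions.
baskets = pd.DataFrame({"ID": [1, 2, 3], "fraud_flag": [0, 1, 0]})
products = pd.DataFrame(
    {
        "basket_ID": [1, 1, 2, 3, 3, 3],
        "cash_price": [10.0, 5.0, 900.0, 3.0, 4.0, 5.0],
    }
)

# Step 1: aggregate the products so there is exactly one row per basket.
agg = (
    products.groupby("basket_ID")
    .agg(n_products=("cash_price", "size"), total_price=("cash_price", "sum"))
    .reset_index()
)

# Step 2: left-join the per-basket features onto the baskets table,
# then drop the now-redundant join key coming from the products side.
design = baskets.merge(agg, left_on="ID", right_on="basket_ID", how="left").drop(
    columns="basket_ID"
)

# Split into features and the binary target.
X = design.drop(columns="fraud_flag")
y = design["fraud_flag"]
print(design)
```

Aggregating first guarantees one row per basket, so the left join preserves the row count of the baskets table and keeps each label aligned with its features. skrub also ships an `AggJoiner` transformer aimed at this pattern; check its documentation for the exact parameters.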
- Parameters:
- Returns:
- bunch : Bunch
A dictionary-like object with the following keys:
- baskets : DataFrame of shape (92790, 2)
Table containing the basket IDs and the target.
- products : DataFrame of shape (163357, 7)
Table containing features about the products contained in the baskets.
- metadata : dict
A dictionary containing the name, description, source and target.
- baskets_path : str
The path to the baskets CSV file.
- products_path : str
The path to the products CSV file.
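Because the bunch is dictionary-like, its entries can be read by key or, assuming it behaves like scikit-learn's `sklearn.utils.Bunch`, as attributes. The sketch below summarizes the table sizes and the positive-class rate, which is worth checking given the class imbalance. It runs offline on a hypothetical stand-in built with the documented keys; the `"target"` key inside `metadata` and the `fraud_flag` column name are assumptions.

```python
import pandas as pd
from sklearn.utils import Bunch


def describe_fraud_bunch(bunch):
    """Summarize a bunch shaped like the one documented above."""
    # Assumption: metadata stores the target column name under the "target" key.
    target = bunch.metadata["target"]
    # Dictionary-style and attribute-style access both work on a sklearn Bunch.
    baskets, products = bunch["baskets"], bunch.products
    return {
        "n_baskets": len(baskets),
        "n_products": len(products),
        # Fraction of positive (fraud) labels, i.e. the minority-class rate.
        "positive_rate": baskets[target].mean(),
    }


# Hypothetical stand-in with the documented keys, so the sketch runs offline.
# With real data, pass the result of skrub.datasets.fetch_credit_fraud() instead.
stand_in = Bunch(
    baskets=pd.DataFrame({"ID": [1, 2, 3, 4], "fraud_flag": [0, 0, 0, 1]}),
    products=pd.DataFrame({"basket_ID": [1, 2, 2, 3, 4], "item": list("abcde")}),
    metadata={"name": "credit fraud", "target": "fraud_flag"},
    baskets_path="baskets.csv",
    products_path="products.csv",
)
summary = describe_fraud_bunch(stand_in)
print(summary)
```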
Gallery examples
Multiples tables: building machine learning pipelines with DataOps