fetch_drug_directory

skrub.datasets.fetch_drug_directory(data_home=None)

Fetches the drug directory dataset (classification), available at https://github.com/skrub-data/skrub-data-files.

Description of the dataset:

Product listing data submitted to the U.S. FDA for all unfinished, unapproved drugs. Size on disk: 44 MB.

Parameters:
data_home : str or path-like, default=None

The directory in which to download and unzip the files.

Returns:
bunch : Bunch

A dictionary-like object with the following keys:

drug_directory : DataFrame of shape (120215, 21)

The dataframe.

X : DataFrame of shape (120215, 20)

Features, i.e. the dataframe without the target labels.

y : DataFrame of shape (120215, 1)

The target labels.

metadata : dict

A dictionary containing the name, description, source and target.

path : str

The path to the drug directory CSV file.
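
Examples:

A minimal usage sketch, assuming skrub is installed and the dataset can be downloaded. The helpers `describe_bunch` and `load_and_describe` are hypothetical names introduced here for illustration and are not part of skrub. They only read the keys documented above.

```python
def describe_bunch(bunch):
    """Summarize the documented keys of a dictionary-like bunch.

    Hypothetical helper (not part of skrub): it only relies on the keys
    listed in the Returns section and on each dataframe exposing `.shape`.
    """
    return {
        "drug_directory_shape": tuple(bunch["drug_directory"].shape),
        "X_shape": tuple(bunch["X"].shape),
        "y_shape": tuple(bunch["y"].shape),
        "metadata_keys": sorted(bunch["metadata"]),
        "path": str(bunch["path"]),
    }


def load_and_describe(data_home=None):
    """Fetch the dataset and return its summary.

    Hypothetical helper (not part of skrub). The import is local so that
    `describe_bunch` stays usable without skrub installed. The first call
    downloads the files (about 44 MB on disk) into `data_home`.
    """
    from skrub.datasets import fetch_drug_directory

    bunch = fetch_drug_directory(data_home=data_home)
    return describe_bunch(bunch)
```

Calling `load_and_describe()` triggers the download and returns the shapes of the full dataframe, `X` and `y`, together with the metadata keys and the CSV path. Passing a directory as `data_home` stores the files there instead.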