Datasets

Downloading a dataset.

datasets.fetch_bike_sharing

Fetch the bike sharing dataset (regression), available at skrub-data/skrub-data-files.

datasets.fetch_country_happiness

Fetch the happiness index dataset (regression), available at skrub-data/skrub-data-files.

datasets.fetch_credit_fraud

Fetch the credit fraud dataset (classification), available at skrub-data/skrub-data-files.

datasets.fetch_drug_directory

Fetch the drug directory dataset (classification), available at skrub-data/skrub-data-files.

datasets.fetch_employee_salaries

Fetch the employee salaries dataset (regression), available at skrub-data/skrub-data-files.

datasets.fetch_flight_delays

Fetch the flight delays dataset (regression), available at skrub-data/skrub-data-files.

datasets.fetch_ken_embeddings

Download Wikipedia embeddings by type.

datasets.fetch_ken_table_aliases

Get the supported aliases of embedded KEN entities tables.

datasets.fetch_ken_types

Helper function to search for KEN entity types.

datasets.fetch_medical_charge

Fetch the medical charge dataset (regression), available at skrub-data/skrub-data-files.

datasets.fetch_midwest_survey

Fetch the Midwest survey dataset (classification), available at skrub-data/skrub-data-files.

datasets.fetch_movielens

Fetch the MovieLens dataset (regression), available at skrub-data/skrub-data-files.

datasets.fetch_open_payments

Fetch the open payments dataset (classification), available at skrub-data/skrub-data-files.

datasets.fetch_toxicity

Fetch the toxicity dataset (classification), available at skrub-data/skrub-data-files.

datasets.fetch_traffic_violations

Fetch the traffic violations dataset (classification), available at skrub-data/skrub-data-files.

datasets.fetch_videogame_sales

Fetch the videogame sales dataset (regression), available at skrub-data/skrub-data-files.

datasets.get_data_dir

Returns the directory in which skrub looks for data.

datasets.make_deduplication_data

Duplicates examples, introducing spelling mistakes into the copies.
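
To make the idea of duplicating examples with spelling mistakes concrete, here is a minimal standard-library sketch. It is not skrub's implementation. The helper `duplicate_with_typos`, its parameter names, and the choice to substitute random lowercase letters are all assumptions made for illustration.

```python
import random
import string


def duplicate_with_typos(examples, entries_per_example,
                         prob_mistake_per_letter=0.2, seed=None):
    """Hypothetical helper (not part of skrub): return noisy copies of strings.

    Each example is copied ``n`` times, where ``n`` is the matching entry of
    ``entries_per_example``. In every copy, each character is replaced by a
    random lowercase letter with probability ``prob_mistake_per_letter``.
    """
    rng = random.Random(seed)
    noisy = []
    for example, n_copies in zip(examples, entries_per_example):
        for _ in range(n_copies):
            chars = [
                rng.choice(string.ascii_lowercase)
                if rng.random() < prob_mistake_per_letter
                else ch
                for ch in example
            ]
            noisy.append("".join(chars))
    return noisy


# Three noisy copies of "online" and two of "offline".
duplicates = duplicate_with_typos(["online", "offline"], [3, 2], seed=0)
print(duplicates)
```

Because this sketch only substitutes characters, every copy keeps the length of its source string, and setting `prob_mistake_per_letter=0.0` returns exact copies. Data of this kind is useful for exercising deduplication or fuzzy-matching code against a known ground truth.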