Working with the example datasets provided by skrub#
skrub includes a number of datasets used for running examples. Each dataset
can be downloaded using its fetch_*
function, provided in the skrub.datasets
namespace:
from skrub.datasets import fetch_employee_salaries
data = fetch_employee_salaries()
Datasets are stored as Bunch
objects, which include the
full data, an X
feature matrix, and a y
target column with type pd.DataFrame
.
Some datasets may have a different format depending on the use case.
Modifying the download location of skrub
datasets#
By default, datasets are stored in ~/skrub_data
, where ~
is expanded as
the (OS dependent) home directory of the user. The function get_data_dir
shows
the location that skrub
uses to store data.
If needed, it is possible to change this location by modifying the environment
variable SKRUB_DATA_DIRECTORY
to an absolute directory path.
See Customizing the global configuration for more info on the global skrub configuration.