Example datasets, utilities, and customization#
Customizing the default configuration#
Skrub includes a configuration manager that allows setting various parameters (see the set_config() documentation for more detail).
It is possible to change configuration options using the set_config() function:
from skrub import set_config
set_config(use_table_report=True)
Each configuration parameter can also be set through its corresponding environment variable.
A config_context() context manager is also provided for temporarily altering the configuration:
import skrub
with skrub.config_context(max_plot_columns=1):
    ...
Fetching the example datasets used in skrub#
skrub includes a number of datasets used for running examples. Each dataset can be downloaded using its fetch_* function, provided in the skrub.datasets namespace:
from skrub.datasets import fetch_employee_salaries
data = fetch_employee_salaries()
Datasets are stored as Bunch objects, which include the full data, an X feature matrix, and a y target column with type pd.DataFrame. Some datasets may have a different format depending on the use case.
Modifying the download location of skrub datasets#
By default, datasets are stored in ~/skrub_data, where ~ expands to the user's (OS-dependent) home directory. The function get_data_dir returns the location that skrub uses to store data.
If needed, this location can be changed by setting the environment variable SKRUB_DATA_DIRECTORY to an absolute directory path.
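The lookup described above can be sketched with the standard library alone. This is an illustration of the documented behavior, not skrub's own code: `resolve_data_dir` is a hypothetical helper, and rejecting relative paths with a `ValueError` is an assumption of the sketch. In practice, get_data_dir is the function to call.

```python
import os
from pathlib import Path

ENV_VAR = "SKRUB_DATA_DIRECTORY"  # variable name taken from the text above


def resolve_data_dir(environ=None):
    """Hypothetical helper, not part of skrub: mirrors the documented lookup."""
    environ = os.environ if environ is None else environ
    custom = environ.get(ENV_VAR)
    if custom:
        path = Path(custom)
        # Assumption of this sketch: a relative path is treated as an error.
        if not path.is_absolute():
            raise ValueError(f"{ENV_VAR} must be an absolute path, got {custom!r}")
        return path
    # Default location: "~" expands to the current user's home directory.
    return Path("~/skrub_data").expanduser()


# No override set: fall back to the default under the home directory.
default_dir = resolve_data_dir({})

# Override set to an absolute path (the path itself is only an example).
custom_dir = resolve_data_dir({ENV_VAR: str(Path.home() / "datasets" / "skrub")})
```

Passing a plain dictionary instead of `os.environ` keeps the example free of side effects. To change the real location, set the variable in the shell or in `os.environ` before skrub downloads any data.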