.. |set_config| replace:: :func:`~skrub.set_config`
.. |config_context| replace:: :func:`~skrub.config_context`

.. _userguide_utils:

Example datasets, utilities, and customization
==============================================


Customizing the default configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Skrub includes a configuration manager that allows setting various parameters (see the |set_config| documentation for more detail).

It is possible to change configuration options using the |set_config| function:

.. code-block:: python

    from skrub import set_config
    set_config(use_table_report=True)

Each configuration parameter can also be modified by setting its environment variable.

A |config_context| is also provided, which allows temporarily altering the configuration:

.. code-block:: python

    import skrub
    with skrub.config_context(max_plot_columns=1):
        ...

Fetching the example datasets used in ``skrub``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``skrub`` includes a number of datasets used for running examples. Each dataset can be downloaded using its ``fetch_*`` function, provided in the ``skrub.datasets`` namespace:

.. code-block:: python

    from skrub.datasets import fetch_employee_salaries
    data = fetch_employee_salaries()

Datasets are stored as :class:`~sklearn.utils.Bunch` objects, which include the full data, an ``X`` feature matrix, and a ``y`` target column with type ``pd.DataFrame``. Some datasets may have a different format depending on the use case.

Modifying the download location of ``skrub`` datasets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, datasets are stored in ``~/skrub_data``, where ``~`` is expanded as the (OS dependent) home directory of the user. The function ``get_data_dir`` shows the location that ``skrub`` uses to store data.

If needed, it is possible to change this location by modifying the environment variable ``SKRUB_DATA_DIRECTORY`` to an **absolute directory path**.