Working with the example datasets provided by skrub
-------------------------------------------------------

Skrub includes a number of datasets used for running examples. Each dataset
can be downloaded using its ``fetch_*`` function, provided in the ``skrub.datasets``
namespace:

.. code-block:: python

    from skrub.datasets import fetch_employee_salaries
    data = fetch_employee_salaries()

Each ``fetch_*`` function returns a :class:`~sklearn.utils.Bunch` object that
includes the path to each table in the dataset. Load a table by reading the file
at that path:

.. code-block:: python

    import pandas as pd
    df = pd.read_csv(data.path)


Some datasets include multiple tables. In this case ``path`` is not available;
instead, each table has its own ``<table>_path`` attribute and should be loaded
from it:


.. code-block:: python

    import pandas as pd
    from skrub.datasets import fetch_credit_fraud

    data = fetch_credit_fraud()
    # Each table is loaded from its own path
    baskets = pd.read_csv(data.baskets_path)
    products = pd.read_csv(data.products_path)
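
When it is not obvious which tables a dataset provides, the path attributes can
be listed programmatically. The following is a minimal sketch, not part of the
skrub API: ``table_paths`` is a hypothetical helper, and it assumes that table
paths are stored under the key ``path`` or under keys ending in ``_path``, as in
the examples above. Because :class:`~sklearn.utils.Bunch` is a ``dict``
subclass, a plain dictionary is used here as a stand-in for the object returned
by a ``fetch_*`` function.

```python
def table_paths(dataset):
    """Return a {table_name: path} mapping for a dict-like dataset object.

    Hypothetical helper: assumes paths live under the key "path" or under
    keys ending in "_path".
    """
    paths = {}
    for key, value in dataset.items():
        if key == "path" or key.endswith("_path"):
            # "baskets_path" -> "baskets"; a bare "path" key is kept as-is
            name = key[: -len("_path")] if key.endswith("_path") else key
            paths[name] = value
    return paths


# Stand-in for a multi-table dataset; the file locations are made up.
example = {
    "baskets_path": "/tmp/skrub_data/baskets.csv",
    "products_path": "/tmp/skrub_data/products.csv",
    "metadata": {},
}

for name, path in table_paths(example).items():
    print(name, path)
```

Each returned path can then be passed to ``pd.read_csv`` as shown above.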


Modifying the download location of ``skrub`` datasets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, datasets are stored in ``~/skrub_data``, where ``~`` expands to
the (OS-dependent) home directory of the user. The function
:func:`~skrub.datasets.get_data_dir` shows the location that ``skrub`` uses to
store data.

If needed, this location can be changed by setting the environment variable
``SKB_DATA_DIRECTORY`` to an **absolute directory path**.
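
The variable can be set in the shell or from Python. The sketch below takes the
Python route and converts a user-supplied location into an absolute path before
exporting it. ``set_skrub_data_dir`` is a hypothetical helper and
``skrub_cache`` is an example folder name; neither comes from skrub. The sketch
also assumes that the variable has to be set before ``skrub`` reads its
configuration, so it is safest to run it before importing ``skrub``.

```python
import os
from pathlib import Path


def set_skrub_data_dir(directory):
    """Export SKB_DATA_DIRECTORY as an absolute path (hypothetical helper)."""
    # expanduser() handles "~"; resolve() turns a relative path into an
    # absolute one, even if the directory does not exist yet.
    path = Path(directory).expanduser().resolve()
    os.environ["SKB_DATA_DIRECTORY"] = str(path)
    return path


# "skrub_cache" is an example folder name, resolved against the current
# working directory.
data_dir = set_skrub_data_dir("skrub_cache")
print(data_dir.is_absolute())
```

After this runs, :func:`~skrub.datasets.get_data_dir` can be used to check
which location ``skrub`` actually picked up.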

See :ref:`user_guide_configuration_parameters` for more information on the
global skrub configuration.