fetch_employee_salaries

skrub.datasets.fetch_employee_salaries(data_home=None, split='all')

Fetches the employee salaries dataset (a regression task), available at skrub-data/skrub-data-files.

Description of the dataset:

Annual salary information, including gross pay and overtime pay, for all active, permanent employees of Montgomery County, MD, paid in calendar year 2016. This dataset is a copy of https://www.openml.org/d/42125 in which some features are dropped to avoid data leakage.

Note

Some environments, such as JupyterLite, can run into networking issues when connecting to a remote server. OpenML provides CORS headers, so it can serve as an alternative source in those environments. To download this dataset from OpenML instead of GitHub or Figshare, run:

from sklearn.datasets import fetch_openml
df = fetch_openml(data_id=42125)
Parameters:
data_home: str or path, default=None

The directory where to download and unzip the files.

split: str, default="all"

The split to load. Can be either “train”, “test”, or “all”.

Returns:
bunch: sklearn.utils.Bunch

A dictionary-like object with the following keys:

  • employee_salaries : pd.DataFrame, the dataframe

  • X : pd.DataFrame, features, i.e. the dataframe without the target labels

  • y : pd.DataFrame, target labels

  • metadata : a dictionary containing the name, description, source and target
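
As a rough illustration of how the returned bunch might be used, the sketch below loads one split and summarizes it. The helper names `describe_bunch` and `load_split` are hypothetical and not part of skrub. The sketch assumes that `metadata` exposes a `"target"` key, as the key list above suggests, and it relies on `Bunch` supporting dictionary-style access.

```python
def describe_bunch(bunch):
    """Summarize a bunch that has the keys documented above.

    Hypothetical helper: it only relies on dictionary-style access,
    which sklearn.utils.Bunch supports because it subclasses dict.
    """
    return {
        "n_rows": len(bunch["X"]),
        "n_targets": len(bunch["y"]),
        # Assumption: metadata has a "target" entry, as listed above.
        "target": bunch["metadata"]["target"],
    }


def load_split(split="all", data_home=None):
    """Fetch one split ("train", "test", or "all") and summarize it.

    Hypothetical helper. Calling it needs skrub installed and, unless
    the files are already present in data_home, network access.
    """
    # Imported lazily so that defining these helpers does not require skrub.
    from skrub.datasets import fetch_employee_salaries

    bunch = fetch_employee_salaries(data_home=data_home, split=split)
    return describe_bunch(bunch)
```

Calling `load_split("train")` and `load_split("test")` would then give a quick check that the two splits load and that features and targets have matching lengths.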