fetch_employee_salaries
- skrub.datasets.fetch_employee_salaries(data_home=None, split='all')
Fetches the employee salaries dataset (regression), available at skrub-data/skrub-data-files
- Description of the dataset:
Annual salary information including gross pay and overtime pay for all active, permanent employees of Montgomery County, MD paid in calendar year 2016. This dataset is a copy of https://www.openml.org/d/42125 where some features are dropped to avoid data leakage.
Note
Some environments, such as JupyterLite, can run into networking issues when connecting to a remote server; OpenML, however, provides CORS headers. To download this dataset from OpenML instead of GitHub or Figshare, run:
from sklearn.datasets import fetch_openml
df = fetch_openml(data_id=42125)
- Parameters:
- data_home: str or path, default=None
The directory where to download and unzip the files.
- split: str, default="all"
The split to load. Can be either "train", "test", or "all".
- Returns:
- bunch: sklearn.utils.Bunch
A dictionary-like object with the following keys:
employee_salaries : pd.DataFrame, the dataframe
X : pd.DataFrame, features, i.e. the dataframe without the target labels
y : pd.DataFrame, target labels
metadata : a dictionary containing the name, description, source and target
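The parameters and returned keys described above can be sketched in a small wrapper. This is a minimal illustration, not part of skrub: the helper name `load_employee_salaries` and the constant `VALID_SPLITS` are hypothetical, and the only skrub call used is `fetch_employee_salaries` with the `data_home` and `split` arguments from the signature above.

```python
# Hypothetical helper, not part of skrub: it only wraps the documented call.
VALID_SPLITS = ("train", "test", "all")


def load_employee_salaries(split="all", data_home=None):
    """Return (X, y) for the requested split of the employee salaries dataset."""
    # Reject unknown split names before attempting any download.
    if split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {VALID_SPLITS}, got {split!r}")

    # Imported lazily so the argument check above works without skrub installed.
    from skrub.datasets import fetch_employee_salaries

    # Downloads the files into data_home on first use (requires network access).
    bunch = fetch_employee_salaries(data_home=data_home, split=split)

    # X holds the features and y the target, as listed under "Returns".
    return bunch.X, bunch.y
```

For instance, `X_train, y_train = load_employee_salaries("train")` would fetch the training split; the full dataframe and the metadata remain available on the bunch itself through the `employee_salaries` and `metadata` keys.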
Gallery examples

Encoding: from a dataframe to a numerical matrix for machine learning
SquashingScaler: Robust numerical preprocessing for neural networks
Introduction to machine-learning pipelines with skrub DataOps