fetch_employee_salaries
- skrub.datasets.fetch_employee_salaries(data_home=None, split='all')
Fetches the employee salaries dataset (regression), available at https://github.com/skrub-data/skrub-data-files
- Description of the dataset:
Annual salary information, including gross pay and overtime pay, for all active, permanent employees of Montgomery County, MD, paid in calendar year 2016. This dataset is a copy of https://www.openml.org/d/42125 with some features dropped to avoid data leakage. Size on disk: 1.3 MB.
Note
Some environments, such as JupyterLite, can run into networking issues when connecting to a remote server; OpenML, however, provides CORS headers. To download this dataset from OpenML instead of GitHub or Figshare, run:
from sklearn.datasets import fetch_openml
df = fetch_openml(data_id=42125)
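The OpenML copy is the untrimmed source, so it can still contain the columns that skrub drops to avoid leakage. Below is a minimal sketch of turning such a frame into an `X`/`y` pair shaped like skrub's bunch, assuming a pandas frame such as `fetch_openml(data_id=42125).frame`. The helper `split_openml_frame`, the target name `current_annual_salary` and the leaky column names are illustrative assumptions, not skrub's API or its exact list of dropped columns. To run offline, the demo uses a small made-up frame rather than the downloaded data.

```python
import pandas as pd


def split_openml_frame(frame, target, leaky_columns=()):
    """Hypothetical helper: drop assumed leaky columns, then split off the target.

    Returns X (features only) and y (a one-column DataFrame), mirroring the
    X and y entries of skrub's bunch.
    """
    # Only drop the leaky columns that are actually present in this frame.
    to_drop = [col for col in leaky_columns if col in frame.columns]
    cleaned = frame.drop(columns=to_drop)
    X = cleaned.drop(columns=[target])
    y = cleaned[[target]]  # double brackets keep y as a DataFrame
    return X, y


# Made-up stand-in for fetch_openml(data_id=42125).frame (which needs network).
toy = pd.DataFrame(
    {
        "department": ["dept_a", "dept_b", "dept_c"],
        "year_first_hired": [1986, 1988, 1989],
        "2016_gross_pay_received": [50000.0, 60000.0, 70000.0],  # assumed leaky
        "current_annual_salary": [51000.0, 61000.0, 71000.0],  # assumed target
    }
)
X, y = split_openml_frame(
    toy,
    target="current_annual_salary",
    leaky_columns=["2016_gross_pay_received", "2016_overtime_pay"],
)
print(X.columns.tolist())  # ['department', 'year_first_hired']
print(y.shape)  # (3, 1)
```

Names absent from the frame (here `2016_overtime_pay`) are skipped rather than raising an error. This lets the same call work on frames that have already been partially cleaned.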
- Parameters:
- Returns:
- bunch: Bunch
A dictionary-like object with the following keys:
- employee_salaries: DataFrame of shape (9228, 8)
The full dataframe, i.e. the features together with the target column.
- X: DataFrame of shape (9228, 7)
The features, i.e. the dataframe without the target column.
- y: DataFrame of shape (9228, 1)
The regression target.
- metadata: dict
A dictionary containing the name, description, source and target.
- path: str
The path to the employee salaries CSV file.
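The returned bunch supports both attribute and key access, and `X` and `y` together account for every column of the full dataframe (7 + 1 = 8). The sketch below builds a small stand-in `sklearn.utils.Bunch` with the same keys so it runs offline. With network access, the real object comes from `skrub.datasets.fetch_employee_salaries()`. The column names, values, `path` and the `metadata["target"]` entry in the stand-in are made-up assumptions based on the key descriptions above, not the real contents.

```python
import pandas as pd
from sklearn.utils import Bunch

# Offline stand-in; the real object would come from
# skrub.datasets.fetch_employee_salaries(), which downloads the data.
X = pd.DataFrame({"gender": ["F", "M"], "year_first_hired": [1986, 1995]})
y = pd.DataFrame({"current_annual_salary": [60000.0, 90000.0]})  # made-up values
bunch = Bunch(
    employee_salaries=pd.concat([X, y], axis=1),
    X=X,
    y=y,
    metadata={"target": "current_annual_salary"},  # assumed key layout
    path="employee_salaries.csv",  # placeholder, not the real cache path
)

# A Bunch is a dict subclass, so attribute and key access return the same object.
assert bunch.X is bunch["X"]

# The features plus the target have as many columns as the full frame.
n_features = bunch.X.shape[1]
n_targets = bunch.y.shape[1]
print(n_features + n_targets == bunch.employee_salaries.shape[1])  # True

# Look up the target column name in the metadata and select it as a 1-D Series,
# the form many scikit-learn regressors expect for y.
target = bunch.metadata["target"]
y_series = bunch.y[target]
print(y_series.ndim)  # 1
```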
Gallery examples
Tutorial: Using Data Ops to build a machine-learning pipeline
Encoding: from a dataframe to a numerical matrix for machine learning
SquashingScaler: Robust numerical preprocessing for neural networks