skrub.datasets.fetch_employee_salaries

skrub.datasets.fetch_employee_salaries(*, load_dataframe=True, drop_linked=True, drop_irrelevant=True, overload_job_titles=True, data_directory=None)

Fetches the employee salaries dataset (regression), available at https://openml.org/d/42125

Description of the dataset:

Annual salary information, including gross pay and overtime pay, for all active, permanent employees of Montgomery County, MD, paid in calendar year 2016. This information will be published annually.

Parameters:
drop_linked: bool, default=True

Drops columns “2016_gross_pay_received” and “2016_overtime_pay”, which are closely linked to “current_annual_salary”, the target.

drop_irrelevant: bool, default=True

Drops column “full_name”, which is usually irrelevant to the statistical analysis.

overload_job_titles: bool, default=True

Uses the “underfilled_job_title” column to enrich the “employee_position_title” column, since the former contains more detailed information about the job title.

data_directory: pathlib.Path or str, optional

The directory where the dataset is stored.
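
The effect of the three preprocessing flags can be sketched on a toy dataframe. This is only an illustration, not skrub's actual implementation: the two rows are made up, `preprocess` is a hypothetical helper, and the choices to prefer “underfilled_job_title” when it is present and to drop that column afterwards are assumptions.

```python
import pandas as pd

# Two made-up rows that reuse the column names from the parameter descriptions.
raw = pd.DataFrame(
    {
        "full_name": ["A. Example", "B. Example"],
        "employee_position_title": ["Police Officer III", "Office Services Coordinator"],
        "underfilled_job_title": [None, "Office Clerk"],
        "2016_gross_pay_received": [71000.0, 48000.0],
        "2016_overtime_pay": [1200.0, 0.0],
        "current_annual_salary": [70000.0, 47500.0],
    }
)


def preprocess(df, drop_linked=True, drop_irrelevant=True, overload_job_titles=True):
    """Hypothetical helper mimicking the documented flags on a dataframe."""
    df = df.copy()
    if drop_linked:
        # These columns are closely linked to the target and would leak it.
        df = df.drop(columns=["2016_gross_pay_received", "2016_overtime_pay"])
    if drop_irrelevant:
        df = df.drop(columns=["full_name"])
    if overload_job_titles:
        # Assumption: use the more detailed title when available,
        # otherwise keep the original position title.
        df["employee_position_title"] = df["underfilled_job_title"].fillna(
            df["employee_position_title"]
        )
        df = df.drop(columns=["underfilled_job_title"])
    return df


cleaned = preprocess(raw)
print(cleaned)
```

With all three flags enabled, only “employee_position_title” and the target “current_annual_salary” remain in this toy example.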

Returns:
DatasetAll

If load_dataframe=True

DatasetInfoOnly

If load_dataframe=False
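
A minimal usage sketch, assuming skrub is installed and that the returned DatasetAll object exposes the features and the target as `X` and `y` attributes (an assumption not stated on this page). `load_employee_salaries` is a hypothetical wrapper written for this example.

```python
def load_employee_salaries(data_directory=None):
    """Return (X, y) for the employee salaries regression task."""
    # Imported lazily so that defining this helper does not require skrub.
    from skrub.datasets import fetch_employee_salaries

    dataset = fetch_employee_salaries(
        load_dataframe=True,       # ask for the dataframes, not only the metadata
        drop_linked=True,          # drop columns closely linked to the target
        drop_irrelevant=True,      # drop "full_name"
        overload_job_titles=True,  # enrich "employee_position_title"
        data_directory=data_directory,
    )
    # Assumption: DatasetAll exposes the features as `X` and the target as `y`.
    return dataset.X, dataset.y
```

Calling `X, y = load_employee_salaries()` is expected to fetch the data from OpenML on first use, so it needs network access unless the dataset is already present in `data_directory`.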

Examples using skrub.datasets.fetch_employee_salaries

Encoding: from a dataframe to a numerical matrix for machine learning

Feature interpretation with the GapEncoder
