fetch_employee_salaries
- skrub.datasets.fetch_employee_salaries(*, load_dataframe=True, drop_linked=True, drop_irrelevant=True, overload_job_titles=True, data_directory=None)
Fetches the employee salaries dataset (regression), available at https://openml.org/d/42125
- Description of the dataset:
Annual salary information, including gross pay and overtime pay, for all active, permanent employees of Montgomery County, MD, paid in calendar year 2016. This information will be published annually.
- Parameters:
- drop_linked: bool, default=True
Drops columns “2016_gross_pay_received” and “2016_overtime_pay”, which are closely linked to “current_annual_salary”, the target.
- drop_irrelevant: bool, default=True
Drops column “full_name”, which is usually irrelevant to the statistical analysis.
- overload_job_titles: bool, default=True
Uses the column “underfilled_job_title” to enrich the “employee_position_title” column, as it contains more detailed information about the job title.
- data_directory: pathlib.Path or str, optional
The directory where the dataset is stored.
- Returns:
DatasetAll
If load_dataframe=True
DatasetInfoOnly
If load_dataframe=False
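As a rough illustration of what the three boolean flags under Parameters do to the columns, here is a minimal plain-Python sketch. It does not call skrub: the helper `apply_flags` and the two toy rows are hypothetical, and the exact enrichment rule (prefer the underfilled title when it is present, then discard that column) is an assumption rather than the library's confirmed behaviour.

```python
# Illustrative sketch only: a plain-Python stand-in for the column handling
# described by the parameters. Helper name and rows are hypothetical.

LINKED = ("2016_gross_pay_received", "2016_overtime_pay")
IRRELEVANT = ("full_name",)


def apply_flags(rows, drop_linked=True, drop_irrelevant=True,
                overload_job_titles=True):
    """Return new rows with the documented flags applied (assumed semantics)."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if overload_job_titles:
            # Assumption: use the more detailed underfilled title when one
            # exists, and discard the helper column afterwards.
            detailed = row.pop("underfilled_job_title", None)
            if detailed is not None:
                row["employee_position_title"] = detailed
        if drop_linked:
            # Columns closely linked to the target "current_annual_salary".
            for col in LINKED:
                row.pop(col, None)
        if drop_irrelevant:
            for col in IRRELEVANT:
                row.pop(col, None)
        out.append(row)
    return out


# Made-up records that only mimic the column names mentioned above.
rows = [
    {
        "full_name": "Doe, Jane",
        "employee_position_title": "Manager III",
        "underfilled_job_title": "Manager II",
        "2016_gross_pay_received": 101000.0,
        "2016_overtime_pay": 0.0,
        "current_annual_salary": 100000.0,
    },
    {
        "full_name": "Roe, Richard",
        "employee_position_title": "Office Services Coordinator",
        "underfilled_job_title": None,
        "2016_gross_pay_received": 52000.0,
        "2016_overtime_pay": 1200.0,
        "current_annual_salary": 50000.0,
    },
]

cleaned = apply_flags(rows)
for row in cleaned:
    print(row)
```

With all three flags left at their defaults, each toy row keeps only the position title and the target column; passing `drop_linked=False` would keep the two pay columns, which would leak information about the target.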
Gallery examples
Encoding: from a dataframe to a numerical matrix for machine learning