Hands-On with Column Selection and Transformers
In previous examples, we saw how skrub provides powerful abstractions like TableVectorizer and tabular_learner() to create pipelines.
In this example, we show how to create more flexible pipelines by selecting and transforming dataframe columns using arbitrary logic.
We begin by loading a dataset with heterogeneous datatypes and replacing the default pandas display with the TableReport display via skrub.set_config().
import skrub
from skrub.datasets import fetch_employee_salaries
skrub.set_config(use_tablereport=True)
data = fetch_employee_salaries()
X, y = data.X, data.y
X
gender | department | department_name | division | assignment_category | employee_position_title | date_first_hired | year_first_hired | |
---|---|---|---|---|---|---|---|---|
0 | F | POL | Department of Police | MSB Information Mgmt and Tech Division Records Management Section | Fulltime-Regular | Office Services Coordinator | 09/22/1986 | 1,986 |
1 | M | POL | Department of Police | ISB Major Crimes Division Fugitive Section | Fulltime-Regular | Master Police Officer | 09/12/1988 | 1,988 |
2 | F | HHS | Department of Health and Human Services | Adult Protective and Case Management Services | Fulltime-Regular | Social Worker IV | 11/19/1989 | 1,989 |
3 | M | COR | Correction and Rehabilitation | PRRS Facility and Security | Fulltime-Regular | Resident Supervisor II | 05/05/2014 | 2,014 |
4 | M | HCA | Department of Housing and Community Affairs | Affordable Housing Programs | Fulltime-Regular | Planning Specialist III | 03/05/2007 | 2,007 |
9,223 | F | HHS | Department of Health and Human Services | School Based Health Centers | Fulltime-Regular | Community Health Nurse II | 11/03/2015 | 2,015 |
9,224 | F | FRS | Fire and Rescue Services | Human Resources Division | Fulltime-Regular | Fire/Rescue Division Chief | 11/28/1988 | 1,988 |
9,225 | M | HHS | Department of Health and Human Services | Child and Adolescent Mental Health Clinic Services | Parttime-Regular | Medical Doctor IV - Psychiatrist | 04/30/2001 | 2,001 |
9,226 | M | CCL | County Council | Council Central Staff | Fulltime-Regular | Manager II | 09/05/2006 | 2,006 |
9,227 | M | DLC | Department of Liquor Control | Licensure, Regulation and Education | Fulltime-Regular | Alcohol/Tobacco Enforcement Specialist II | 01/30/2012 | 2,012 |
gender
- dtype: ObjectDType
- Null values: 17 (0.2%)
- Unique values: 2 (< 0.1%)
- Most frequent values: ['M', 'F']
department
- dtype: ObjectDType
- Null values: 0 (0.0%)
- Unique values: 37 (0.4%)
- Most frequent values: ['POL', 'HHS', 'FRS', 'DOT', 'COR', 'DLC', 'DGS', 'LIB', 'DPS', 'SHF']
department_name
- dtype: ObjectDType
- Null values: 0 (0.0%)
- Unique values: 37 (0.4%)
- Most frequent values: ['Department of Police', 'Department of Health and Human Services', 'Fire and Rescue Services', 'Department of Transportation', 'Correction and Rehabilitation', 'Department of Liquor Control', 'Department of General Services', 'Department of Public Libraries', 'Department of Permitting Services', "Sheriff's Office"]
division
- dtype: ObjectDType
- Null values: 0 (0.0%)
- Unique values: 694 (7.5%). This column has a high cardinality (> 40).
- Most frequent values: ['School Health Services', 'Transit Silver Spring Ride On', 'Transit Gaithersburg Ride On', 'Highway Services', 'Child Welfare Services', 'FSB Traffic Division School Safety Section', 'Income Supports', 'PSB 3rd District Patrol', 'PSB 4th District Patrol', 'Transit Nicholson Ride On']
assignment_category
- dtype: ObjectDType
- Null values: 0 (0.0%)
- Unique values: 2 (< 0.1%)
- Most frequent values: ['Fulltime-Regular', 'Parttime-Regular']
employee_position_title
- dtype: ObjectDType
- Null values: 0 (0.0%)
- Unique values: 443 (4.8%). This column has a high cardinality (> 40).
- Most frequent values: ['Bus Operator', 'Police Officer III', 'Firefighter/Rescuer III', 'Manager III', 'Firefighter/Rescuer II', 'Master Firefighter/Rescuer', 'Office Services Coordinator', 'School Health Room Technician I', 'Police Officer II', 'Community Health Nurse II']
date_first_hired
- dtype: ObjectDType
- Null values: 0 (0.0%)
- Unique values: 2,264 (24.5%). This column has a high cardinality (> 40).
- Most frequent values: ['12/12/2016', '01/14/2013', '02/24/2014', '03/10/2014', '08/12/2013', '10/06/2014', '09/22/2014', '03/19/2007', '07/16/2012', '07/29/2013']
year_first_hired
- dtype: Int64DType
- Null values: 0 (0.0%)
- Unique values: 51 (0.6%). This column has a high cardinality (> 40).
- Mean ± Std: 2.00e+03 ± 9.33
- Median ± IQR: 2,005 ± 14
- Min | Max: 1,965 | 2,016
Column | Column name | dtype | Is sorted | Null values | Unique values | Mean | Std | Min | Median | Max |
---|---|---|---|---|---|---|---|---|---|---|
0 | gender | ObjectDType | False | 17 (0.2%) | 2 (< 0.1%) | |||||
1 | department | ObjectDType | False | 0 (0.0%) | 37 (0.4%) | |||||
2 | department_name | ObjectDType | False | 0 (0.0%) | 37 (0.4%) | |||||
3 | division | ObjectDType | False | 0 (0.0%) | 694 (7.5%) | |||||
4 | assignment_category | ObjectDType | False | 0 (0.0%) | 2 (< 0.1%) | |||||
5 | employee_position_title | ObjectDType | False | 0 (0.0%) | 443 (4.8%) | |||||
6 | date_first_hired | ObjectDType | False | 0 (0.0%) | 2264 (24.5%) | |||||
7 | year_first_hired | Int64DType | False | 0 (0.0%) | 51 (0.6%) | 2.00e+03 | 9.33 | 1,965 | 2,005 | 2,016 |
Column 1 | Column 2 | Cramér's V | Pearson's Correlation |
---|---|---|---|
department | department_name | 1.00 | |
assignment_category | employee_position_title | 0.636 | |
division | assignment_category | 0.574 | |
division | employee_position_title | 0.522 | |
department_name | assignment_category | 0.424 | |
department | assignment_category | 0.424 | |
department_name | employee_position_title | 0.415 | |
department | employee_position_title | 0.415 | |
department | division | 0.366 | |
department_name | division | 0.366 | |
gender | department | 0.363 | |
gender | department_name | 0.363 | |
gender | employee_position_title | 0.262 | |
gender | division | 0.255 | |
gender | assignment_category | 0.236 | |
employee_position_title | date_first_hired | 0.206 | |
department | date_first_hired | 0.144 | |
department_name | date_first_hired | 0.144 | |
date_first_hired | year_first_hired | 0.136 | |
employee_position_title | year_first_hired | 0.124 |
Our goal is now to apply a StringEncoder to two columns of our choosing: division and employee_position_title.
We can achieve this using ApplyToCols, whose job is to apply a transformer to multiple columns independently and to let unmatched columns through without changes.
This can be seen as a handy drop-in replacement for the ColumnTransformer.
Since we selected two columns and set the number of components to 30 each, ApplyToCols will create 2*30 = 60 embedding columns in the dataframe Xt, which we prefix with lsa_.
from skrub import ApplyToCols, StringEncoder
apply_string_encoder = ApplyToCols(
StringEncoder(n_components=30),
cols=["division", "employee_position_title"],
rename_columns="lsa_{}",
)
Xt = apply_string_encoder.fit_transform(X)
Xt
gender | department | department_name | lsa_division_00 | lsa_division_01 | lsa_division_02 | lsa_division_03 | lsa_division_04 | lsa_division_05 | lsa_division_06 | lsa_division_07 | lsa_division_08 | lsa_division_09 | lsa_division_10 | lsa_division_11 | lsa_division_12 | lsa_division_13 | lsa_division_14 | lsa_division_15 | lsa_division_16 | lsa_division_17 | lsa_division_18 | lsa_division_19 | lsa_division_20 | lsa_division_21 | lsa_division_22 | lsa_division_23 | lsa_division_24 | lsa_division_25 | lsa_division_26 | lsa_division_27 | lsa_division_28 | lsa_division_29 | assignment_category | lsa_employee_position_title_00 | lsa_employee_position_title_01 | lsa_employee_position_title_02 | lsa_employee_position_title_03 | lsa_employee_position_title_04 | lsa_employee_position_title_05 | lsa_employee_position_title_06 | lsa_employee_position_title_07 | lsa_employee_position_title_08 | lsa_employee_position_title_09 | lsa_employee_position_title_10 | lsa_employee_position_title_11 | lsa_employee_position_title_12 | lsa_employee_position_title_13 | lsa_employee_position_title_14 | lsa_employee_position_title_15 | lsa_employee_position_title_16 | lsa_employee_position_title_17 | lsa_employee_position_title_18 | lsa_employee_position_title_19 | lsa_employee_position_title_20 | lsa_employee_position_title_21 | lsa_employee_position_title_22 | lsa_employee_position_title_23 | lsa_employee_position_title_24 | lsa_employee_position_title_25 | lsa_employee_position_title_26 | lsa_employee_position_title_27 | lsa_employee_position_title_28 | lsa_employee_position_title_29 | date_first_hired | year_first_hired | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | F | POL | Department of Police | 0.218 | 0.353 | -0.0416 | -0.0900 | -0.446 | -0.238 | -0.185 | -0.0513 | -0.229 | -0.0690 | -0.0256 | -0.125 | -0.0818 | 0.0489 | 0.0791 | 0.118 | 0.0218 | -0.233 | -0.196 | 0.127 | -0.357 | 0.0590 | -0.188 | -0.151 | 0.154 | 0.00892 | 0.0309 | -0.00709 | -0.0138 | -0.0284 | Fulltime-Regular | 0.398 | -0.146 | 0.180 | -0.0653 | 0.0948 | 0.0932 | 0.782 | 0.305 | -0.285 | -0.127 | -0.119 | -0.416 | -0.0521 | -0.172 | 0.203 | 0.00908 | -0.118 | -0.0229 | 0.0898 | -0.116 | -0.0294 | 0.0977 | -0.0359 | -0.0151 | 0.0202 | -0.0287 | 0.0448 | -0.0961 | -0.0935 | -0.0314 | 09/22/1986 | 1,986 |
1 | M | POL | Department of Police | 0.163 | 0.232 | -0.0295 | -0.0615 | -0.383 | -0.0161 | -0.0946 | -0.0577 | 0.108 | -0.0467 | 0.0327 | -0.129 | -0.0130 | 0.0292 | -0.102 | -0.0442 | -0.0381 | -0.0542 | -0.274 | 0.253 | 0.143 | -0.121 | -0.00751 | 0.00929 | -0.0352 | -0.0179 | -0.0636 | 0.0784 | 0.115 | -0.0271 | Fulltime-Regular | 0.847 | -0.118 | -0.0486 | -0.111 | -0.0529 | -0.0486 | -0.108 | 0.00810 | -0.0236 | -0.0552 | -0.0566 | -0.00862 | 0.0560 | -0.0586 | -0.0214 | -0.00299 | -0.00648 | -0.0784 | -0.148 | -0.0203 | -0.105 | 0.169 | -0.00665 | -0.0712 | -0.0948 | 0.661 | 0.197 | 0.0736 | 0.0771 | -0.0965 | 09/12/1988 | 1,988 |
2 | F | HHS | Department of Health and Human Services | 0.132 | 0.255 | 0.350 | -0.0107 | -0.0790 | -0.404 | -0.152 | -0.0184 | -0.423 | -0.0727 | -0.0265 | -0.0904 | 0.0101 | 0.0805 | -0.0175 | 0.0314 | 0.0654 | -0.278 | -0.138 | -0.00624 | -0.0400 | 0.0795 | -0.0929 | -0.0276 | -0.392 | -0.0242 | -0.110 | 0.230 | -0.308 | 0.176 | Fulltime-Regular | 0.0480 | 0.0160 | 0.00704 | 0.0872 | 0.120 | 0.0590 | 0.213 | -0.235 | 0.375 | 0.732 | -0.460 | 0.0477 | 0.0600 | 0.0295 | 0.176 | 0.0317 | 0.0129 | -0.0176 | -0.0549 | -0.0607 | -0.0868 | 0.0282 | 0.0295 | 0.00623 | -0.00513 | 0.0611 | 0.0343 | -0.0432 | 0.000456 | -0.116 | 11/19/1989 | 1,989 |
3 | M | COR | Correction and Rehabilitation | 0.0582 | 0.0880 | 0.0638 | 0.000582 | -0.295 | -0.130 | 0.575 | 0.0640 | -0.0902 | 0.0136 | -0.00553 | 0.0259 | 0.0388 | 0.426 | -0.160 | -0.174 | 0.0341 | 0.0223 | -0.00818 | 0.0132 | -0.154 | -0.0423 | 0.176 | 0.0540 | 0.0194 | 0.0933 | -0.0767 | -0.0684 | -0.0473 | -0.0136 | Fulltime-Regular | 0.0461 | 0.0237 | 0.0694 | 0.0389 | 0.0573 | 0.0490 | 0.131 | 0.0620 | 0.0208 | 0.0182 | 0.0629 | -0.0171 | -0.0607 | -0.00768 | -0.0256 | -0.0199 | 0.116 | 0.0469 | -0.0893 | 0.130 | 0.153 | -0.0125 | 0.0455 | 0.177 | -0.00258 | 0.0753 | -0.234 | -0.313 | 0.596 | -0.0208 | 05/05/2014 | 2,014 |
4 | M | HCA | Department of Housing and Community Affairs | 0.0146 | 0.0259 | -0.00150 | 0.0268 | -0.0367 | -0.0312 | -0.0198 | 0.0507 | -0.0216 | 0.0109 | -0.000848 | 0.0265 | 0.0219 | 0.104 | -0.0258 | 0.0475 | 0.0327 | -0.0842 | 0.00948 | -0.0736 | 0.0647 | 0.0507 | -0.0903 | -0.0405 | 0.0294 | 0.0448 | -0.101 | 0.0645 | -0.0219 | -0.00314 | Fulltime-Regular | 0.0903 | 0.0242 | 0.0259 | 0.243 | 0.390 | -0.0631 | -0.0142 | -0.146 | -0.0338 | 0.0444 | 0.00444 | -0.0805 | 0.0182 | 0.122 | -0.191 | 0.0624 | -0.0626 | 0.0512 | 0.0720 | -0.0915 | 0.0424 | 0.0402 | -0.256 | -0.0449 | 0.0140 | -0.140 | 0.161 | 0.153 | 0.188 | -0.0177 | 03/05/2007 | 2,007 |
9,223 | F | HHS | Department of Health and Human Services | 0.147 | 0.199 | 0.370 | 0.0113 | 0.00129 | 0.550 | -0.0127 | 0.0527 | -0.210 | 0.0391 | -0.0799 | 0.225 | -0.0198 | 0.0516 | 0.0463 | 0.00923 | -0.0485 | -0.120 | 0.0460 | -0.137 | 0.0370 | -0.291 | -0.148 | -0.0949 | 0.0929 | 0.0732 | -0.151 | 0.0434 | 0.205 | -0.125 | Fulltime-Regular | 0.0500 | 0.0114 | 0.0106 | 0.0892 | 0.122 | 0.520 | 0.271 | 0.0834 | -0.199 | -0.111 | -0.00621 | 0.815 | 0.237 | 0.187 | -0.160 | -0.0112 | -0.129 | 0.0219 | -0.00218 | -0.0852 | -0.177 | -0.0205 | 0.0970 | 0.0867 | 0.0496 | -0.00959 | 0.211 | -0.177 | -0.0111 | 0.0371 | 11/03/2015 | 2,015 |
9,224 | F | FRS | Fire and Rescue Services | 0.105 | 0.157 | 0.0427 | -0.0275 | -0.237 | -0.0218 | -0.0836 | -0.0329 | 0.0707 | -0.0263 | 0.0239 | -0.0619 | -0.0814 | 0.0185 | 0.0187 | 0.0376 | -0.0322 | 0.00880 | -0.0794 | 0.0852 | -0.0293 | 0.0465 | 0.103 | -0.0479 | -0.0460 | -0.0249 | -0.0275 | 0.0445 | 0.121 | -0.0451 | Fulltime-Regular | 0.0622 | 0.229 | 0.00245 | -0.0313 | 0.0356 | -0.00290 | 0.00727 | 0.0515 | 0.0472 | -0.00977 | 0.0784 | -0.00967 | -0.131 | 0.252 | 0.152 | 0.0309 | -0.0990 | -0.0161 | -0.00180 | 0.000270 | 0.0175 | 0.00449 | 0.0402 | 0.0229 | -0.0237 | 0.0108 | -0.0371 | -0.0305 | 0.0250 | 0.00836 | 11/28/1988 | 1,988 |
9,225 | M | HHS | Department of Health and Human Services | 0.142 | 0.253 | 0.431 | -0.00543 | 0.0660 | -0.0169 | -0.0222 | 0.0217 | -0.112 | 0.0670 | 0.233 | 0.136 | 0.0419 | 0.0894 | 0.0878 | 0.0542 | 0.0235 | -0.178 | -0.0476 | -0.0382 | 0.00880 | -0.0414 | 0.0137 | -0.00900 | -0.0620 | 0.00990 | -0.101 | -0.0546 | 0.151 | -0.0395 | Parttime-Regular | 0.00774 | -5.37e-05 | 0.0428 | 0.0203 | 0.0429 | 0.00315 | 0.0190 | -0.00684 | 0.0168 | 0.0226 | 0.00579 | -0.0132 | 0.00341 | 0.00496 | -0.00566 | 0.000364 | -0.00199 | 0.0253 | -0.00151 | 0.00521 | 0.0197 | -0.0122 | -0.0160 | 0.00705 | 0.00805 | 0.0203 | 0.00820 | -0.0232 | -0.0298 | -0.0207 | 04/30/2001 | 2,001 |
9,226 | M | CCL | County Council | 0.0668 | 0.137 | -0.0695 | -0.0385 | 0.00973 | -0.00640 | 0.0185 | 0.00934 | -0.0521 | 0.0368 | 0.00268 | 0.0847 | -0.0996 | 0.220 | 0.0583 | 0.0165 | 0.0490 | -0.353 | 0.0989 | -0.409 | 0.342 | -0.495 | 0.393 | -0.682 | 0.0774 | -0.330 | -0.0220 | -0.190 | -0.219 | 0.00177 | Fulltime-Regular | 0.201 | 0.112 | 0.0314 | 0.881 | -0.611 | 0.117 | 0.0993 | -0.0987 | 0.0928 | -0.0567 | 0.0543 | -0.0508 | -0.0906 | 0.0323 | -0.102 | -0.0345 | -0.0376 | -0.0373 | -0.0343 | -0.0783 | 0.0572 | 0.0390 | -0.0415 | 0.0252 | 0.00387 | 0.0497 | 0.0719 | -0.0217 | -0.0215 | 0.0648 | 09/05/2006 | 2,006 |
9,227 | M | DLC | Department of Liquor Control | 0.122 | 0.232 | -0.105 | -0.0828 | -0.0926 | -0.0381 | -0.0450 | -0.00404 | 0.0190 | 0.0154 | 0.00624 | 0.116 | 0.0128 | 0.102 | 0.0633 | 0.108 | 0.0393 | -0.0600 | -0.0202 | 0.0544 | -0.0670 | -0.0135 | 0.0908 | 0.0442 | 0.0108 | 0.0721 | -0.0125 | -0.0498 | 0.0544 | -0.0312 | Fulltime-Regular | 0.0339 | 0.0105 | 0.0290 | 0.137 | 0.230 | -0.0199 | -0.00293 | -0.0603 | -0.0148 | 0.0251 | 0.0193 | -0.0732 | -0.00263 | 0.0958 | -0.129 | -0.00914 | -0.0181 | 0.242 | -0.0296 | 0.0180 | -0.0283 | -0.0514 | -0.112 | -0.0115 | 0.00493 | 0.0328 | 0.0328 | -0.0576 | -0.0492 | -0.0291 | 01/30/2012 | 2,012 |
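To make the column bookkeeping above concrete, here is a minimal pure-Python sketch of the idea behind applying a transformer to selected columns: each selected column is transformed independently, its outputs are renamed with a format string, and every other column passes through untouched. This is only an illustration and not skrub's implementation; the helpers toy_encoder and apply_to_cols are hypothetical names, the "table" is a plain dict of lists, and the encoded values are arbitrary placeholders rather than real embeddings.

```python
# Conceptual sketch only: this is NOT how skrub implements ApplyToCols.
# toy_encoder and apply_to_cols are hypothetical helpers for illustration.


def toy_encoder(name, values, n_components):
    """Stand-in transformer: one input column -> n_components output columns."""
    return {
        # Placeholder numbers; a real encoder would compute embeddings here.
        f"{name}_{i:02d}": [float(len(str(v)) % (i + 2)) for v in values]
        for i in range(n_components)
    }


def apply_to_cols(table, transform, cols, rename_columns="{}"):
    """Transform each selected column independently; pass the others through."""
    out = {}
    for name, values in table.items():
        if name in cols:
            for new_name, new_values in transform(name, values).items():
                out[rename_columns.format(new_name)] = new_values
        else:
            out[name] = values  # unmatched columns are left unchanged
    return out


table = {
    "gender": ["F", "M"],
    "division": ["Records Management Section", "Fugitive Section"],
    "employee_position_title": ["Office Services Coordinator", "Master Police Officer"],
    "year_first_hired": [1986, 1988],
}

result = apply_to_cols(
    table,
    lambda name, values: toy_encoder(name, values, n_components=30),
    cols=["division", "employee_position_title"],
    rename_columns="lsa_{}",
)

n_encoded = sum(name.startswith("lsa_") for name in result)
print(n_encoded)         # 60
print(len(result))       # 62
print(list(result)[:3])  # ['gender', 'lsa_division_00', 'lsa_division_01']
```

In this toy table, the two selected columns are replaced in place by 2*30 = 60 prefixed columns, while gender and year_first_hired keep their names and values, which mirrors the layout of Xt shown above.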
gender
ObjectDType- Null values
- 17 (0.2%)
- Unique values
- 2 (< 0.1%)
department
ObjectDType- Null values
- 0 (0.0%)
- Unique values
- 37 (0.4%)
department_name
ObjectDType- Null values
- 0 (0.0%)
- Unique values
- 37 (0.4%)
lsa_division_00
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
685 (7.4%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.223 ± 0.281
- Median ± IQR
- 0.134 ± 0.209
- Min | Max
- 8.44e-05 | 1.13
lsa_division_01
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
686 (7.4%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.179 ± 0.280
- Median ± IQR
- 0.177 ± 0.315
- Min | Max
- -0.551 | 0.784
lsa_division_02
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
685 (7.4%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0284 ± 0.314
- Median ± IQR
- -0.00738 ± 0.170
- Min | Max
- -0.615 | 0.977
lsa_division_03
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
685 (7.4%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0428 ± 0.305
- Median ± IQR
- -0.00381 ± 0.0684
- Min | Max
- -0.443 | 1.10
lsa_division_04
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
692 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.0379 ± 0.233
- Median ± IQR
- -0.00801 ± 0.201
- Min | Max
- -0.774 | 0.362
lsa_division_05
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.0430 ± 0.218
- Median ± IQR
- -0.0435 ± 0.113
- Min | Max
- -0.678 | 0.691
lsa_division_06
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0115 ± 0.212
- Median ± IQR
- -0.00443 ± 0.110
- Min | Max
- -0.338 | 1.20
lsa_division_07
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00129 ± 0.206
- Median ± IQR
- -0.00478 ± 0.0443
- Min | Max
- -0.822 | 0.942
lsa_division_08
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.0148 ± 0.199
- Median ± IQR
- -0.0126 ± 0.0876
- Min | Max
- -0.826 | 0.674
lsa_division_09
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0293 ± 0.188
- Median ± IQR
- 0.00180 ± 0.0313
- Min | Max
- -0.178 | 1.31
lsa_division_10
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00125 ± 0.185
- Median ± IQR
- 0.00335 ± 0.0317
- Min | Max
- -0.551 | 1.07
lsa_division_11
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0192 ± 0.175
- Median ± IQR
- 0.00629 ± 0.119
- Min | Max
- -0.637 | 0.612
lsa_division_12
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0127 ± 0.169
- Median ± IQR
- -0.00487 ± 0.0726
- Min | Max
- -0.626 | 0.919
lsa_division_13
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0291 ± 0.157
- Median ± IQR
- -0.000239 ± 0.0791
- Min | Max
- -0.398 | 1.01
lsa_division_14
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.0125 ± 0.154
- Median ± IQR
- -0.00473 ± 0.0762
- Min | Max
- -0.503 | 0.559
lsa_division_15
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0101 ± 0.151
- Median ± IQR
- 0.00735 ± 0.0571
- Min | Max
- -0.580 | 0.753
lsa_division_16
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00132 ± 0.148
- Median ± IQR
- -0.00243 ± 0.0878
- Min | Max
- -0.525 | 0.554
lsa_division_17
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.0203 ± 0.140
- Median ± IQR
- 0.000983 ± 0.115
- Min | Max
- -0.428 | 0.491
lsa_division_18
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0154 ± 0.140
- Median ± IQR
- 0.00737 ± 0.0926
- Min | Max
- -0.610 | 0.509
lsa_division_19
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00203 ± 0.140
- Median ± IQR
- 0.00208 ± 0.0775
- Min | Max
- -0.409 | 0.555
lsa_division_20
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00868 ± 0.132
- Median ± IQR
- -0.00165 ± 0.0668
- Min | Max
- -0.659 | 0.557
lsa_division_21
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.00636 ± 0.131
- Median ± IQR
- 0.000414 ± 0.0469
- Min | Max
- -0.495 | 0.823
lsa_division_22
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.00681 ± 0.127
- Median ± IQR
- -0.00207 ± 0.0985
- Min | Max
- -0.446 | 0.541
lsa_division_23
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.00277 ± 0.121
- Median ± IQR
- 0.00155 ± 0.0708
- Min | Max
- -0.682 | 0.507
lsa_division_24
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.00201 ± 0.119
- Median ± IQR
- -0.000608 ± 0.0520
- Min | Max
- -0.589 | 0.725
lsa_division_25
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00360 ± 0.116
- Median ± IQR
- 0.00597 ± 0.0411
- Min | Max
- -0.424 | 0.675
lsa_division_26
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.000647 ± 0.115
- Median ± IQR
- -0.00659 ± 0.0674
- Min | Max
- -0.340 | 0.644
lsa_division_27
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.00787 ± 0.113
- Median ± IQR
- -0.000620 ± 0.0545
- Min | Max
- -0.501 | 0.456
lsa_division_28
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00389 ± 0.109
- Median ± IQR
- -0.00444 ± 0.0605
- Min | Max
- -0.393 | 0.537
lsa_division_29
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00152 ± 0.107
- Median ± IQR
- -0.00199 ± 0.0716
- Min | Max
- -0.353 | 0.563
assignment_category
ObjectDType- Null values
- 0 (0.0%)
- Unique values
- 2 (< 0.1%)
lsa_employee_position_title_00
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.230 ± 0.315
- Median ± IQR
- 0.0888 ± 0.274
- Min | Max
- 0.000534 | 1.11
lsa_employee_position_title_01
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0831 ± 0.338
- Median ± IQR
- 0.00643 ± 0.0530
- Min | Max
- -0.342 | 1.09
lsa_employee_position_title_02
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.110 ± 0.301
- Median ± IQR
- 0.0156 ± 0.0515
- Min | Max
- -0.0618 | 1.16
lsa_employee_position_title_03
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.108 ± 0.275
- Median ± IQR
- 0.0336 ± 0.197
- Min | Max
- -0.164 | 0.984
lsa_employee_position_title_04
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0766 ± 0.242
- Median ± IQR
- 0.0431 ± 0.156
- Min | Max
- -0.617 | 0.658
lsa_employee_position_title_05
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0532 ± 0.208
- Median ± IQR
- 0.000349 ± 0.113
- Min | Max
- -0.315 | 0.954
lsa_employee_position_title_06
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0332 ± 0.192
- Median ± IQR
- 0.00455 ± 0.168
- Min | Max
- -0.323 | 0.782
lsa_employee_position_title_07
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0123 ± 0.190
- Median ± IQR
- -0.00157 ± 0.136
- Min | Max
- -0.547 | 0.640
lsa_employee_position_title_08
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0359 ± 0.178
- Median ± IQR
- 0.00362 ± 0.135
- Min | Max
- -0.312 | 0.590
lsa_employee_position_title_09
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00945 ± 0.181
- Median ± IQR
- -0.00301 ± 0.0589
- Min | Max
- -0.620 | 0.812
lsa_employee_position_title_10
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0198 ± 0.175
- Median ± IQR
- -0.00949 ± 0.0974
- Min | Max
- -0.503 | 0.730
lsa_employee_position_title_11
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.000953 ± 0.173
- Median ± IQR
- 0.0114 ± 0.0823
- Min | Max
- -0.458 | 0.815
lsa_employee_position_title_12
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.00365 ± 0.168
- Median ± IQR
- -0.00364 ± 0.0898
- Min | Max
- -0.332 | 0.961
lsa_employee_position_title_13
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0136 ± 0.165
- Median ± IQR
- 0.00563 ± 0.123
- Min | Max
- -0.331 | 0.649
lsa_employee_position_title_14
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00150 ± 0.161
- Median ± IQR
- -0.0214 ± 0.194
- Min | Max
- -0.375 | 0.465
lsa_employee_position_title_15
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0191 ± 0.155
- Median ± IQR
- 0.000372 ± 0.0467
- Min | Max
- -0.203 | 1.10
lsa_employee_position_title_16
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0113 ± 0.147
- Median ± IQR
- -3.13e-05 ± 0.104
- Min | Max
- -0.311 | 0.839
lsa_employee_position_title_17
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00432 ± 0.140
- Median ± IQR
- -0.00160 ± 0.0777
- Min | Max
- -0.260 | 0.757
lsa_employee_position_title_18
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00522 ± 0.137
- Median ± IQR
- 0.00169 ± 0.0795
- Min | Max
- -0.377 | 1.00
lsa_employee_position_title_19
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.0104 ± 0.133
- Median ± IQR
- -0.00217 ± 0.111
- Min | Max
- -0.343 | 0.467
lsa_employee_position_title_20
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.00302 ± 0.130
- Median ± IQR
- -0.00924 ± 0.0914
- Min | Max
- -0.436 | 0.486
lsa_employee_position_title_21
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00397 ± 0.128
- Median ± IQR
- -0.0141 ± 0.125
- Min | Max
- -0.537 | 0.553
lsa_employee_position_title_22
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00135 ± 0.126
- Median ± IQR
- -0.000470 ± 0.0877
- Min | Max
- -0.336 | 0.500
lsa_employee_position_title_23
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00771 ± 0.122
- Median ± IQR
- -0.00613 ± 0.0500
- Min | Max
- -0.328 | 0.945
lsa_employee_position_title_24
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.00128 ± 0.122
- Median ± IQR
- 0.00407 ± 0.0377
- Min | Max
- -0.589 | 0.689
lsa_employee_position_title_25
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00925 ± 0.119
- Median ± IQR
- 0.00535 ± 0.106
- Min | Max
- -0.189 | 0.661
lsa_employee_position_title_26
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.000972 ± 0.114
- Median ± IQR
- 0.00150 ± 0.0882
- Min | Max
- -0.382 | 0.491
lsa_employee_position_title_27
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- -0.00432 ± 0.108
- Median ± IQR
- -0.00204 ± 0.0534
- Min | Max
- -0.474 | 0.410
lsa_employee_position_title_28
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00149 ± 0.104
- Median ± IQR
- -0.00324 ± 0.0734
- Min | Max
- -0.292 | 0.625
lsa_employee_position_title_29
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
- Mean ± Std
- 0.00529 ± 0.0975
- Median ± IQR
- -0.0108 ± 0.0599
- Min | Max
- -0.174 | 0.859
date_first_hired
ObjectDType- Null values
- 0 (0.0%)
- Unique values
-
2,264 (24.5%)
This column has a high cardinality (> 40).
year_first_hired
Int64DType- Null values
- 0 (0.0%)
- Unique values
-
51 (0.6%)
This column has a high cardinality (> 40).
- Mean ± Std
- 2.00e+03 ± 9.33
- Median ± IQR
- 2,005 ± 14
- Min | Max
- 1,965 | 2,016
Column | Column name | dtype | Is sorted | Null values | Unique values | Mean | Std | Min | Median | Max |
---|---|---|---|---|---|---|---|---|---|---|
0 | gender | ObjectDType | False | 17 (0.2%) | 2 (< 0.1%) | |||||
1 | department | ObjectDType | False | 0 (0.0%) | 37 (0.4%) | |||||
2 | department_name | ObjectDType | False | 0 (0.0%) | 37 (0.4%) | |||||
3 | lsa_division_00 | Float32DType | False | 0 (0.0%) | 685 (7.4%) | 0.223 | 0.281 | 8.44e-05 | 0.134 | 1.13 |
4 | lsa_division_01 | Float32DType | False | 0 (0.0%) | 686 (7.4%) | 0.179 | 0.280 | -0.551 | 0.177 | 0.784 |
5 | lsa_division_02 | Float32DType | False | 0 (0.0%) | 685 (7.4%) | 0.0284 | 0.314 | -0.615 | -0.00738 | 0.977 |
6 | lsa_division_03 | Float32DType | False | 0 (0.0%) | 685 (7.4%) | 0.0428 | 0.305 | -0.443 | -0.00381 | 1.10 |
7 | lsa_division_04 | Float32DType | False | 0 (0.0%) | 692 (7.5%) | -0.0379 | 0.233 | -0.774 | -0.00801 | 0.362 |
8 | lsa_division_05 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | -0.0430 | 0.218 | -0.678 | -0.0435 | 0.691 |
9 | lsa_division_06 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.0115 | 0.212 | -0.338 | -0.00443 | 1.20 |
10 | lsa_division_07 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.00129 | 0.206 | -0.822 | -0.00478 | 0.942 |
11 | lsa_division_08 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | -0.0148 | 0.199 | -0.826 | -0.0126 | 0.674 |
12 | lsa_division_09 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.0293 | 0.188 | -0.178 | 0.00180 | 1.31 |
13 | lsa_division_10 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.00125 | 0.185 | -0.551 | 0.00335 | 1.07 |
14 | lsa_division_11 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.0192 | 0.175 | -0.637 | 0.00629 | 0.612 |
15 | lsa_division_12 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.0127 | 0.169 | -0.626 | -0.00487 | 0.919 |
16 | lsa_division_13 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.0291 | 0.157 | -0.398 | -0.000239 | 1.01 |
17 | lsa_division_14 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | -0.0125 | 0.154 | -0.503 | -0.00473 | 0.559 |
18 | lsa_division_15 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.0101 | 0.151 | -0.580 | 0.00735 | 0.753 |
19 | lsa_division_16 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.00132 | 0.148 | -0.525 | -0.00243 | 0.554 |
20 | lsa_division_17 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | -0.0203 | 0.140 | -0.428 | 0.000983 | 0.491 |
21 | lsa_division_18 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.0154 | 0.140 | -0.610 | 0.00737 | 0.509 |
22 | lsa_division_19 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.00203 | 0.140 | -0.409 | 0.00208 | 0.555 |
23 | lsa_division_20 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.00868 | 0.132 | -0.659 | -0.00165 | 0.557 |
24 | lsa_division_21 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | -0.00636 | 0.131 | -0.495 | 0.000414 | 0.823 |
25 | lsa_division_22 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | -0.00681 | 0.127 | -0.446 | -0.00207 | 0.541 |
26 | lsa_division_23 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | -0.00277 | 0.121 | -0.682 | 0.00155 | 0.507 |
27 | lsa_division_24 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | -0.00201 | 0.119 | -0.589 | -0.000608 | 0.725 |
28 | lsa_division_25 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.00360 | 0.116 | -0.424 | 0.00597 | 0.675 |
29 | lsa_division_26 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.000647 | 0.115 | -0.340 | -0.00659 | 0.644 |
30 | lsa_division_27 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | -0.00787 | 0.113 | -0.501 | -0.000620 | 0.456 |
31 | lsa_division_28 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.00389 | 0.109 | -0.393 | -0.00444 | 0.537 |
32 | lsa_division_29 | Float32DType | False | 0 (0.0%) | 694 (7.5%) | 0.00152 | 0.107 | -0.353 | -0.00199 | 0.563 |
33 | assignment_category | ObjectDType | False | 0 (0.0%) | 2 (< 0.1%) | |||||
34 | lsa_employee_position_title_00 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.230 | 0.315 | 0.000534 | 0.0888 | 1.11 |
35 | lsa_employee_position_title_01 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.0831 | 0.338 | -0.342 | 0.00643 | 1.09 |
36 | lsa_employee_position_title_02 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.110 | 0.301 | -0.0618 | 0.0156 | 1.16 |
37 | lsa_employee_position_title_03 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.108 | 0.275 | -0.164 | 0.0336 | 0.984 |
38 | lsa_employee_position_title_04 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.0766 | 0.242 | -0.617 | 0.0431 | 0.658 |
39 | lsa_employee_position_title_05 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.0532 | 0.208 | -0.315 | 0.000349 | 0.954 |
40 | lsa_employee_position_title_06 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.0332 | 0.192 | -0.323 | 0.00455 | 0.782 |
41 | lsa_employee_position_title_07 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.0123 | 0.190 | -0.547 | -0.00157 | 0.640 |
42 | lsa_employee_position_title_08 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.0359 | 0.178 | -0.312 | 0.00362 | 0.590 |
43 | lsa_employee_position_title_09 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.00945 | 0.181 | -0.620 | -0.00301 | 0.812 |
44 | lsa_employee_position_title_10 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.0198 | 0.175 | -0.503 | -0.00949 | 0.730 |
45 | lsa_employee_position_title_11 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | -0.000953 | 0.173 | -0.458 | 0.0114 | 0.815 |
46 | lsa_employee_position_title_12 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | -0.00365 | 0.168 | -0.332 | -0.00364 | 0.961 |
47 | lsa_employee_position_title_13 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.0136 | 0.165 | -0.331 | 0.00563 | 0.649 |
48 | lsa_employee_position_title_14 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.00150 | 0.161 | -0.375 | -0.0214 | 0.465 |
49 | lsa_employee_position_title_15 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.0191 | 0.155 | -0.203 | 0.000372 | 1.10 |
50 | lsa_employee_position_title_16 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.0113 | 0.147 | -0.311 | -3.13e-05 | 0.839 |
51 | lsa_employee_position_title_17 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.00432 | 0.140 | -0.260 | -0.00160 | 0.757 |
52 | lsa_employee_position_title_18 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.00522 | 0.137 | -0.377 | 0.00169 | 1.00 |
53 | lsa_employee_position_title_19 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.0104 | 0.133 | -0.343 | -0.00217 | 0.467 |
54 | lsa_employee_position_title_20 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | -0.00302 | 0.130 | -0.436 | -0.00924 | 0.486 |
55 | lsa_employee_position_title_21 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.00397 | 0.128 | -0.537 | -0.0141 | 0.553 |
56 | lsa_employee_position_title_22 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.00135 | 0.126 | -0.336 | -0.000470 | 0.500 |
57 | lsa_employee_position_title_23 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.00771 | 0.122 | -0.328 | -0.00613 | 0.945 |
58 | lsa_employee_position_title_24 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | -0.00128 | 0.122 | -0.589 | 0.00407 | 0.689 |
59 | lsa_employee_position_title_25 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.00925 | 0.119 | -0.189 | 0.00535 | 0.661 |
60 | lsa_employee_position_title_26 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | -0.000972 | 0.114 | -0.382 | 0.00150 | 0.491 |
61 | lsa_employee_position_title_27 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | -0.00432 | 0.108 | -0.474 | -0.00204 | 0.410 |
62 | lsa_employee_position_title_28 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.00149 | 0.104 | -0.292 | -0.00324 | 0.625 |
63 | lsa_employee_position_title_29 | Float32DType | False | 0 (0.0%) | 443 (4.8%) | 0.00529 | 0.0975 | -0.174 | -0.0108 | 0.859 |
64 | date_first_hired | ObjectDType | False | 0 (0.0%) | 2264 (24.5%) | |||||
65 | year_first_hired | Int64DType | False | 0 (0.0%) | 51 (0.6%) | 2.00e+03 | 9.33 | 1,965 | 2,005 | 2,016 |
In addition to the ApplyToCols class, the ApplyToFrame class is useful for transformers that work on multiple columns at once, such as PCA, which reduces the number of dimensions.
To select columns without hardcoding their names, we introduce selectors, which allow for flexible pattern matching and composable logic.
The regex selector below matches all columns whose names start with "lsa" and passes them to ApplyToFrame, which assembles these columns into a dataframe and finally passes it to the PCA.
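As a rough mental model of "flexible pattern matching and composable logic", the following dependency-free sketch treats a selector as a predicate on column names that can be combined with `&`, `|` and `~`. The names `NameSelector`, `name_regex` and `select_names` are hypothetical and exist only in this sketch; this is not how skrub implements its selectors. The pattern is assumed to be anchored at the start of the name, in line with the "start with" wording above.

```python
import re
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-in for a selector: a predicate on column names.
# This is NOT skrub's implementation, only a mental model.
@dataclass(frozen=True)
class NameSelector:
    matches: Callable[[str], bool]

    def __and__(self, other):
        return NameSelector(lambda name: self.matches(name) and other.matches(name))

    def __or__(self, other):
        return NameSelector(lambda name: self.matches(name) or other.matches(name))

    def __invert__(self):
        return NameSelector(lambda name: not self.matches(name))

    def select_names(self, columns):
        """Return the matching column names, in their original order."""
        return [name for name in columns if self.matches(name)]


def name_regex(pattern):
    # re.match anchors at the start of the name, so "lsa" acts as a prefix test.
    compiled = re.compile(pattern)
    return NameSelector(lambda name: compiled.match(name) is not None)


columns = [
    "gender",
    "lsa_division_00",
    "lsa_division_01",
    "lsa_employee_position_title_00",
    "year_first_hired",
]

lsa = name_regex("lsa")
print(lsa.select_names(columns))
# Composition: "lsa" columns that do not come from the division column.
print((lsa & ~name_regex("lsa_division")).select_names(columns))
# Negation: everything that is not an "lsa" column.
print((~lsa).select_names(columns))
```

Because the selector only describes which columns to pick, the same object can be reused on any dataframe whose column names follow the same convention.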
from sklearn.decomposition import PCA
from skrub import ApplyToFrame
from skrub import selectors as s
apply_pca = ApplyToFrame(PCA(n_components=8), cols=s.regex("lsa"))
Xt = apply_pca.fit_transform(Xt)
Xt
gender | department | department_name | assignment_category | date_first_hired | year_first_hired | pca0 | pca1 | pca2 | pca3 | pca4 | pca5 | pca6 | pca7 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | F | POL | Department of Police | Fulltime-Regular | 09/22/1986 | 1,986 | 0.114 | -0.0334 | 0.0340 | -0.133 | -0.246 | 0.121 | 0.0572 | 0.375 |
1 | M | POL | Department of Police | Fulltime-Regular | 09/12/1988 | 1,988 | 0.499 | 0.140 | -0.102 | -0.0474 | -0.168 | 0.235 | -0.0246 | 0.275 |
2 | F | HHS | Department of Health and Human Services | Fulltime-Regular | 11/19/1989 | 1,989 | -0.123 | -0.124 | 0.393 | 0.0345 | 0.109 | -0.368 | -0.0446 | 0.365 |
3 | M | COR | Correction and Rehabilitation | Fulltime-Regular | 05/05/2014 | 2,014 | -0.0738 | -0.0886 | 0.167 | -0.185 | -0.397 | -0.116 | -0.323 | -0.0890 |
4 | M | HCA | Department of Housing and Community Affairs | Fulltime-Regular | 03/05/2007 | 2,007 | -0.0845 | -0.0593 | 0.192 | -0.237 | -0.0755 | -0.0450 | 0.206 | -0.0654 |
9,223 | F | HHS | Department of Health and Human Services | Fulltime-Regular | 11/03/2015 | 2,015 | -0.105 | -0.124 | 0.402 | 0.618 | -0.0454 | 0.352 | 0.0125 | -0.407 |
9,224 | F | FRS | Fire and Rescue Services | Fulltime-Regular | 11/28/1988 | 1,988 | -0.125 | 0.122 | 0.0629 | -0.0401 | -0.136 | 0.0858 | 0.0703 | 0.130 |
9,225 | M | HHS | Department of Health and Human Services | Parttime-Regular | 04/30/2001 | 2,001 | -0.116 | -0.131 | 0.306 | 0.202 | 0.0382 | -0.120 | -0.0172 | 0.0247 |
9,226 | M | CCL | County Council | Fulltime-Regular | 09/05/2006 | 2,006 | -0.126 | 0.0355 | 0.224 | -0.410 | 0.663 | 0.382 | -0.561 | 0.0169 |
9,227 | M | DLC | Department of Liquor Control | Fulltime-Regular | 01/30/2012 | 2,012 | -0.130 | 0.0601 | 0.127 | -0.163 | -0.0903 | -0.00503 | 0.151 | 0.0476 |
gender
ObjectDType- Null values
- 17 (0.2%)
- Unique values
- 2 (< 0.1%)
Most frequent values
M
F
['M', 'F']
department
ObjectDType- Null values
- 0 (0.0%)
- Unique values
- 37 (0.4%)
Most frequent values
POL
HHS
FRS
DOT
COR
DLC
DGS
LIB
DPS
SHF
['POL', 'HHS', 'FRS', 'DOT', 'COR', 'DLC', 'DGS', 'LIB', 'DPS', 'SHF']
department_name
ObjectDType- Null values
- 0 (0.0%)
- Unique values
- 37 (0.4%)
Most frequent values
Department of Police
Department of Health and Human Services
Fire and Rescue Services
Department of Transportation
Correction and Rehabilitation
Department of Liquor Control
Department of General Services
Department of Public Libraries
Department of Permitting Services
Sheriff's Office
['Department of Police', 'Department of Health and Human Services', 'Fire and Rescue Services', 'Department of Transportation', 'Correction and Rehabilitation', 'Department of Liquor Control', 'Department of General Services', 'Department of Public Libraries', 'Department of Permitting Services', "Sheriff's Office"]
assignment_category
ObjectDType- Null values
- 0 (0.0%)
- Unique values
- 2 (< 0.1%)
Most frequent values
Fulltime-Regular
Parttime-Regular
['Fulltime-Regular', 'Parttime-Regular']
date_first_hired
ObjectDType- Null values
- 0 (0.0%)
- Unique values
-
2,264 (24.5%)
This column has a high cardinality (> 40).
Most frequent values
12/12/2016
01/14/2013
02/24/2014
03/10/2014
08/12/2013
10/06/2014
09/22/2014
03/19/2007
07/16/2012
07/29/2013
['12/12/2016', '01/14/2013', '02/24/2014', '03/10/2014', '08/12/2013', '10/06/2014', '09/22/2014', '03/19/2007', '07/16/2012', '07/29/2013']
year_first_hired
Int64DType- Null values
- 0 (0.0%)
- Unique values
-
51 (0.6%)
This column has a high cardinality (> 40).
- Mean ± Std
- 2.00e+03 ± 9.33
- Median ± IQR
- 2,005 ± 14
- Min | Max
- 1,965 | 2,016
pca0
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- 3.07e-09 ± 0.476
- Median ± IQR
- -0.107 ± 0.0978
- Min | Max
- -0.516 | 1.48
pca1
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- 3.31e-09 ± 0.445
- Median ± IQR
- -0.0613 ± 0.159
- Min | Max
- -0.985 | 1.28
pca2
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- -1.32e-08 ± 0.406
- Median ± IQR
- 0.136 ± 0.487
- Min | Max
- -1.08 | 0.729
pca3
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- -1.16e-08 ± 0.318
- Median ± IQR
- -0.0397 ± 0.329
- Min | Max
- -0.658 | 1.33
pca4
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- -1.38e-08 ± 0.277
- Median ± IQR
- 0.0132 ± 0.196
- Min | Max
- -1.02 | 0.877
pca5
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- -3.36e-09 ± 0.266
- Median ± IQR
- -0.0411 ± 0.169
- Min | Max
- -0.715 | 1.15
pca6
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- -8.78e-09 ± 0.265
- Median ± IQR
- 0.0124 ± 0.132
- Min | Max
- -1.12 | 0.703
pca7
Float32DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- -2.58e-09 ± 0.249
- Median ± IQR
- -0.0384 ± 0.193
- Min | Max
- -0.724 | 0.771
Column | Column name | dtype | Is sorted | Null values | Unique values | Mean | Std | Min | Median | Max |
---|---|---|---|---|---|---|---|---|---|---|
0 | gender | ObjectDType | False | 17 (0.2%) | 2 (< 0.1%) | |||||
1 | department | ObjectDType | False | 0 (0.0%) | 37 (0.4%) | |||||
2 | department_name | ObjectDType | False | 0 (0.0%) | 37 (0.4%) | |||||
3 | assignment_category | ObjectDType | False | 0 (0.0%) | 2 (< 0.1%) | |||||
4 | date_first_hired | ObjectDType | False | 0 (0.0%) | 2264 (24.5%) | |||||
5 | year_first_hired | Int64DType | False | 0 (0.0%) | 51 (0.6%) | 2.00e+03 | 9.33 | 1,965 | 2,005 | 2,016 |
6 | pca0 | Float32DType | False | 0 (0.0%) | 2779 (30.1%) | 3.07e-09 | 0.476 | -0.516 | -0.107 | 1.48 |
7 | pca1 | Float32DType | False | 0 (0.0%) | 2779 (30.1%) | 3.31e-09 | 0.445 | -0.985 | -0.0613 | 1.28 |
8 | pca2 | Float32DType | False | 0 (0.0%) | 2779 (30.1%) | -1.32e-08 | 0.406 | -1.08 | 0.136 | 0.729 |
9 | pca3 | Float32DType | False | 0 (0.0%) | 2779 (30.1%) | -1.16e-08 | 0.318 | -0.658 | -0.0397 | 1.33 |
10 | pca4 | Float32DType | False | 0 (0.0%) | 2779 (30.1%) | -1.38e-08 | 0.277 | -1.02 | 0.0132 | 0.877 |
11 | pca5 | Float32DType | False | 0 (0.0%) | 2779 (30.1%) | -3.36e-09 | 0.266 | -0.715 | -0.0411 | 1.15 |
12 | pca6 | Float32DType | False | 0 (0.0%) | 2779 (30.1%) | -8.78e-09 | 0.265 | -1.12 | 0.0124 | 0.703 |
13 | pca7 | Float32DType | False | 0 (0.0%) | 2779 (30.1%) | -2.58e-09 | 0.249 | -0.724 | -0.0384 | 0.771 |
Column 1 | Column 2 | Cramér's V | Pearson's Correlation |
---|---|---|---|
department | department_name | 1.00 | |
pca1 | pca2 | 0.619 | -0.0341 |
pca0 | pca2 | 0.550 | 0.00271 |
pca2 | pca3 | 0.538 | -0.0181 |
assignment_category | pca5 | 0.503 | |
pca4 | pca6 | 0.501 | 0.0555 |
department_name | pca2 | 0.494 | |
department | pca2 | 0.494 | |
pca3 | pca7 | 0.490 | 0.0391 |
pca5 | pca7 | 0.483 | -0.0199 |
department | pca1 | 0.469 | |
department_name | pca1 | 0.469 | |
pca6 | pca7 | 0.469 | 0.00288 |
assignment_category | pca7 | 0.467 | |
assignment_category | pca3 | 0.465 | |
assignment_category | pca2 | 0.449 | |
pca0 | pca1 | 0.440 | -0.0187 |
department | assignment_category | 0.409 | |
department_name | assignment_category | 0.409 | |
pca3 | pca6 | 0.402 | -0.0158 |
The two objects defined above, apply_string_encoder and apply_pca, are scikit-learn transformers and can be chained together within a Pipeline.
from sklearn.pipeline import make_pipeline
model = make_pipeline(
apply_string_encoder,
apply_pca,
).fit_transform(X)
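One detail of this chaining is worth spelling out: the second step selects "lsa" columns that only exist once the first step has run, so the selection has to be resolved against the frame each step actually receives. The toy sketch below illustrates that idea with plain dictionaries standing in for dataframes. `ExpandText`, `SumByPrefix` and `chain_fit_transform` are hypothetical helpers written for this illustration, not skrub or scikit-learn classes, and the "encoding" they compute is deliberately trivial.

```python
# Toy "frames" are dicts mapping column name -> list of values.
# Both step classes are hypothetical stand-ins, not skrub or scikit-learn classes.


class ExpandText:
    """Replace one text column by two derived numeric columns."""

    def __init__(self, col):
        self.col = col

    def fit_transform(self, frame):
        out = {}
        for name, values in frame.items():
            if name == self.col:
                out[f"lsa_{name}_0"] = [float(len(v)) for v in values]
                out[f"lsa_{name}_1"] = [float(v.count(" ")) for v in values]
            else:
                out[name] = values
        return out


class SumByPrefix:
    """Collapse every column whose name starts with `prefix` into one column."""

    def __init__(self, prefix):
        self.prefix = prefix

    def fit_transform(self, frame):
        # The prefix is resolved against the frame this step receives,
        # so it sees the columns created by earlier steps.
        self.selected_ = [n for n in frame if n.startswith(self.prefix)]
        out = {n: v for n, v in frame.items() if n not in self.selected_}
        rows = zip(*(frame[n] for n in self.selected_))
        out["combined"] = [sum(row) for row in rows]
        return out


def chain_fit_transform(steps, frame):
    # Feed the output of each step into the next one, in order.
    for step in steps:
        frame = step.fit_transform(frame)
    return frame


frame = {"division": ["Highway Services", "Income Supports"], "year": [1986, 2014]}
steps = [ExpandText("division"), SumByPrefix("lsa")]
result = chain_fit_transform(steps, frame)
print(steps[1].selected_)
print(result)
```

In this sketch the untouched column comes first and the derived column is appended at the end, which mirrors the column order seen in the PCA output earlier.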
Note that selectors also come in handy in a pipeline to select or drop columns, using SelectCols and DropCols.
from sklearn.preprocessing import StandardScaler
from skrub import SelectCols
# Select only numerical columns
pipeline = make_pipeline(
SelectCols(cols=s.numeric()),
StandardScaler(),
).set_output(transform="pandas")
pipeline.fit_transform(Xt)
year_first_hired | pca0 | pca1 | pca2 | pca3 | pca4 | pca5 | pca6 | pca7 | |
---|---|---|---|---|---|---|---|---|---|
0 | -1.89 | 0.239 | -0.0750 | 0.0837 | -0.417 | -0.890 | 0.456 | 0.216 | 1.50 |
1 | -1.67 | 1.05 | 0.316 | -0.251 | -0.149 | -0.605 | 0.883 | -0.0930 | 1.10 |
2 | -1.57 | -0.259 | -0.278 | 0.968 | 0.108 | 0.394 | -1.38 | -0.169 | 1.47 |
3 | 1.12 | -0.155 | -0.199 | 0.411 | -0.580 | -1.43 | -0.436 | -1.22 | -0.357 |
4 | 0.365 | -0.178 | -0.133 | 0.472 | -0.745 | -0.272 | -0.169 | 0.777 | -0.263 |
9,223 | 1.22 | -0.221 | -0.278 | 0.991 | 1.94 | -0.164 | 1.32 | 0.0471 | -1.63 |
9,224 | -1.67 | -0.263 | 0.274 | 0.155 | -0.126 | -0.490 | 0.322 | 0.266 | 0.523 |
9,225 | -0.279 | -0.244 | -0.294 | 0.753 | 0.635 | 0.138 | -0.451 | -0.0651 | 0.0992 |
9,226 | 0.258 | -0.265 | 0.0797 | 0.553 | -1.29 | 2.39 | 1.43 | -2.12 | 0.0677 |
9,227 | 0.901 | -0.273 | 0.135 | 0.313 | -0.513 | -0.326 | -0.0189 | 0.571 | 0.191 |
year_first_hired
Float64DType- Null values
- 0 (0.0%)
- Unique values
-
51 (0.6%)
This column has a high cardinality (> 40).
- Mean ± Std
- -1.15e-14 ± 1.00
- Median ± IQR
- 0.150 ± 1.50
- Min | Max
- -4.14 | 1.33
pca0
Float64DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- -2.50e-18 ± 1.00
- Median ± IQR
- -0.224 ± 0.206
- Min | Max
- -1.09 | 3.12
pca1
Float64DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- 6.16e-18 ± 1.00
- Median ± IQR
- -0.138 ± 0.358
- Min | Max
- -2.22 | 2.87
pca2
Float64DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- -1.39e-17 ± 1.00
- Median ± IQR
- 0.335 ± 1.20
- Min | Max
- -2.65 | 1.79
pca3
Float64DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- 9.24e-18 ± 1.00
- Median ± IQR
- -0.125 ± 1.03
- Min | Max
- -2.07 | 4.19
pca4
Float64DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- 5.39e-18 ± 1.00
- Median ± IQR
- 0.0477 ± 0.708
- Min | Max
- -3.67 | 3.17
pca5
Float64DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- -4.62e-18 ± 1.00
- Median ± IQR
- -0.154 ± 0.635
- Min | Max
- -2.69 | 4.33
pca6
Float64DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- -2.31e-18 ± 1.00
- Median ± IQR
- 0.0468 ± 0.499
- Min | Max
- -4.23 | 2.66
pca7
Float64DType- Null values
- 0 (0.0%)
- Unique values
-
2,779 (30.1%)
This column has a high cardinality (> 40).
- Mean ± Std
- -2.31e-17 ± 1.00
- Median ± IQR
- -0.154 ± 0.774
- Min | Max
- -2.91 | 3.10
Column | Column name | dtype | Is sorted | Null values | Unique values | Mean | Std | Min | Median | Max |
---|---|---|---|---|---|---|---|---|---|---|
0 | year_first_hired | Float64DType | False | 0 (0.0%) | 51 (0.6%) | -1.15e-14 | 1.00 | -4.14 | 0.150 | 1.33 |
1 | pca0 | Float64DType | False | 0 (0.0%) | 2779 (30.1%) | -2.50e-18 | 1.00 | -1.09 | -0.224 | 3.12 |
2 | pca1 | Float64DType | False | 0 (0.0%) | 2779 (30.1%) | 6.16e-18 | 1.00 | -2.22 | -0.138 | 2.87 |
3 | pca2 | Float64DType | False | 0 (0.0%) | 2779 (30.1%) | -1.39e-17 | 1.00 | -2.65 | 0.335 | 1.79 |
4 | pca3 | Float64DType | False | 0 (0.0%) | 2779 (30.1%) | 9.24e-18 | 1.00 | -2.07 | -0.125 | 4.19 |
5 | pca4 | Float64DType | False | 0 (0.0%) | 2779 (30.1%) | 5.39e-18 | 1.00 | -3.67 | 0.0477 | 3.17 |
6 | pca5 | Float64DType | False | 0 (0.0%) | 2779 (30.1%) | -4.62e-18 | 1.00 | -2.69 | -0.154 | 4.33 |
7 | pca6 | Float64DType | False | 0 (0.0%) | 2779 (30.1%) | -2.31e-18 | 1.00 | -4.23 | 0.0468 | 2.66 |
8 | pca7 | Float64DType | False | 0 (0.0%) | 2779 (30.1%) | -2.31e-17 | 1.00 | -2.91 | -0.154 | 3.10 |
Column 1 | Column 2 | Cramér's V | Pearson's Correlation |
---|---|---|---|
pca1 | pca2 | 0.618 | -0.0431 |
pca0 | pca2 | 0.538 | -0.0166 |
pca2 | pca3 | 0.531 | 0.00787 |
pca4 | pca6 | 0.498 | -0.0102 |
pca5 | pca7 | 0.476 | 0.000455 |
pca6 | pca7 | 0.448 | -0.0235 |
pca0 | pca1 | 0.436 | -0.0197 |
pca3 | pca7 | 0.426 | 0.00495 |
pca3 | pca4 | 0.383 | -0.0201 |
pca4 | pca5 | 0.379 | 0.0151 |
pca3 | pca6 | 0.377 | -0.0113 |
pca2 | pca7 | 0.372 | -0.0160 |
pca1 | pca3 | 0.351 | 0.00434 |
pca3 | pca5 | 0.347 | 0.00362 |
pca1 | pca7 | 0.335 | 0.00639 |
pca5 | pca6 | 0.327 | -0.0189 |
pca2 | pca4 | 0.322 | 0.0107 |
pca2 | pca5 | 0.320 | 0.000822 |
pca4 | pca7 | 0.319 | -0.0148 |
pca0 | pca3 | 0.285 | 0.00445 |
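To make the result above easier to read (every remaining column has a mean close to 0 and a standard deviation close to 1), here is a small standard-library sketch of the two steps: keep the numeric columns, then standardize each one. The helpers `is_numeric`, `select_numeric` and `standardize` are hypothetical and only approximate what the selection and scaling steps do; the population standard deviation is used on the assumption that it mirrors StandardScaler's default behaviour.

```python
from statistics import fmean, pstdev


def is_numeric(values):
    # bool is excluded on purpose here: it is a subclass of int in Python.
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


def select_numeric(frame):
    # Keep only the columns whose values are all numbers.
    return {name: values for name, values in frame.items() if is_numeric(values)}


def standardize(frame):
    scaled = {}
    for name, values in frame.items():
        mean = fmean(values)
        std = pstdev(values)  # population standard deviation (assumption, see above)
        # Note: a constant column would give std == 0; this sketch ignores that case.
        scaled[name] = [(v - mean) / std for v in values]
    return scaled


# Toy frame: dict mapping column name -> list of values.
frame = {
    "gender": ["F", "M", "F", "M"],
    "year_first_hired": [1986, 1988, 2014, 2016],
    "pca0": [0.5, -0.5, 1.5, -1.5],
}

scaled = standardize(select_numeric(frame))
print(list(scaled))
for name, values in scaled.items():
    print(name, [round(v, 3) for v in values])
```

The string column is dropped before scaling, which is why the scaler never sees non-numeric data.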
Let’s run through one more example to showcase the expressiveness of the selectors. Suppose we want to apply an OrdinalEncoder to categorical columns with low cardinality (e.g., fewer than 40 unique values).
We define a column filter using skrub selectors with a lambda function. Note that the same effect can be obtained directly by using cardinality_below().
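Before the skrub version, the logic of the filter can be sketched with the standard library alone: a column is encoded only if it is a string column and has fewer unique values than a threshold, and every other column passes through unchanged. The helpers `is_string`, `fewer_unique_than` and `ordinal_encode` are hypothetical names for this sketch; the toy threshold is 3 rather than 40, missing values are ignored, and categories are assumed to be ranked in sorted order.

```python
def is_string(values):
    return all(isinstance(v, str) for v in values)


def fewer_unique_than(threshold):
    # Returns a column predicate, playing the role of the lambda-based filter.
    return lambda values: len(set(values)) < threshold


def ordinal_encode(values):
    # Categories are sorted, then each value is replaced by its rank.
    categories = sorted(set(values))
    codes = {category: float(rank) for rank, category in enumerate(categories)}
    return [codes[v] for v in values]


# Toy frame: dict mapping column name -> list of values.
frame = {
    "gender": ["F", "M", "F", "M"],
    "division": [
        "Highway Services",
        "Income Supports",
        "Child Welfare Services",
        "School Health Services",
    ],
    "year_first_hired": [1986, 1988, 1989, 2014],
}

low_cardinality = fewer_unique_than(3)
encoded = {
    name: ordinal_encode(values)
    if is_string(values) and low_cardinality(values)
    else values
    for name, values in frame.items()
}
print(encoded["gender"])
print(encoded["division"] == frame["division"])
```

Only the column that satisfies both conditions is encoded: the high-cardinality text column and the numeric column are left as they were.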
from sklearn.preprocessing import OrdinalEncoder
low_cardinality = s.filter(lambda col: col.nunique() < 40)
ApplyToCols(OrdinalEncoder(), cols=s.string() & low_cardinality).fit_transform(X)
gender | department | department_name | division | assignment_category | employee_position_title | date_first_hired | year_first_hired | |
---|---|---|---|---|---|---|---|---|
0 | 0.00 | 32.0 | 14.0 | MSB Information Mgmt and Tech Division Records Management Section | 0.00 | Office Services Coordinator | 09/22/1986 | 1,986 |
1 | 1.00 | 32.0 | 14.0 | ISB Major Crimes Division Fugitive Section | 0.00 | Master Police Officer | 09/12/1988 | 1,988 |
2 | 0.00 | 19.0 | 10.0 | Adult Protective and Case Management Services | 0.00 | Social Worker IV | 11/19/1989 | 1,989 |
3 | 1.00 | 6.00 | 4.00 | PRRS Facility and Security | 0.00 | Resident Supervisor II | 05/05/2014 | 2,014 |
4 | 1.00 | 18.0 | 11.0 | Affordable Housing Programs | 0.00 | Planning Specialist III | 03/05/2007 | 2,007 |
9,223 | 0.00 | 19.0 | 10.0 | School Based Health Centers | 0.00 | Community Health Nurse II | 11/03/2015 | 2,015 |
9,224 | 0.00 | 17.0 | 20.0 | Human Resources Division | 0.00 | Fire/Rescue Division Chief | 11/28/1988 | 1,988 |
9,225 | 1.00 | 19.0 | 10.0 | Child and Adolescent Mental Health Clinic Services | 1.00 | Medical Doctor IV - Psychiatrist | 04/30/2001 | 2,001 |
9,226 | 1.00 | 3.00 | 6.00 | Council Central Staff | 0.00 | Manager II | 09/05/2006 | 2,006 |
9,227 | 1.00 | 11.0 | 12.0 | Licensure, Regulation and Education | 0.00 | Alcohol/Tobacco Enforcement Specialist II | 01/30/2012 | 2,012 |
gender
Float64DType- Null values
- 17 (0.2%)
- Unique values
- 2 (< 0.1%)
- Mean ± Std
- 0.595 ± 0.491
- Median ± IQR
- 1.00 ± 1.00
- Min | Max
- 0.00 | 1.00
department
Float64DType- Null values
- 0 (0.0%)
- Unique values
- 37 (0.4%)
- Mean ± Std
- 18.8 ± 8.97
- Median ± IQR
- 17.0 ± 15.0
- Min | Max
- 0.00 | 36.0
department_name
Float64DType- Null values
- 0 (0.0%)
- Unique values
- 37 (0.4%)
- Mean ± Std
- 14.4 ± 6.22
- Median ± IQR
- 14.0 ± 8.00
- Min | Max
- 0.00 | 36.0
division
ObjectDType- Null values
- 0 (0.0%)
- Unique values
-
694 (7.5%)
This column has a high cardinality (> 40).
Most frequent values
School Health Services
Transit Silver Spring Ride On
Transit Gaithersburg Ride On
Highway Services
Child Welfare Services
FSB Traffic Division School Safety Section
Income Supports
PSB 3rd District Patrol
PSB 4th District Patrol
Transit Nicholson Ride On
['School Health Services', 'Transit Silver Spring Ride On', 'Transit Gaithersburg Ride On', 'Highway Services', 'Child Welfare Services', 'FSB Traffic Division School Safety Section', 'Income Supports', 'PSB 3rd District Patrol', 'PSB 4th District Patrol', 'Transit Nicholson Ride On']
assignment_category
Float64DType- Null values
- 0 (0.0%)
- Unique values
- 2 (< 0.1%)
- Mean ± Std
- 0.0904 ± 0.287
- Median ± IQR
- 0.00 ± 0.00
- Min | Max
- 0.00 | 1.00
employee_position_title
ObjectDType- Null values
- 0 (0.0%)
- Unique values
-
443 (4.8%)
This column has a high cardinality (> 40).
Most frequent values
Bus Operator
Police Officer III
Firefighter/Rescuer III
Manager III
Firefighter/Rescuer II
Master Firefighter/Rescuer
Office Services Coordinator
School Health Room Technician I
Police Officer II
Community Health Nurse II
['Bus Operator', 'Police Officer III', 'Firefighter/Rescuer III', 'Manager III', 'Firefighter/Rescuer II', 'Master Firefighter/Rescuer', 'Office Services Coordinator', 'School Health Room Technician I', 'Police Officer II', 'Community Health Nurse II']
date_first_hired
ObjectDType- Null values
- 0 (0.0%)
- Unique values
-
2,264 (24.5%)
This column has a high cardinality (> 40).
Most frequent values
12/12/2016
01/14/2013
02/24/2014
03/10/2014
08/12/2013
10/06/2014
09/22/2014
03/19/2007
07/16/2012
07/29/2013
['12/12/2016', '01/14/2013', '02/24/2014', '03/10/2014', '08/12/2013', '10/06/2014', '09/22/2014', '03/19/2007', '07/16/2012', '07/29/2013']
year_first_hired
Int64DType- Null values
- 0 (0.0%)
- Unique values
-
51 (0.6%)
This column has a high cardinality (> 40).
- Mean ± Std
- 2.00e+03 ± 9.33
- Median ± IQR
- 2,005 ± 14
- Min | Max
- 1,965 | 2,016
Column | Column name | dtype | Is sorted | Null values | Unique values | Mean | Std | Min | Median | Max |
---|---|---|---|---|---|---|---|---|---|---|
0 | gender | Float64DType | False | 17 (0.2%) | 2 (< 0.1%) | 0.595 | 0.491 | 0.00 | 1.00 | 1.00 |
1 | department | Float64DType | False | 0 (0.0%) | 37 (0.4%) | 18.8 | 8.97 | 0.00 | 17.0 | 36.0 |
2 | department_name | Float64DType | False | 0 (0.0%) | 37 (0.4%) | 14.4 | 6.22 | 0.00 | 14.0 | 36.0 |
3 | division | ObjectDType | False | 0 (0.0%) | 694 (7.5%) | |||||
4 | assignment_category | Float64DType | False | 0 (0.0%) | 2 (< 0.1%) | 0.0904 | 0.287 | 0.00 | 0.00 | 1.00 |
5 | employee_position_title | ObjectDType | False | 0 (0.0%) | 443 (4.8%) | |||||
6 | date_first_hired | ObjectDType | False | 0 (0.0%) | 2264 (24.5%) | |||||
7 | year_first_hired | Int64DType | False | 0 (0.0%) | 51 (0.6%) | 2.00e+03 | 9.33 | 1,965 | 2,005 | 2,016 |
Column 1 | Column 2 | Cramér's V | Pearson's Correlation |
---|---|---|---|
department | department_name | 0.646 | 0.388 |
division | assignment_category | 0.579 | |
assignment_category | employee_position_title | 0.491 | |
division | employee_position_title | 0.424 | |
department_name | employee_position_title | 0.395 | |
department_name | division | 0.326 | |
department | employee_position_title | 0.325 | |
department | division | 0.310 | |
gender | department_name | 0.293 | 0.229 |
gender | employee_position_title | 0.284 | |
department | assignment_category | 0.262 | 0.0647 |
gender | division | 0.255 | |
department_name | assignment_category | 0.253 | -0.0973 |
gender | assignment_category | 0.246 | -0.245 |
employee_position_title | date_first_hired | 0.229 | |
gender | department | 0.223 | -0.0966 |
date_first_hired | year_first_hired | 0.156 | |
department_name | date_first_hired | 0.146 | |
employee_position_title | year_first_hired | 0.123 | |
department | date_first_hired | 0.109 |
Notice how we composed the selector with string() using the logical AND operator &. The resulting selector matches string columns with cardinality below 40.
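As a rough illustration of how this kind of composition can work, the sketch below builds a toy selector class that overloads `&` and `~`. It is an assumption-laden stand-in, not skrub's actual selector machinery: the `Selector` class, its `expand` method, and the toy table are all hypothetical.

```python
class Selector:
    """Toy selector: wraps a predicate that receives a column's list of values."""

    def __init__(self, predicate, label):
        self.predicate = predicate
        self.label = label

    def __and__(self, other):
        # A column matches only if both selectors match it.
        return Selector(
            lambda values: self.predicate(values) and other.predicate(values),
            f"({self.label} & {other.label})",
        )

    def __invert__(self):
        # A column matches only if the wrapped selector does not match it.
        return Selector(lambda values: not self.predicate(values), f"(~{self.label})")

    def expand(self, table):
        """Return the names of the matching columns, in table order."""
        return [name for name, values in table.items() if self.predicate(values)]

    def __repr__(self):
        return self.label


is_string = Selector(lambda values: all(isinstance(v, str) for v in values), "string()")
low_card = Selector(lambda values: len(set(values)) < 40, "low_cardinality")

# Hypothetical toy table: column name -> list of values.
table = {
    "gender": ["F", "M", "F"],
    "division": [f"division {i}" for i in range(50)],
    "year_first_hired": [1986, 1988, 1989],
}

low_strings = (is_string & low_card).expand(table)
high_strings = (is_string & ~low_card).expand(table)
print(low_strings)   # ['gender']
print(high_strings)  # ['division']
```

Because each operator returns a new selector, expressions can be nested freely, and the label keeps track of how the combined selector was built.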
We can also define the opposite selector high_cardinality using the negation operator ~ and apply a skrub.StringEncoder to vectorize those columns.
from sklearn.ensemble import HistGradientBoostingRegressor
high_cardinality = ~low_cardinality
pipeline = make_pipeline(
    ApplyToCols(
        OrdinalEncoder(),
        cols=s.string() & low_cardinality,
    ),
    ApplyToCols(
        StringEncoder(),
        cols=s.string() & high_cardinality,
    ),
    HistGradientBoostingRegressor(),
).fit(X, y)
pipeline
Pipeline(steps=[('applytocols-1', ApplyToCols(cols=(string() & filter(<lambda>)), transformer=OrdinalEncoder())), ('applytocols-2', ApplyToCols(cols=(string() & (~filter(<lambda>))), transformer=StringEncoder())), ('histgradientboostingregressor', HistGradientBoostingRegressor())])
Parameters
steps | [('applytocols-1', ...), ('applytocols-2', ...), ...] | |
transform_input | None | |
memory | None | |
verbose | False |
Parameters
transformer | OrdinalEncoder() | |
cols | (string() & filter(<lambda>)) | |
allow_reject | False | |
keep_original | False | |
rename_columns | '{}' | |
n_jobs | None |
OrdinalEncoder()
Parameters
categories | 'auto' | |
dtype | <class 'numpy.float64'> | |
handle_unknown | 'error' | |
unknown_value | None | |
encoded_missing_value | nan | |
min_frequency | None | |
max_categories | None |
Parameters
transformer | StringEncoder() | |
cols | (string() & (...er(<lambda>))) | |
allow_reject | False | |
keep_original | False | |
rename_columns | '{}' | |
n_jobs | None |
StringEncoder()
Parameters
n_components | 30 | |
vectorizer | 'tfidf' | |
ngram_range | (3, ...) | |
analyzer | 'char_wb' | |
stop_words | None | |
random_state | None |
Parameters
loss | 'squared_error' | |
quantile | None | |
learning_rate | 0.1 | |
max_iter | 100 | |
max_leaf_nodes | 31 | |
max_depth | None | |
min_samples_leaf | 20 | |
l2_regularization | 0.0 | |
max_features | 1.0 | |
max_bins | 255 | |
categorical_features | 'from_dtype' | |
monotonic_cst | None | |
interaction_cst | None | |
warm_start | False | |
early_stopping | 'auto' | |
scoring | 'loss' | |
validation_fraction | 0.1 | |
n_iter_no_change | 10 | |
tol | 1e-07 | |
verbose | 0 | |
random_state | None |
Interestingly, the pipeline above is similar to the datatype dispatching performed by TableVectorizer, which is also used in tabular_learner().
Click on the dropdown arrows next to each datatype to see how the columns are mapped to the different transformers in TableVectorizer.
from skrub import tabular_learner
tabular_learner("regressor").fit(X, y)
/home/circleci/project/skrub/_tabular_pipeline.py:75: FutureWarning:
tabular_learner will be deprecated in the next release. Equivalent functionality is available in skrub.set_config.
Pipeline(steps=[('tablevectorizer', TableVectorizer(low_cardinality=ToCategorical())), ('histgradientboostingregressor', HistGradientBoostingRegressor())])
Parameters
steps | [('tablevectorizer', ...), ('histgradientboostingregressor', ...)] | |
transform_input | None | |
memory | None | |
verbose | False |
Parameters
cardinality_threshold | 40 | |
low_cardinality | ToCategorical() | |
high_cardinality | StringEncoder() | |
numeric | PassThrough() | |
datetime | DatetimeEncoder() | |
specific_transformers | () | |
drop_null_fraction | 1.0 | |
drop_if_constant | False | |
drop_if_unique | False | |
datetime_format | None | |
n_jobs | None |
['year_first_hired']
Parameters
['date_first_hired']
Parameters
resolution | 'hour' | |
add_weekday | False | |
add_total_seconds | True | |
add_day_of_year | False | |
periodic_encoding | None |
['gender', 'department', 'department_name', 'assignment_category']
Parameters
['division', 'employee_position_title']
Parameters
n_components | 30 | |
vectorizer | 'tfidf' | |
ngram_range | (3, ...) | |
analyzer | 'char_wb' | |
stop_words | None | |
random_state | None |
Parameters
loss | 'squared_error' | |
quantile | None | |
learning_rate | 0.1 | |
max_iter | 100 | |
max_leaf_nodes | 31 | |
max_depth | None | |
min_samples_leaf | 20 | |
l2_regularization | 0.0 | |
max_features | 1.0 | |
max_bins | 255 | |
categorical_features | 'from_dtype' | |
monotonic_cst | None | |
interaction_cst | None | |
warm_start | False | |
early_stopping | 'auto' | |
scoring | 'loss' | |
validation_fraction | 0.1 | |
n_iter_no_change | 10 | |
tol | 1e-07 | |
verbose | 0 | |
random_state | None |
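The column-to-transformer mapping shown above can be approximated with a short pure-Python sketch. This is a simplified illustration under stated assumptions, not TableVectorizer's actual logic: the `dispatch` and `looks_like_date` helpers, the `%m/%d/%Y` date format, and the toy table are hypothetical; only the threshold of 40 is taken from the `cardinality_threshold` parameter listed above.

```python
from datetime import datetime

CARDINALITY_THRESHOLD = 40  # same value as the cardinality_threshold parameter above


def looks_like_date(values, fmt="%m/%d/%Y"):
    """Return True if every value is a string that parses with the assumed format."""
    try:
        for value in values:
            datetime.strptime(value, fmt)
    except (TypeError, ValueError):
        return False
    return True


def dispatch(table):
    """Assign each column name to one of four groups, checked in a fixed order."""
    groups = {"numeric": [], "datetime": [], "low_cardinality": [], "high_cardinality": []}
    for name, values in table.items():
        if all(isinstance(v, (int, float)) for v in values):
            groups["numeric"].append(name)
        elif looks_like_date(values):
            groups["datetime"].append(name)
        elif len(set(values)) < CARDINALITY_THRESHOLD:
            groups["low_cardinality"].append(name)
        else:
            groups["high_cardinality"].append(name)
    return groups


# Hypothetical toy table with 50 rows: column name -> list of values.
table = {
    "gender": ["F", "M"] * 25,
    "division": [f"division {i}" for i in range(50)],
    "date_first_hired": ["09/22/1986"] * 50,
    "year_first_hired": [1986 + i % 5 for i in range(50)],
}

groups = dispatch(table)
for group, columns in groups.items():
    print(group, columns)
```

Each group would then be sent to its own transformer, in the same spirit as the two ApplyToCols steps of the hand-written pipeline.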
Total running time of the script: (0 minutes 8.264 seconds)