filter_names#
- skrub.selectors.filter_names(predicate, *args, **kwargs)[source]#
Select columns based on their name.
For a column whose name is
col_name,predicateis called aspredicate(col_name, *args, **kwargs)and the column is selected if returnsTrue. Note this is different fromfilter, because here the predicate is passed the column name whereas withfilter, the predicate is passed the actual column (pandas or polars Series).argsandkwargsare extra parameters for the predicate. Storing parameters like this rather than in a closure can help using an importable function as the predicate rather than a local one, which is necessary to pickle the selector. (An alternative is to usefunctools.partial).Examples
>>> from skrub import selectors as s >>> import pandas as pd >>> df = pd.DataFrame( ... { ... "height_mm": [297.0, 420.0], ... "width_mm": [210.0, 297.0], ... "kind": ["A4", "A3"], ... "ID": [4, 3], ... } ... ) >>> df height_mm width_mm kind ID 0 297.0 210.0 A4 4 1 420.0 297.0 A3 3
>>> selector = s.filter_names(lambda name: name.endswith('_mm')) >>> s.select(df, selector) height_mm width_mm 0 297.0 210.0 1 420.0 297.0
If we want to pickle the selector, we’re better off using an importable function and passing the arguments separately:
>>> selector = s.filter_names(str.endswith, '_mm') >>> selector filter_names(str.endswith, '_mm')
>>> s.select(df, selector) height_mm width_mm 0 297.0 210.0 1 420.0 297.0
>>> import pickle >>> _ = pickle.dumps(selector) # OK