filter_names
- skrub.selectors.filter_names(predicate, *args, **kwargs)
Select columns based on their name.
For a column whose name is col_name, predicate is called as predicate(col_name, *args, **kwargs) and the column is selected if it returns True. Note this is different from filter: here the predicate is passed the column name, whereas with filter the predicate is passed the actual column (a pandas or polars Series).

args and kwargs are extra parameters for the predicate. Storing parameters like this rather than in a closure makes it possible to use an importable function as the predicate rather than a local one, which is necessary to pickle the selector. (An alternative is to use functools.partial.)

Examples
>>> from skrub import selectors as s
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "height_mm": [297.0, 420.0],
...         "width_mm": [210.0, 297.0],
...         "kind": ["A4", "A3"],
...         "ID": [4, 3],
...     }
... )
>>> df
   height_mm  width_mm kind  ID
0      297.0     210.0   A4   4
1      420.0     297.0   A3   3

>>> selector = s.filter_names(lambda name: name.endswith('_mm'))
>>> s.select(df, selector)
   height_mm  width_mm
0      297.0     210.0
1      420.0     297.0
If we want to pickle the selector, we’re better off using an importable function and passing the arguments separately:
>>> selector = s.filter_names(str.endswith, '_mm')
>>> selector
filter_names(str.endswith, '_mm')

>>> s.select(df, selector)
   height_mm  width_mm
0      297.0     210.0
1      420.0     297.0
>>> import pickle
>>> _ = pickle.dumps(selector)  # OK
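The description mentions functools.partial as an alternative to passing extra arguments separately, but does not show it. The following is a minimal sketch of that idea using only the standard library. The list column_names and the list comprehension are stand-ins for the selection step, not skrub's implementation, and fnmatch.fnmatchcase is chosen here purely for illustration because it is importable and accepts its pattern as the keyword argument pat.

```python
import fnmatch
import functools
import pickle

# Hypothetical column names, mirroring the dataframe used in the examples.
column_names = ["height_mm", "width_mm", "kind", "ID"]

# Bind the extra parameter up front. fnmatch.fnmatchcase(name, pat) is an
# importable standard-library function, so the resulting partial can be pickled.
ends_with_mm = functools.partial(fnmatch.fnmatchcase, pat="*_mm")

# Stand-in for the selection step: call the predicate on each column name.
selected = [name for name in column_names if ends_with_mm(name)]
print(selected)

# The partial survives a pickle round trip and still behaves the same.
restored = pickle.loads(pickle.dumps(ends_with_mm))
restored_selected = [name for name in column_names if restored(name)]

# A lambda, by contrast, cannot be pickled by the standard pickle module.
try:
    pickle.dumps(lambda name: name.endswith("_mm"))
    lambda_is_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_is_picklable = False
print(lambda_is_picklable)
```

Note that functools.partial(str.endswith, '_mm') would not work as a predicate: partial prepends its positional arguments, so the call would become str.endswith('_mm', col_name). Binding by keyword, as above, or passing the extra arguments to filter_names directly avoids that pitfall.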