regex#

skrub.selectors.regex(pattern, flags=0)[source]#

Select columns by name with a regular expression.

pattern can be a string pattern or a compiled regular expression, and flags are regular expression flags as described in the re module documentation:

https://docs.python.org/3/library/re.html#flags

Examples

>>> from skrub import selectors as s
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "height_mm": [297.0, 420.0],
...         "width_mm": [210.0, 297.0],
...         "kind": ["A4", "A3"],
...         "ID": [4, 3],
...     }
... )
>>> df
   height_mm  width_mm kind  ID
0      297.0     210.0   A4   4
1      420.0     297.0   A3   3
>>> s.select(df, s.regex('.*_mm'))
   height_mm  width_mm
0      297.0     210.0
1      420.0     297.0

A column is selected if re.match(col_name, pattern, flags) returns a match. Note that it is enough to match at the beginning of the string:

>>> s.select(df, s.regex('wid'))
   width_mm
0     210.0
1     297.0

Use ‘$’ to require matching until the end of the column name:

>>> s.select(df, s.regex('wid$'))
Empty DataFrame
Columns: []
Index: [0, 1]

Flags are passed to re.match; the following are 3 equivalent ways of setting re flags (re.IGNORECASE in this example):

>>> import re
>>> s.select(df, s.regex('id', flags=re.I))
   ID
0   4
1   3
>>> s.select(df, s.regex('(?i)id'))
   ID
0   4
1   3
>>> s.select(df, s.regex(re.compile('id', re.I)))
   ID
0   4
1   3