Customizing the global configuration

Skrub includes a configuration manager that allows setting various parameters (see the set_config() documentation for more detail).

It is possible to change configuration options using the set_config() function:

>>> from skrub import set_config
>>> set_config(use_table_report=True)

This changes the behavior of skrub for the rest of the current script. Each configuration parameter also has a corresponding environment variable, listed in the table below, that can be used to set it permanently.

Additionally, the config_context() context manager temporarily alters the configuration:

>>> import skrub
>>> with skrub.config_context(max_plot_columns=1):
...     pass

Only the code executed inside the with statement is affected; the previous configuration is restored when the block exits.
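To make the scoping behavior concrete, here is a minimal standard-library sketch of the general pattern behind a temporary configuration override. It is not skrub's implementation: the names `_toy_config`, `toy_get_config`, `toy_set_config`, and `toy_config_context` are hypothetical stand-ins chosen for illustration.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a global configuration (not skrub's internals).
_toy_config = {"max_plot_columns": 30}


def toy_get_config():
    """Return a copy so callers cannot mutate the global state by accident."""
    return dict(_toy_config)


def toy_set_config(**options):
    """Update the global configuration, rejecting unknown keys."""
    unknown = set(options) - set(_toy_config)
    if unknown:
        raise KeyError(f"unknown configuration keys: {sorted(unknown)}")
    _toy_config.update(options)


@contextmanager
def toy_config_context(**options):
    """Apply options temporarily and restore the previous values on exit."""
    previous = toy_get_config()
    toy_set_config(**options)
    try:
        yield
    finally:
        # Runs even if the body raises, so the override never leaks out.
        toy_set_config(**previous)


with toy_config_context(max_plot_columns=1):
    inside = toy_get_config()["max_plot_columns"]  # overridden value
after = toy_get_config()["max_plot_columns"]  # original value again
```

The `try`/`finally` is the important design choice in this sketch: without it, an exception raised inside the `with` block would leave the temporary value in place.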

The get_config() function allows retrieving the current configuration.

Configuration parameters

The configuration parameters that can be set with set_config() and config_context() can be listed by inspecting the dictionary returned by get_config():

>>> import skrub
>>> config = skrub.get_config()
>>> config.keys()
dict_keys(['use_table_report', 'use_table_report_data_ops', 'table_report_verbosity', 'max_plot_columns', 'max_association_columns', 'subsampling_seed', 'enable_subsampling', 'float_precision', 'cardinality_threshold'])

These are the parameters currently available in the global configuration:

Skrub Configuration Parameters
Parameter Name	Default Value	Env Variable	Description
`use_table_report`	`False`	`SKB_USE_TABLE_REPORT`	If set to `True`, the HTML representation of Pandas and Polars dataframes is replaced with the `TableReport`.
`use_table_report_data_ops`	`True`	`SKB_USE_TABLE_REPORT_DATA_OPS`	Set the HTML representation used for the Data Ops previews. If `True`, use the `TableReport`, otherwise use the default Pandas or Polars representation.
`max_plot_columns`	30	`SKB_MAX_PLOT_COLUMNS`	If a dataframe has more columns than the value set here, the `TableReport` will skip generating the plots.
`max_association_columns`	30	`SKB_MAX_ASSOCIATION_COLUMNS`	If a dataframe has more columns than the value set here, the `TableReport` will skip computing the associations.
`subsampling_seed`	0	`SKB_SUBSAMPLING_SEED`	Set the random seed of subsampling in `skrub.DataOp.skb.subsample()`, when `how="random"` is passed.
`enable_subsampling`	`"default"`	`SKB_ENABLE_SUBSAMPLING`	Control the activation of subsampling in `skrub.DataOp.skb.subsample()`. If `"default"`, the behavior of `skrub.DataOp.skb.subsample()` is used. If `"disable"`, subsampling is never used, so `skb.subsample()` becomes a no-op. If `"force"`, subsampling is used in all DataOps evaluation modes (`eval()`, `fit_transform()`, etc.).
`float_precision`	3	`SKB_FLOAT_PRECISION`	Control the number of significant digits shown when formatting floats. Applies overall precision rather than fixed decimal places.
`cardinality_threshold`	40	`SKB_CARDINALITY_THRESHOLD`	Set the `cardinality_threshold` argument of `TableVectorizer`. Additionally, set the threshold for warning the user about high cardinality features in the `TableReport`.
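
The difference between significant digits and fixed decimal places, as described for `float_precision`, can be illustrated with Python's built-in format specifiers. This is only an analogy using the standard library; it assumes nothing about how skrub formats floats internally, and the exact output skrub produces may differ.

```python
value = 12.3456
small = 0.00012345

# Three significant digits: precision follows the magnitude of the number.
value_significant = format(value, ".3g")
small_significant = format(small, ".3g")

# Three fixed decimal places: small values lose all their information.
value_fixed = format(value, ".3f")
small_fixed = format(small, ".3f")

print(value_significant, small_significant)  # 12.3 0.000123
print(value_fixed, small_fixed)  # 12.346 0.000
```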
