Customizing the global configuration
Skrub includes a configuration manager that allows setting various parameters
(see the set_config() documentation for more detail).
Configuration options can be changed with the set_config() function:
>>> from skrub import set_config
>>> set_config(table_report_verbosity=0)
This changes the behavior of skrub for the rest of the current Python session. Each configuration parameter also has a corresponding environment variable, which can be used to set its value persistently without modifying code.
Additionally, the config_context() context manager allows altering the
configuration temporarily:
>>> import skrub
>>> with skrub.config_context(max_plot_columns=1):
...     pass
Only the code executed inside the with block sees the modified value. Once the block exits, the previous configuration applies again.
The get_config() function returns the current configuration as a dictionary.
Configuration parameters
The names of the configuration parameters that can be set with set_config() and config_context()
can be listed with get_config():
>>> import skrub
>>> config = skrub.get_config()
>>> config.keys()
dict_keys(['use_table_report_data_ops', 'table_report_verbosity', 'max_plot_columns', 'max_association_columns', 'subsampling_seed', 'enable_subsampling', 'float_precision', 'cardinality_threshold', 'data_dir', 'eager_data_ops'])
These are the parameters currently available in the global configuration:
| Parameter Name | Default Value | Env Variable | Description |
|---|---|---|---|
| use_table_report_data_ops | | | Set the HTML representation used for the Data Ops previews. If … |
| table_report_verbosity | | | Set the verbosity of the … |
| max_plot_columns | 30 | | If a dataframe has more columns than the value set here, the … |
| max_association_columns | 30 | | If a dataframe has more columns than the value set here, the … |
| subsampling_seed | 0 | | Set the random seed of subsampling in … |
| enable_subsampling | | | Control the activation of subsampling in … |
| float_precision | 3 | | Control the number of significant digits shown when formatting floats. This sets the overall precision (significant digits), not a fixed number of decimal places. |
| cardinality_threshold | 40 | | Set the … |
| data_dir | | | Set the default location used by skrub to store datasets and other data, such as the Data Ops reports. |
| eager_data_ops | | | Eagerly perform checks on the DataOps as soon as they are created, and compute previews if preview data is available. If disabled, those checks are delayed until the DataOp is actually used. |