set_config

skrub.set_config(use_table_report=None, use_table_report_data_ops=None, table_report_verbosity=None, max_plot_columns=None, max_association_columns=None, subsampling_seed=None, enable_subsampling=None, float_precision=None, cardinality_threshold=None)

Set global skrub configuration.

Parameters:

use_table_report : bool, default=None

The type of display used for dataframes. If None, falls back to the current configuration, which is False by default.

If True, replace the default DataFrame HTML displays with TableReport.
If False, the original Pandas or Polars dataframe HTML representation will be used.

This configuration can also be set with the SKB_USE_TABLE_REPORT environment variable.

use_table_report_data_ops : bool, default=None

The type of HTML representation used for dataframe previews in skrub DataOps. If None, falls back to the current configuration, which is True by default.

If True, TableReport will be used.
If False, the original Pandas or Polars dataframe display will be used.

This configuration can also be set with the SKB_USE_TABLE_REPORT_DATA_OPS environment variable.

table_report_verbosity : int, default=None

Set the level of verbosity of the TableReport. Default is 1 (print the progress bar). Refer to the TableReport documentation for more details.

max_plot_columns : int, default=None

Set the max_plot_columns argument of TableReport. Default is 30. If “all”, all columns will be plotted.

This configuration can also be set with the SKB_MAX_PLOT_COLUMNS environment variable.

max_association_columns : int, default=None

Set the max_association_columns argument of TableReport. Default is 30. If "all", all columns are included when computing associations.

This configuration can also be set with the SKB_MAX_ASSOCIATION_COLUMNS environment variable.

subsampling_seed : int, default=None

Set the random seed used by skrub.DataOp.skb.subsample() in skrub DataOps when how="random" is passed.

This configuration can also be set with the SKB_SUBSAMPLING_SEED environment variable.

enable_subsampling : {'default', 'disable', 'force'}, default=None

Control whether skrub.DataOp.skb.subsample() actually subsamples in skrub DataOps. Default is "default".

If "default", skrub.DataOp.skb.subsample() keeps its normal behavior.
If "disable", subsampling is never used, so skb.subsample becomes a no-op.
If "force", subsampling is used in all DataOps evaluation modes (eval(), fit_transform, etc.).

This configuration can also be set with the SKB_ENABLE_SUBSAMPLING environment variable.

float_precision : int, default=None

Control the number of significant digits shown when formatting floats. This is the total number of significant digits, not a fixed number of decimal places. Default is 3.

This configuration can also be set with the SKB_FLOAT_PRECISION environment variable.

cardinality_threshold : int, default=None

Set the cardinality_threshold argument of TableVectorizer. This is the threshold above which users are warned that their dataset contains high-cardinality columns. Default is 40.

This configuration can also be set with the SKB_CARDINALITY_THRESHOLD environment variable.

Gallery examples

Getting Started

Hands-On with Column Selection and Transformers
