set_config
- skrub.set_config(use_table_report_data_ops=None, table_report_verbosity=None, max_plot_columns=None, max_association_columns=None, subsampling_seed=None, enable_subsampling=None, float_precision=None, cardinality_threshold=None, data_dir=None, eager_data_ops=None)
Set global skrub configuration.
- Parameters:
- use_table_report_data_ops
bool, default=None
The type of HTML representation used for dataframe previews in skrub DataOps.
If None, falls back to the current configuration, which is True by default.
If True, TableReport is used.
If False, the original pandas or Polars dataframe display is used.
This configuration can also be set with the SKB_USE_TABLE_REPORT_DATA_OPS environment variable.
- table_report_verbosity
int, default=None
Set the verbosity level of the TableReport. Default is 1 (print the progress bar). Refer to the TableReport documentation for more details.
- max_plot_columns
int or "all", default=None
Set the max_plot_columns argument of TableReport. Default is 30. If "all", all columns are plotted.
This configuration can also be set with the SKB_MAX_PLOT_COLUMNS environment variable.
- max_association_columns
int or "all", default=None
Set the max_association_columns argument of TableReport. Default is 30. If "all", associations are computed for all columns.
This configuration can also be set with the SKB_MAX_ASSOCIATION_COLUMNS environment variable.
- subsampling_seed
int, default=None
Set the random seed used for subsampling in skrub DataOps (skrub.DataOp.skb.subsample()) when how="random" is passed.
This configuration can also be set with the SKB_SUBSAMPLING_SEED environment variable.
- enable_subsampling
{‘default’, ‘disable’, ‘force’}, default=None
Control whether subsampling is applied in skrub DataOps (skrub.DataOp.skb.subsample()). Default is "default".
If "default", the default behavior of skrub.DataOp.skb.subsample() is used.
If "disable", subsampling is never used, so skb.subsample becomes a no-op.
If "force", subsampling is used in all DataOps evaluation modes (eval(), fit_transform(), etc.).
This configuration can also be set with the SKB_ENABLE_SUBSAMPLING environment variable.
- float_precision
int, default=None
Control the number of significant digits shown when formatting floats. The precision applies to the number as a whole (significant digits), not to a fixed number of decimal places. Default is 3.
This configuration can also be set with the SKB_FLOAT_PRECISION environment variable.
- cardinality_threshold
int, default=None
Set the cardinality_threshold argument of TableVectorizer, which controls the threshold used to warn users when their dataset contains high-cardinality columns. Default is 40.
This configuration can also be set with the SKB_CARDINALITY_THRESHOLD environment variable.
- data_dir
str or pathlib.Path, default=None
Set the data directory path for skrub datasets. If None, falls back to the current configuration. By default, that is the path in the SKB_DATA_DIRECTORY environment variable if it is set to an absolute path, and ~/skrub_data otherwise.
This configuration can also be set with the SKB_DATA_DIRECTORY environment variable. The deprecated SKRUB_DATA_DIRECTORY variable is still supported but emits a deprecation warning.
- eager_data_ops
bool, default=None
Eagerly perform checks on DataOps as soon as they are created, and compute previews if preview data is available. Default is True. If disabled, those checks are delayed until the DataOp is actually used (e.g. by calling .skb.eval() or make_learner()), and previews are not computed.
This option can speed up the creation of large DataOps containing many nodes. It can also be useful in rare cases where a DataOp needs no inputs (for example, it relies on a hard-coded filename to load data) but should not compute preview results as soon as it is constructed, so that computation is delayed until it is explicitly requested. For most DataOps that do need inputs (that is, DataOps containing skrub.var() nodes), previews can also be disabled simply by not providing preview data to skrub.var().
This configuration can also be set with the SKB_EAGER_DATA_OPS environment variable.
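The distinction drawn for float_precision between significant digits and fixed decimal places can be illustrated with Python's built-in format specifiers. This is only an analogy: it does not call skrub, it assumes that skrub's formatting behaves like the "g" presentation type, and the helper names significant and fixed_decimals are hypothetical. The exact strings skrub displays may differ.

```python
# Analogy only, not skrub code: compare significant-digit formatting ("g")
# with fixed decimal places ("f") using Python's built-in format().
# Assumption: float_precision behaves like the "g" presentation type.


def significant(value: float, precision: int = 3) -> str:
    """Format with `precision` significant digits in total."""
    return format(value, f".{precision}g")


def fixed_decimals(value: float, precision: int = 3) -> str:
    """Format with exactly `precision` digits after the decimal point."""
    return format(value, f".{precision}f")


for v in [1234.5678, 3.14159, 0.00012345]:
    print(f"{v}: significant={significant(v)}, fixed={fixed_decimals(v)}")
# 1234.5678: significant=1.23e+03, fixed=1234.568
# 3.14159: significant=3.14, fixed=3.142
# 0.00012345: significant=0.000123, fixed=0.000
```

With significant digits, very small values keep their leading non-zero digits instead of collapsing to 0.000, and very large values switch to scientific notation rather than growing wide.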
See also
get_config: Retrieve current values for the global configuration.
config_context: Context manager for global skrub configuration.
Examples
>>> from skrub import set_config
>>> set_config(use_table_report_data_ops=True)
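The fallback order described for the data_dir parameter can be written out as a small function. This is a hypothetical sketch for illustration only: resolve_data_dir is not a skrub function, it ignores any value set earlier with set_config as well as the deprecated SKRUB_DATA_DIRECTORY variable, and it assumes that a relative SKB_DATA_DIRECTORY value is treated like a missing one.

```python
import os
from pathlib import Path


def resolve_data_dir(data_dir=None, environ=None):
    """Hypothetical helper, not part of skrub: resolve the dataset directory
    in the order described for the data_dir parameter."""
    if environ is None:
        environ = os.environ
    # 1. An explicitly passed value wins.
    if data_dir is not None:
        return Path(data_dir)
    # 2. Otherwise use SKB_DATA_DIRECTORY, but only if it holds an absolute path.
    env_value = environ.get("SKB_DATA_DIRECTORY")
    if env_value and Path(env_value).is_absolute():
        return Path(env_value)
    # 3. Assumption: a missing or relative value falls back to ~/skrub_data.
    return Path("~/skrub_data").expanduser()


# Usage (paths shown for a POSIX system).
print(resolve_data_dir("/tmp/my_data"))  # /tmp/my_data
print(resolve_data_dir(environ={"SKB_DATA_DIRECTORY": "/srv/skrub"}))  # /srv/skrub
print(resolve_data_dir(environ={}))  # <home directory>/skrub_data
```

Passing the environment as a mapping instead of reading os.environ directly keeps the sketch easy to exercise without changing the real process environment.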