set_config
- skrub.set_config(use_table_report_data_ops=None, table_report_plots_threshold=None, table_report_associations_threshold=None, table_report_verbosity=None, subsampling_seed=None, max_plot_columns=None, max_association_columns=None, enable_subsampling=None, float_precision=None, cardinality_threshold=None, data_dir=None, eager_data_ops=None, data_ops_open_graph_dropdown=None)
Set global skrub configuration.
- Parameters:
- use_table_report_data_ops
bool, default=None
The type of HTML representation used for the dataframe previews in skrub DataOps.
  - If None, falls back to the current configuration, which is True by default.
  - If True, TableReport will be used.
  - If False, the original Pandas or Polars dataframe display will be used.
This configuration can also be set with the SKB_USE_TABLE_REPORT_DATA_OPS environment variable.
- table_report_plots_threshold
int, default=None
Maximum number of columns for which distribution plots are generated in TableReport when plot_distributions="auto" (the default). Dataframes with more columns will skip plots. Default is 30.
This configuration can also be set with the SKB_TABLE_REPORT_PLOTS_THRESHOLD environment variable.
- table_report_associations_threshold
int, default=None
Maximum number of columns for which associations are computed in TableReport when compute_associations="auto" (the default). Dataframes with more columns will skip associations. Default is 30.
This configuration can also be set with the SKB_TABLE_REPORT_ASSOCIATIONS_THRESHOLD environment variable.
- table_report_verbosity
int, default=None
Set the level of verbosity of the TableReport. Default is 1 (print the progress bar). Refer to the TableReport documentation for more details.
- subsampling_seed
int, default=None
Set the random seed used for subsampling by skrub.DataOp.skb.subsample() in skrub DataOps when how="random" is passed.
This configuration can also be set with the SKB_SUBSAMPLING_SEED environment variable.
- enable_subsampling
{"default", "disable", "force"}, default=None
Control the activation of subsampling in skrub DataOps skrub.DataOp.skb.subsample(). Default is "default".
  - If "default", the behavior of skrub.DataOp.skb.subsample() is used.
  - If "disable", subsampling is never used, so skb.subsample becomes a no-op.
  - If "force", subsampling is used in all DataOps evaluation modes (eval(), fit_transform, etc.).
This configuration can also be set with the SKB_ENABLE_SUBSAMPLING environment variable.
- float_precision
int, default=3
Control the number of significant digits shown when formatting floats. The value sets the overall precision rather than a fixed number of decimal places. Default is 3.
This configuration can also be set with the SKB_FLOAT_PRECISION environment variable.
- cardinality_threshold
int, default=40
Set the cardinality_threshold argument of TableVectorizer. Controls the threshold used to warn users if their dataset contains high-cardinality columns.
This configuration can also be set with the SKB_CARDINALITY_THRESHOLD environment variable.
- data_dir
str or pathlib.Path, default=None
Set the data directory path for skrub datasets.
  - If None, falls back to the current configuration.
  - If the SKB_DATA_DIRECTORY environment variable is set to an absolute path, that path will be used.
  - Otherwise, the default is ~/skrub_data.
This configuration can also be set with the SKB_DATA_DIRECTORY environment variable. The deprecated SKRUB_DATA_DIRECTORY is still supported with a deprecation warning.
- eager_data_ops
bool, default=True
Eagerly perform checks on the DataOps as soon as they are created, and compute previews if preview data is available. If disabled, those checks are delayed until the DataOp is actually used (e.g. by calling .skb.eval() or make_learner()), and previews are not computed.
This option is used to speed up the creation of large DataOps containing many nodes. It can also be useful in rare cases where a DataOp needs no inputs (for example, it relies on a hard-coded filename to load data) but we want to prevent it from computing preview results as soon as it is constructed and delay computation until we explicitly request it. For most DataOps that do need inputs (that is, they contain skrub.var() nodes), previews can also be disabled simply by not providing preview data to skrub.var().
This configuration can also be set with the SKB_EAGER_DATA_OPS environment variable.
- data_ops_open_graph_dropdown
bool, default=False
When displaying a DataOp that has a preview value in a Jupyter notebook, controls whether the dropdown that reveals the computational graph drawing is open (if True) or closed (if False). This option mostly exists to control the display of DataOps in the skrub documentation examples.
This configuration can also be set with the SKB_DATA_OPS_OPEN_GRAPH_DROPDOWN environment variable.
See also
get_config: Retrieve current values for global configuration.
config_context: Context manager for global skrub configuration.
Examples
>>> from skrub import set_config
>>> set_config(use_table_report_data_ops=True)
Gallery examples
Multiples tables: building machine learning pipelines with DataOps