TableReport

class skrub.TableReport(dataframe, n_rows=10, order_by=None, title=None, column_filters=None, verbose=None, plot_distributions='auto', compute_associations='auto', open_tab='table', max_plot_columns=None, max_association_columns=None)

Summarize the contents of a dataframe.

This class summarizes a dataframe or NumPy array, reporting the type and summary statistics (mean, number of missing values, etc.) of each column. NumPy arrays are converted to a pandas DataFrame or Series.

Parameters:
dataframe : pandas or polars Series or DataFrame

The dataframe or series to summarize.

n_rows : int, default=10

Maximum number of rows to show in the sample table. Half will be taken from the beginning (head) of the dataframe and half from the end (tail). Note this is only for display. Summary statistics, histograms etc. are computed using the whole dataframe.
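The head/tail sampling described above can be sketched with pandas alone. This is only an illustration of the documented behavior, not skrub's implementation: `head_tail_sample` is a hypothetical helper, and how an odd `n_rows` is split between head and tail is an assumption.

```python
import pandas as pd


def head_tail_sample(df: pd.DataFrame, n_rows: int = 10) -> pd.DataFrame:
    """Hypothetical helper: keep at most n_rows rows, half from each end."""
    if len(df) <= n_rows:
        return df  # small dataframes are kept whole
    n_head = n_rows // 2
    n_tail = n_rows - n_head  # assumption: the tail gets the extra row if n_rows is odd
    return pd.concat([df.head(n_head), df.tail(n_tail)])


df = pd.DataFrame({"x": range(100)})
sample = head_tail_sample(df, n_rows=10)
print(sample.index.tolist())  # [0, 1, 2, 3, 4, 95, 96, 97, 98, 99]

# Statistics are still computed on the whole dataframe, not on the sample.
print(df["x"].mean())  # 49.5
```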

order_by : str

Name of the column to sort by. Other numerical columns will be plotted as a function of the sorting column. The sorting column must be of numerical or datetime type.

title : str

Title for the report.

column_filters : dict

A dict for adding custom entries to the column filter dropdown menu. Each key is the filter name to be displayed in the dropdown menu (e.g. "first_10"), and each value is the corresponding filter. A filter value can be a list of column names, a list of column indices, or a Selector object. See the end of the “Examples” section below for details.

verbose : int, default=1

Whether to print progress information while the report is being generated.

  • verbose = 1 prints how many columns have been processed so far.

  • verbose = 0 silences the output.

plot_distributions : bool or "auto", default="auto"

Whether to plot the distributions of the columns.

  • True: always generate plots, regardless of column count.

  • False: never generate plots.

  • "auto" (default): generate plots only when the number of columns does not exceed the configured table_report_plots_threshold (see set_config()).

compute_associations : bool or "auto", default="auto"

Whether to compute associations between columns.

  • True: always compute associations, regardless of column count.

  • False: never compute associations.

  • "auto" (default): compute associations only when the number of columns does not exceed the configured table_report_associations_threshold (see set_config()).
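The three-way rule shared by `plot_distributions` and `compute_associations` can be written out as a small pure-Python sketch. `resolve_auto` is a hypothetical helper, not part of skrub, and the threshold of 30 used below is an arbitrary illustrative value rather than the library's configured default.

```python
def resolve_auto(setting, n_columns, threshold):
    """Hypothetical helper mirroring the True / False / "auto" rule above."""
    if setting == "auto":
        # "auto": enabled only if the column count does not exceed the threshold
        return n_columns <= threshold
    if isinstance(setting, bool):
        # True or False: the column count is ignored
        return setting
    raise ValueError(f"expected True, False or 'auto', got {setting!r}")


print(resolve_auto("auto", n_columns=12, threshold=30))   # True
print(resolve_auto("auto", n_columns=120, threshold=30))  # False
print(resolve_auto(True, n_columns=120, threshold=30))    # True
print(resolve_auto(False, n_columns=12, threshold=30))    # False
```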

max_plot_columns : int or "all", deprecated

Deprecated in favor of plot_distributions. This parameter overrides the value chosen for plot_distributions when it is not None.

Deprecated since version 0.9.0.

max_association_columns : int or "all", deprecated

Deprecated in favor of compute_associations. This parameter overrides the value chosen for compute_associations when it is not None.

Deprecated since version 0.9.0.

open_tab : str, default="table"

The tab that will be displayed by default when the report is opened. Must be one of “table”, “stats”, “distributions”, or “associations”.

  • “table”: Shows a sample of the dataframe rows

  • “stats”: Shows summary statistics for all columns

  • “distributions”: Shows plots of column distributions

  • “associations”: Shows column associations and similarities

See also

patch_display

Replace the default DataFrame HTML displays in the output of notebook cells with a TableReport.

Notes

You can see some example reports for a few datasets online. We also provide an experimental online demo that allows you to select a CSV or parquet file and generate a report directly in your web browser.

Examples

>>> import pandas as pd
>>> from skrub import TableReport
>>> df = pd.DataFrame(dict(a=[1, 2], b=['one', 'two'], c=[11.1, 11.1]))
>>> report = TableReport(df)

In a Jupyter notebook, make the report the last expression evaluated in a cell to display it in the cell’s output.

>>> report
<TableReport: use .open() to display>

(Note that above we only see the string representation, not the report itself, because we are not in a notebook.)

Whether you are using a notebook or not, you can always open the report as a full page in a separate browser tab with its open method: report.open().

You can also get the HTML report as a string. For a full, standalone web page:

>>> report.html()
'<!DOCTYPE html>\n<html lang="en-US">\n\n<head>\n    <meta charset="utf-8"...'

For an HTML fragment that can be inserted into a page:

>>> report.html_snippet()
'\n<div id="report_...-wrapper" hidden>\n    <template id="report_...'

Advanced configuration: you can add custom column filters that will appear in the report’s dropdown menu.

>>> filters = {
...         "display_name": ["a", "b"],
... }
>>> report = TableReport(df, column_filters=filters)

With the code above, in addition to the default filters such as “All columns”, “Numeric columns”, etc., a filter named “display_name” will be available in the report’s dropdown menu, selecting columns “a” and “b”.

Methods

html()

Get the report as a full HTML page.

html_snippet()

Get the report as an HTML fragment that can be inserted in a page.

json()

Get the report data in JSON format.

open()

Open the HTML report in a web browser.

write_html(file)

Store the report into an HTML file.

html()

Get the report as a full HTML page.

Returns:
str

The HTML page.

html_snippet()

Get the report as an HTML fragment that can be inserted in a page.

Returns:
str

The HTML snippet.

json()

Get the report data in JSON format.

Returns:
str

The JSON data.

open()

Open the HTML report in a web browser.

write_html(file)

Store the report into an HTML file.

Parameters:
file : str, pathlib.Path or file object

The file object or path of the file to store the HTML output.