Exploring dataframes interactively with the TableReport

The TableReport gives a high-level overview of a dataframe or series, suitable for quick exploratory analysis. By default, the report shows the first and last 5 rows of the dataframe (the number of rows is controlled by the n_rows parameter), and additional information is available in other tabs.

  • The Stats tab reports high-level statistics for each column.

  • The Distributions tab collects summary plots for each column (at most 30 columns by default).

  • The Associations tab shows Cramér's V and the Pearson correlation between columns.

  • Built-in filters allow selection of columns by dtype and other conditions.

The TableReport of a table can be generated as follows:

>>> from skrub import TableReport
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "id": [1, 2, 3],
...     "value": [10, 20, 30],
... })
>>> TableReport(df)  # from a notebook cell
<TableReport: use .open() to display>

The command TableReport(df).open() opens the report in a browser window.

A demo of the TableReport

Pre-computed examples of the TableReport are available here, and you can try it out on your data here.

In the Distributions tab, columns can be selected by clicking on the checkmark icon: the name of each selected column is added to the bar at the top, from which it can be copied into a script.

The TableReport can be used in a notebook cell, or it can be opened in a browser window using TableReport(df).open().

Altering the Appearance of the TableReport

The skrub global configuration includes several parameters for tweaking the HTML representation of the TableReport.

For performance reasons, the TableReport disables the computation of distributions and associations for tables with more than 30 columns. This behavior can be changed by modifying the max_plot_columns and max_association_columns parameters.

It is also possible to specify the floating point precision used for display by setting the float_precision parameter.

Parameters can be made permanent in a script by altering the configuration with set_config(), or by setting the respective environment variables. Refer to Customizing the global configuration for more detail.

Exporting and Sharing the TableReport

The TableReport is a standalone object: once generated, it does not require a running notebook. It can be exported in HTML format and opened directly in a browser as an HTML page.

>>> import io # to avoid writing to disk in the example
>>> from skrub import TableReport
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "id": [1, 2, 3],
...     "value": [10, 20, 30],
... })
>>> tr = TableReport(df)
>>> html_buffer = io.StringIO()
>>> tr.write_html(html_buffer)  # write the full page to a file (here, an in-memory buffer)
>>> html = tr.html()  # get a string containing the HTML for a full page
>>> html_snippet = tr.html_snippet()  # get an HTML fragment to embed in a page
>>> tr_json = tr.json()  # get the content of the report in JSON format