Exploring dataframes with the TableReport

Usage

>>> from skrub import TableReport
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "id": [1, 2, 3],
...     "value": [10, 20, 30],
... })
>>> TableReport(df)  # from a notebook cell
<TableReport: use .open() to display>

The command TableReport(df).open() opens the report in a browser window.

Pre-computed examples of the TableReport are available here, and you can try it out on your data here.

The TableReport gives a high-level overview of the given dataframe, suitable for quick exploratory analysis of series and dataframes. The report shows the first and last 5 rows of the dataframe (controlled by the n_rows parameter, which defaults to 10 and is split between head and tail), as well as additional information in other tabs.

  • The Stats tab reports high-level statistics for each column.

  • The Distributions tab collects summary plots for each column (for tables with up to 30 columns by default).

  • The Associations tab shows Cramér's V and Pearson correlation coefficients between pairs of columns.

  • Built-in filters allow selection of columns by dtype and other conditions.

In the Distributions tab, you can select a column by clicking its checkmark icon: the column name is added to the bar at the top, so that it can be copied into a script.

The TableReport can be used in a notebook cell, or it can be opened in a browser window using TableReport(df).open().

Altering the Appearance of the TableReport

For performance reasons, the TableReport does not compute distributions and associations for tables with more than 30 columns. This limit can be changed for a single report with the max_plot_columns and max_association_columns parameters, or globally by altering the configuration with set_config() (refer to the TableReport and set_config() docs for more detail).

More pre-computed examples are available here.

Exporting and Sharing the TableReport

The TableReport is a standalone object that does not require a running notebook to be viewed after it is generated: it can be exported as an HTML page and opened directly in a browser.

>>> import io  # in-memory buffer, to avoid writing to disk
>>> tr = TableReport(df)
>>> html_buffer = io.StringIO()
>>> tr.write_html(html_buffer)  # write the full HTML page to a file-like object
>>> html = tr.html()  # get a string containing the HTML for a full page
>>> html_snippet = tr.html_snippet()  # get an HTML fragment to embed in a page
>>> tr_json = tr.json()  # get the content of the report in JSON format

Finding Correlated Columns in a DataFrame

In addition to the TableReport's Associations tab, you can compute associations with the column_associations() function, which returns a dataframe with one row per pair of columns and their association scores.