TableReport

class skrub.TableReport(dataframe, n_rows=10, order_by=None, title=None, column_filters=None, verbose=1)

Summarize the contents of a dataframe.

This class summarizes a dataframe, providing information such as the type and summary statistics (mean, number of missing values, etc.) for each column.

Parameters:
dataframe : pandas or polars DataFrame

The dataframe to summarize.

n_rows : int, default=10

Maximum number of rows to show in the sample table. Half are taken from the beginning (head) of the dataframe and half from the end (tail). This only affects the displayed sample: summary statistics, histograms, etc. are computed on the whole dataframe.

order_by : str, default=None

Name of the column used for sorting. It must be of numerical or datetime type. Other numerical columns are plotted as a function of the sorting column.

title : str, default=None

Title for the report.

column_filters : dict, default=None

A dict for adding custom entries to the column filter dropdown menu. Each key is an id for the filter (e.g. "first_10") and the value is a mapping with the keys display_name (the name shown in the menu, e.g. "First 10 columns") and columns (a list of column names). See the end of the “Examples” section below for details.

verbose : int, default=1

Whether to print progress information while the report is being generated.

  • verbose = 1 prints how many columns have been processed so far.

  • verbose = 0 silences the output.
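The head-and-tail sampling described for n_rows can be pictured with plain pandas. The sketch below is an illustration only, not skrub's implementation: preview_rows is a hypothetical helper, and the way an odd n_rows is split between head and tail is an assumption.

```python
import pandas as pd


def preview_rows(df: pd.DataFrame, n_rows: int = 10) -> pd.DataFrame:
    """Hypothetical stand-in for the sample table: first and last rows of df."""
    if len(df) <= n_rows:
        # Small dataframes fit entirely in the preview.
        return df
    n_head = n_rows // 2
    # Assumption: when n_rows is odd, the extra row comes from the tail.
    n_tail = n_rows - n_head
    return pd.concat([df.head(n_head), df.tail(n_tail)])


df = pd.DataFrame({"x": range(100)})
preview = preview_rows(df, n_rows=4)
print(preview["x"].tolist())  # [0, 1, 98, 99]
```

Only the preview is truncated this way; as stated above, the statistics in the report are computed on all rows.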

See also

patch_display

Replace the default DataFrame HTML displays in the output of notebook cells with a TableReport.

Notes

You can see some example reports for a few datasets online. We also provide an experimental online demo that allows you to select a CSV or parquet file and generate a report directly in your web browser.

Examples

>>> import pandas as pd
>>> from skrub import TableReport
>>> df = pd.DataFrame(dict(a=[1, 2], b=['one', 'two'], c=[11.1, 11.1]))
>>> report = TableReport(df)

In a Jupyter notebook, display the report by making it the last expression evaluated in a cell, so that it appears in the cell output.

>>> report
<TableReport: use .open() to display>

(Note that above we only see the string representation, not the report itself, because we are not in a notebook.)

Whether you are using a notebook or not, you can always open the report as a full page in a separate browser tab with its open method: report.open().

You can also get the HTML report as a string. For a full, standalone web page:

>>> report.html()
Processing...
'<!DOCTYPE html>\n<html lang="en-US">\n\n<head>\n    <meta charset="utf-8"...'
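Since html() returns the whole page as a string, it can be written to disk and shared. A minimal sketch follows; save_report is a hypothetical helper, not part of skrub, and it accepts any object with an html() method.

```python
from pathlib import Path


def save_report(report, path):
    """Write the standalone page returned by report.html() to path."""
    path = Path(path)
    # html() returns a str; the page declares charset utf-8, so write UTF-8.
    path.write_text(report.html(), encoding="utf-8")
    return path
```

With skrub installed, a call such as save_report(TableReport(df), "report.html") should produce a file that can be opened directly in a browser.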

For an HTML fragment that can be inserted into a page:

>>> report.html_snippet()
'\n<div id="report_...-wrapper" hidden>\n    <template id="report_...'
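One way to use such a fragment is to place it inside the body of a page you build yourself. The sketch below assumes the fragment carries whatever styles and scripts it needs, which this page does not state; build_page is a hypothetical helper, not part of skrub.

```python
from html import escape


def build_page(snippet: str, heading: str) -> str:
    """Wrap an HTML fragment (e.g. from report.html_snippet()) in a minimal page."""
    safe_heading = escape(heading)  # the heading is plain text, so escape it
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{safe_heading}</title></head>\n'
        "<body>\n"
        f"<h1>{safe_heading}</h1>\n"
        # The snippet is already HTML, so it is inserted without escaping.
        f"{snippet}\n"
        "</body>\n"
        "</html>\n"
    )
```

For example, build_page(report.html_snippet(), "Data overview") returns a page string with the fragment embedded under a heading.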

Advanced configuration: you can add custom column filters that will appear in the report’s dropdown menu.

>>> filters = {
...     "at_least_2": {
...         "display_name": "Columns with at least 2 unique values",
...         "columns": ["a", "b"],
...     }
... }
>>> report = TableReport(df, column_filters=filters)

With the code above, the report's dropdown menu offers the custom filter “Columns with at least 2 unique values”, which selects columns “a” and “b”, in addition to default filters such as “All columns” and “Numeric columns”.
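The column list of a filter does not have to be typed by hand. As a sketch, the same filter can be derived from the data with pandas; the selection rule and the variable name varied are illustrative choices, not something skrub requires.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2], b=["one", "two"], c=[11.1, 11.1]))

# Keep the columns with two or more distinct values.
# Note: nunique() ignores missing values by default.
varied = [col for col in df.columns if df[col].nunique() >= 2]

filters = {
    "at_least_2": {
        "display_name": "Columns with at least 2 unique values",
        "columns": varied,
    }
}
print(filters["at_least_2"]["columns"])  # ['a', 'b']
```

The resulting dict has the same shape as the one written out above and can be passed as TableReport(df, column_filters=filters).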

Methods

html()

Get the report as a full HTML page.

html_snippet()

Get the report as an HTML fragment that can be inserted in a page.

json()

Get the report data in JSON format.

open()

Open the HTML report in a web browser.

html()

Get the report as a full HTML page.

Returns:
str

The HTML page.

html_snippet()

Get the report as an HTML fragment that can be inserted in a page.

Returns:
str

The HTML snippet.

json()

Get the report data in JSON format.

Returns:
str

The JSON data.
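Because json() returns a string, it can be decoded with the standard library for further processing. The layout of the decoded data is not described on this page, so the sketch below only lists top-level keys and treats a JSON object at the top level as an assumption; top_level_keys is a hypothetical helper.

```python
import json


def top_level_keys(report):
    """Decode report.json() and list the top-level keys, if there are any."""
    data = json.loads(report.json())
    # Assumption: the top level is a JSON object; otherwise report no keys.
    return sorted(data) if isinstance(data, dict) else []
```

Inspecting the keys this way is a quick check of what the report contains before relying on any particular field.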

open()

Open the HTML report in a web browser.