Exploring dataframes with the TableReport
#
Usage#
>>> from skrub import TableReport
>>> import pandas as pd
>>> df = pd.DataFrame({
... "id": [1, 2, 3],
... "value": [10, 20, 30],
... })
>>> TableReport(df) # from a notebook cell
<TableReport: use .open() to display>
The command TableReport(df).open()
opens the report in a browser window.
A demo of the TableReport
Pre-computed examples of of the TableReport
are available
here, and you can
try it out on your data here.
The TableReport
gives a high-level overview of the given dataframe, suitable for
quick exploratory analysis of series and dataframes. The report shows the first
and last 5 rows of the dataframe (decided by the n_rows
parameter), as well
as additional information in other tabs.
The Stats tab reports high-level statistics for each column.
The Distribution tab collects summary plots for each column (max 30 by default).
The Associations tab shows Cramer V and Pearson correlation between columns.
Built-in filters allow selection of columns by dtype and other conditions.
In the Distributions tab, it is possible to select columns by clicking on the checkmark icon: the name of the column is added to the bar on top, so that it may be copied in a script.
The TableReport can be used in a notebook cell, or it can be opened in a browser
window using TableReport(df).open()
.
Altering the Appearance of the TableReport
#
For performance reasons, the TableReport
disables the computation of
distributions and associations for tables with more than 30 columns. This behavior
can be changed by modifying the max_plot_columns
and max_association_columns
parameter, or by altering the configuration with set_config()
(refer to the
TableReport
and set_config()
docs for more detail).
More pre-computed examples are available here.
Exporting and Sharing the TableReport
#
The TableReport
is a standalone object that does not require a running notebook
to be accessed after generation: it can be exported in HTML format and opened
directly in a browser as a HTML page.
>>> import io # to avoid writing to disk
>>> tr = TableReport(df)
>>> html_buffer = io.StringIO()
>>> tr.write_html(html_buffer) # save to file
>>> html = tr.html() # get a string containing the HTML for a full page
>>> html_snippet = tr.html_snippet() # get an HTML fragment to embed in a page
>>> tr_json = tr.json() # get the content of the report in JSON format