Exploring dataframes interactively with the TableReport
The TableReport gives a high-level overview of a dataframe or Series, suitable for
quick exploratory analysis. By default, the report shows the first and last 5 rows
of the dataframe (the number of displayed rows is set by the n_rows parameter), as
well as additional information in other tabs.
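To make the "first and last rows" preview concrete, here is a minimal pandas sketch of which rows such a head/tail sample contains. The helper `head_tail_sample` is hypothetical and not part of skrub. It assumes that half of `n_rows` comes from the top and the rest from the bottom, and that a table shorter than `n_rows` is shown in full. The report's own sampling may differ in these details.

```python
import pandas as pd


def head_tail_sample(df: pd.DataFrame, n_rows: int = 10) -> pd.DataFrame:
    """Hypothetical helper (not skrub's API): rows shown in a head/tail preview."""
    # Assumption: a short table fits entirely in the preview, so nothing is skipped.
    if len(df) <= n_rows:
        return df
    n_head = n_rows // 2       # first half of the preview comes from the top
    n_tail = n_rows - n_head   # the remaining rows come from the end
    return pd.concat([df.head(n_head), df.tail(n_tail)])


df = pd.DataFrame({"id": range(100), "value": range(0, 1000, 10)})
preview = head_tail_sample(df)  # n_rows=10 -> 5 rows from each end
print(preview.index.tolist())
```

With 100 rows and `n_rows=10`, the preview keeps row labels 0 to 4 and 95 to 99. The middle of the table is skipped only in the displayed sample.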
The Stats tab reports high-level statistics for each column.
The Distributions tab collects summary plots for each column (for up to 30 columns by default).
The Associations tab shows Cramér's V and Pearson correlations between columns.
Built-in filters allow selecting columns by dtype and other conditions.
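The two association measures in the Associations tab can be illustrated with plain pandas and NumPy. The sketch below computes the textbook Cramér's V, which is the square root of the chi-squared statistic divided by the number of observations times (smaller table dimension minus one), without bias correction. For Pearson correlation it uses pandas' `Series.corr`. The helper `cramers_v` and the example columns are hypothetical, and the report's own computation may differ in details such as bias correction or how numeric columns are handled.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: textbook Cramér's V between two categorical columns."""
    # Contingency table: counts of each observed (x, y) category pair.
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    # Normalize by n and by the smaller table dimension minus one.
    k = min(observed.shape) - 1
    if k == 0:
        return 0.0  # a constant column carries no association information
    return float(np.sqrt(chi2 / (n * k)))


df = pd.DataFrame({
    "city": ["Paris", "Paris", "Lyon", "Lyon"],
    "department": ["75", "75", "69", "69"],
    "size": ["big", "small", "big", "small"],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
})

print(cramers_v(df["city"], df["department"]))  # each city maps to one department
print(cramers_v(df["city"], df["size"]))        # every combination appears once
print(df["x"].corr(df["y"]))                    # Pearson is pandas' default method
```

Cramér's V ranges from 0 (no association) to 1 (each category of one column determines the other). Pearson correlation only applies to numeric columns and captures linear relationships.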
A TableReport can be generated from a dataframe as follows:
>>> from skrub import TableReport
>>> import pandas as pd
>>> df = pd.DataFrame({
... "id": [1, 2, 3],
... "value": [10, 20, 30],
... })
>>> TableReport(df) # from a notebook cell
<TableReport: use .open() to display>
The command TableReport(df).open() opens the report in a browser window.
A demo of the TableReport
Pre-computed examples of the TableReport are available here, and you can
try it out on your own data here.
In the Distributions tab, you can select columns by clicking the checkmark icon: the column name is added to the bar at the top, from which it can be copied into a script.
The TableReport can be used in a notebook cell, or it can be opened in a browser
window using TableReport(df).open().