TableReport
- class skrub.TableReport(dataframe, n_rows=10, order_by=None, title=None, column_filters=None, verbose=1)
Summarize the contents of a dataframe.
This class summarizes a dataframe, providing information such as the type and summary statistics (mean, number of missing values, etc.) for each column.
- Parameters:
- dataframe : pandas or polars DataFrame
The dataframe to summarize.
- n_rows : int, default=10
Maximum number of rows to show in the sample table. Half are taken from the beginning (head) of the dataframe and half from the end (tail). This only affects the display: summary statistics, histograms, etc. are computed using the whole dataframe.
- order_by : str, default=None
Name of the column to sort by. Must be of numerical or datetime type. Other numerical columns will be plotted as a function of the sorting column.
- title : str, default=None
Title for the report.
- column_filters : dict, default=None
A dict for adding custom entries to the column filter dropdown menu. Each key is an id for the filter (e.g. "first_10") and each value is a mapping with the keys display_name (the name shown in the menu, e.g. "First 10 columns") and columns (a list of column names). See the end of the “Examples” section below for details.
- verbose : int, default=1
Whether to print progress information while the report is being generated.
verbose = 1 prints how many columns have been processed so far.
verbose = 0 silences the output.
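As a rough illustration of how the parameters above combine, the sketch below keeps the keyword arguments in a plain dict and passes them to the constructor. The data, the column names ("step", "loss", "run"), the title, and the helper name build_report are all hypothetical; only the parameter names come from the signature documented here.

```python
# Toy data (hypothetical): "step" is the numeric column to sort by.
records = {
    "step": [3, 1, 2],
    "loss": [0.25, 0.9, 0.5],
    "run": ["c", "a", "b"],
}

# Keyword arguments taken from the signature documented above.
report_options = {
    "n_rows": 4,          # show at most 4 sample rows (head and tail)
    "order_by": "step",   # must name a numerical or datetime column
    "title": "Training runs",
    "verbose": 0,         # no progress messages
}


def build_report(data, options):
    """Create a TableReport from a dict of columns (needs pandas and skrub)."""
    # Imported here so that the configuration above can be inspected
    # even where the optional dependencies are not installed.
    import pandas as pd
    from skrub import TableReport

    return TableReport(pd.DataFrame(data), **options)
```

With pandas and skrub installed, build_report(records, report_options).open() should then show the report in a browser tab.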
See also
patch_display
Replace the default DataFrame HTML displays in the output of notebook cells with a TableReport.
Notes
You can see some example reports for a few datasets online. We also provide an experimental online demo that allows you to select a CSV or parquet file and generate a report directly in your web browser.
Examples
>>> import pandas as pd
>>> from skrub import TableReport
>>> df = pd.DataFrame(dict(a=[1, 2], b=['one', 'two'], c=[11.1, 11.1]))
>>> report = TableReport(df)
In a Jupyter notebook, make the report the last expression evaluated in a cell to display it in that cell’s output.
>>> report
<TableReport: use .open() to display>
(Note that above we only see the string representation, not the report itself, because we are not in a notebook.)
Whether you are using a notebook or not, you can always open the report as a full page in a separate browser tab with its open method: report.open().
You can also get the HTML report as a string. For a full, standalone web page:
>>> report.html()
Processing...
'<!DOCTYPE html>\n<html lang="en-US">\n\n<head>\n <meta charset="utf-8"...'
For an HTML fragment that can be inserted into a page:
>>> report.html_snippet()
'\n<div id="report_...-wrapper" hidden>\n <template id="report_...'
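Since html() returns the whole page as a string, one way to keep a copy on disk is to write that string to a file. The helper below is a minimal sketch: save_report is a hypothetical name, not part of skrub, and it relies only on the html() method described here.

```python
from pathlib import Path


def save_report(report, path):
    """Write the standalone page returned by ``report.html()`` to ``path``."""
    path = Path(path)
    # The generated page declares charset utf-8, so write it with that encoding.
    path.write_text(report.html(), encoding="utf-8")
    return path
```

For example, save_report(report, "report.html") would produce a file that can be opened in a browser or shared without a running Python session.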
Advanced configuration: you can add custom column filters that will appear in the report’s dropdown menu.
>>> filters = {
...     "at_least_2": {
...         "display_name": "Columns with at least 2 unique values",
...         "columns": ["a", "b"],
...     }
... }
>>> report = TableReport(df, column_filters=filters)
With the code above, in addition to the default filters such as “All columns”, “Numeric columns”, etc., the added “Columns with at least 2 unique values” will be available in the report, selecting columns “a” and “b”.
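The list of columns in a filter does not have to be typed by hand. The sketch below computes it from the data, assuming the columns are available as a mapping from column name to a list of hashable values (for a pandas dataframe, df.to_dict("list") gives that shape). The function name unique_value_filter is hypothetical.

```python
def unique_value_filter(data, min_unique=2):
    """Build a column_filters dict for columns with enough distinct values.

    ``data`` maps each column name to a list of hashable values.
    """
    columns = [
        name for name, values in data.items() if len(set(values)) >= min_unique
    ]
    return {
        f"at_least_{min_unique}": {
            "display_name": f"Columns with at least {min_unique} unique values",
            "columns": columns,
        }
    }


# Same values as the example dataframe above.
data = {"a": [1, 2], "b": ["one", "two"], "c": [11.1, 11.1]}
filters = unique_value_filter(data)
print(filters["at_least_2"]["columns"])  # ['a', 'b']
```

The resulting dict has the same shape as the hand-written one and can be passed as TableReport(df, column_filters=filters).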
Methods
html(): Get the report as a full HTML page.
html_snippet(): Get the report as an HTML fragment that can be inserted in a page.
json(): Get the report data in JSON format.
open(): Open the HTML report in a web browser.
Gallery examples
Various string encoders: a sentiment analysis example
AggJoiner on a credit fraud dataset