Exploring a Dataframe

This section covers the TableReport and how to use it to explore and understand your dataframes. It also shows how to find correlated columns in a dataframe.

  • Exploring dataframes interactively with the TableReport
    • A demo of the TableReport
      • Altering the Appearance of the TableReport
      • Exporting and Sharing the TableReport
  • Finding Correlated Columns in a DataFrame


© Copyright 2018-2023, the dirty_cat developers, 2023-2025, the skrub developers.
