Skrub DataOps

Building a predictive model by combining multiple tables with the skrub DataOps

Tuning DataOps

A machine-learning pipeline typically contains some values or choices which

Subsampling for faster development

Here we show how to use DataOp.skb.subsample to speed up

© Copyright 2018-2023, the dirty_cat developers, 2023-2025, the skrub developers.
