Inria Academy - Skrub like a pro
1 Introduction to the course
This is the website for the Inria Academy course on the skrub package: it contains all the material used for the course, including the datasets and exercises used during the session.
If you are reading this, you will be attending the Beta version of this course. Because it is a Beta, both the presentation and the content of the book may change based on the feedback provided after the session.
1.1 Structure of the course
The course covers the main features of skrub, from data exploration to pipeline construction, with the notable exception of Data Ops.
Each chapter includes a section that describes how a specific feature may assist in building a machine learning pipeline, along with practical code examples.
Some chapters include exercises for participants to work with the explained features. These exercises are made available in content/exercises, as well as at the end of the respective lesson in content/notebooks.
The content of the book is split into sections, and each section ends with a “final quiz” on the subjects covered up to that point.
2 Preparation and setup
First of all, clone the GitHub repo of this book to have access to the exercises. In a future version, JupyterLite support will be added.
2.1 Setting up a local environment
Depending on how you launch Jupyter lab, it may start in the root folder of the repository.
All notebooks used in the course are found in content/notebooks, while the exercises are in content/exercises.
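Because the notebooks and exercises are reached through these relative paths, it can help to confirm that the terminal is in the repository root before launching Jupyter lab. The following is a minimal sketch and not part of the course material: the function name `check_course_layout` is hypothetical, and it relies only on the two folder names mentioned above.

```shell
# Hypothetical helper (not part of the course repository): report whether
# the given folder (default: the current directory) contains the two
# course folders named in this section.
check_course_layout() {
    root="${1:-.}"
    rc=0
    for dir in content/notebooks content/exercises; do
        if [ -d "$root/$dir" ]; then
            echo "found:   $dir"
        else
            echo "missing: $dir"
            rc=1
        fi
    done
    # Non-zero exit status if at least one folder was not found.
    return "$rc"
}
```

Calling `check_course_layout` with no argument from the cloned repository should print two `found:` lines; a `missing:` line suggests that the terminal is in a different folder.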
2.1.1 Using pixi
The easiest way to set up the environment is by installing and using pixi. Follow the platform-specific instructions in the link to install pixi, then open a terminal window in the folder of the repository you cloned.
Run
pixi install
to create the environment, followed by
pixi run lab
to start a Jupyter lab instance.
2.1.2 Using pip
Create and activate the environment:
python -m venv skrub-tutorial
source skrub-tutorial/bin/activate
Install the required dependencies using the requirements.txt file:
pip install -r requirements.txt
Start the Jupyter lab instance:
jupyter lab
2.1.3 Using conda
An environment.yaml file is provided to create a conda environment.
Create and activate the environment with
conda env create -f environment.yaml
conda activate skrub-tutorial
Then, start a Jupyter lab instance:
jupyter lab
2.1.4 Using uv
Create the environment using pyproject.toml as the requirement file.
uv venv
uv pip install -r pyproject.toml
Activate the environment that was created in the folder.
source .venv/bin/activate
Start the Jupyter lab instance:
jupyter lab
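Whichever of the four methods above is used, a quick check can confirm that the course package is importable before opening the notebooks. This is a sketch and not part of the course material: the helper name `check_python_module` is hypothetical, and it assumes that the `python` found on the PATH belongs to the environment that was just activated (with pixi, this may mean running it inside `pixi shell`).

```shell
# Hypothetical helper (not part of the course repository): succeed if the
# given Python module can be imported by the interpreter on the PATH.
check_python_module() {
    module="$1"
    # Use `python` if present, otherwise fall back to `python3`;
    # return 2 when no interpreter is found at all.
    py="$(command -v python || command -v python3)" || return 2
    if "$py" -c "import $module" 2>/dev/null; then
        echo "ok: $module is importable"
    else
        echo "error: $module is not importable in this environment"
        return 1
    fi
}
```

For this course, the call would be `check_python_module skrub`. An `error:` line most likely means that the environment is not active in the current terminal or that the installation step did not complete.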