Install


pip install skrub -U

Deep learning dependencies

Deep-learning-based encoders such as TextEncoder need optional dependencies. The following command installs torch, transformers, and sentence-transformers.

pip install "skrub[transformers]" -U

conda install -c conda-forge skrub

Deep learning dependencies

Deep-learning-based encoders such as TextEncoder need optional dependencies. The following command installs torch, transformers, and sentence-transformers.

conda install -c conda-forge "skrub[transformers]"

mamba install -c conda-forge skrub

Deep learning dependencies

Deep-learning-based encoders such as TextEncoder need optional dependencies. The following command installs torch, transformers, and sentence-transformers.

mamba install -c conda-forge "skrub[transformers]"
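Whichever installer you used, it can be useful to check what actually ended up in the environment. The following is a minimal sketch using only the Python standard library. The helper names installed_version and optional_dependencies_status are illustrative and not part of skrub. It assumes the optional packages are imported as torch, transformers, and sentence_transformers.

```python
from importlib import metadata
from importlib.util import find_spec

# Import names of the optional deep learning dependencies listed above.
# The sentence-transformers distribution is imported as "sentence_transformers".
OPTIONAL_MODULES = ("torch", "transformers", "sentence_transformers")


def installed_version(distribution="skrub"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def optional_dependencies_status(modules=OPTIONAL_MODULES):
    """Map each module name to True if it can be located, without importing it."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("skrub version:", installed_version() or "not installed")
    for name, available in optional_dependencies_status().items():
        print(f"{name}: {'available' if available else 'missing'}")
```

Using find_spec avoids importing torch just to test for its presence, which keeps the check fast.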

Advanced Usage for Contributors

1. Fork the project

To contribute to the project, you first need to fork skrub on GitHub.

That will enable you to push your commits to a branch on your fork.

2. Clone your fork

Clone your forked repo to your local machine:

git clone https://github.com/<YOUR_USERNAME>/skrub
cd skrub

Next, add the upstream remote (i.e. the official skrub repository). This allows you to pull the latest changes from the main repository:

git remote add upstream https://github.com/skrub-data/skrub.git

Verify that both the origin (your fork) and upstream (official repo) are correctly set up:

git remote -v

You should see something like this:

origin  https://github.com/<YOUR_USERNAME>/skrub (fetch)
origin  https://github.com/<YOUR_USERNAME>/skrub (push)
upstream        https://github.com/skrub-data/skrub.git (fetch)
upstream        https://github.com/skrub-data/skrub.git (push)
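If you prefer to check the remotes programmatically, for example in a setup script, the output of git remote -v is easy to parse. The sketch below is one possible approach, not something skrub provides. The function names parse_remotes and missing_remotes are hypothetical, and the sample text simply mirrors the expected output shown above.

```python
def parse_remotes(remote_output):
    """Parse the text printed by `git remote -v` into {name: {kind: url}}."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, kind = parts
        # kind is "(fetch)" or "(push)"; drop the parentheses
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


def missing_remotes(remote_output, required=("origin", "upstream")):
    """Return the required remote names that are not configured."""
    remotes = parse_remotes(remote_output)
    return [name for name in required if name not in remotes]


if __name__ == "__main__":
    # Sample text standing in for real `git remote -v` output.
    sample = (
        "origin\thttps://github.com/<YOUR_USERNAME>/skrub (fetch)\n"
        "origin\thttps://github.com/<YOUR_USERNAME>/skrub (push)\n"
    )
    print("missing remotes:", missing_remotes(sample))
```

In a real script, the text could be obtained with subprocess.run(["git", "remote", "-v"], capture_output=True, text=True).stdout, run from inside the clone.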

3. Set up your environment

Now, set up a development environment. You can create a virtual environment either with conda or with Python's venv; use one of the two options below, not both:

# Option 1: conda
conda create -n env_skrub python=3.13
conda activate env_skrub

# Option 2: venv
python -m venv env_skrub
source env_skrub/bin/activate
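Before installing anything, you may want to confirm that the environment is really active. The following is a minimal sketch using only the standard library. The helper names in_venv and conda_env_name are illustrative. It relies on sys.prefix differing from sys.base_prefix inside a venv, and on conda setting the CONDA_DEFAULT_ENV variable when an environment is activated.

```python
import os
import sys


def in_venv():
    """True when the interpreter runs inside an environment made by `python -m venv`."""
    return sys.prefix != sys.base_prefix


def conda_env_name():
    """Name of the active conda environment, or None if none is active."""
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a venv:", in_venv())
    print("active conda environment:", conda_env_name() or "none")
```

With the environment names used above, an active conda environment should report env_skrub, and an active venv should report True for the venv check.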

Then, with the environment activated and at the root of your local copy of skrub, install the local package in editable mode with development dependencies:

pip install -e ".[dev, lint, test, doc]"

Enabling pre-commit hooks ensures code style consistency by triggering checks (mainly formatting) every time you run a git commit.

pre-commit install

Optionally, configure Git to ignore certain revisions in git blame and IDE integrations. These revisions are listed in .git-blame-ignore-revs:

git config blame.ignoreRevsFile .git-blame-ignore-revs

4. Run the tests

To ensure your environment is correctly set up, run the test suite:

pytest --pyargs skrub

Testing should take about 5 minutes.

If you see warnings like the following:

UserWarning: Only pandas and polars DataFrames are supported, but input is a Numpy array. Please convert Numpy arrays to DataFrames before passing them to skrub transformers. Converting to pandas DataFrame with columns ['0', '1', …].
  warnings.warn(

they are expected, and you can proceed to the next steps without worrying about them. However, no test should fail at this point: if any do, please let us know.

After that, your environment is ready for development!

Deep learning dependencies

Deep-learning-based encoders such as TextEncoder need optional dependencies. The following command installs torch, transformers, and sentence-transformers.

pip install -e ".[transformers]"

Now that you’re set up, you may return to writing your first pull request and start coding!