Install
With pip:
pip install skrub -U
Deep learning dependencies
Deep-learning-based encoders such as TextEncoder require optional dependencies. The following command installs torch, transformers, and sentence-transformers:
pip install "skrub[transformers]" -U
With conda:
conda install -c conda-forge skrub
To also install the deep learning dependencies (torch, transformers, and sentence-transformers):
conda install -c conda-forge "skrub[transformers]"
With mamba:
mamba install -c conda-forge skrub
To also install the deep learning dependencies (torch, transformers, and sentence-transformers):
mamba install -c conda-forge "skrub[transformers]"
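To confirm which packages ended up in your environment, you can check what the active Python interpreter is able to import. The loop below is a sketch and is not part of skrub. It assumes the interpreter is reachable as python; set the PYTHON variable (for example PYTHON=python3) if yours has a different name. It uses import names rather than pip names, so sentence-transformers is checked as sentence_transformers. The function name check_skrub_install is illustrative only.

```shell
#!/usr/bin/env bash
# Sketch (not shipped with skrub): report whether skrub and its optional
# deep-learning dependencies can be imported by the active interpreter.
# Override PYTHON if your interpreter is not named "python".
PYTHON="${PYTHON:-python}"

check_skrub_install() {
  # Import names, not pip names: sentence-transformers -> sentence_transformers
  for module in skrub torch transformers sentence_transformers; do
    if "$PYTHON" -c "import $module" >/dev/null 2>&1; then
      echo "$module: installed"
    else
      echo "$module: missing"
    fi
  done
}

check_skrub_install
```

The script prints one line per package. Unless you installed the transformers extra, torch, transformers and sentence_transformers are expected to show as missing.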
Advanced Usage for Contributors
1. Fork the project
To contribute to the project, you first need to fork skrub on GitHub.
That will enable you to push your commits to a branch on your fork.
2. Clone your fork
Clone your forked repo to your local machine:
git clone https://github.com/<YOUR_USERNAME>/skrub
cd skrub
Next, add the upstream remote (i.e. the official skrub repository). This allows you to pull the latest changes from the main repository:
git remote add upstream https://github.com/skrub-data/skrub.git
Verify that both the origin (your fork) and upstream (official repo) are correctly set up:
git remote -v
You should see something like this:
origin https://github.com/<YOUR_USERNAME>/skrub (fetch)
origin https://github.com/<YOUR_USERNAME>/skrub (push)
upstream https://github.com/skrub-data/skrub.git (fetch)
upstream https://github.com/skrub-data/skrub.git (push)
If you cloned over SSH, the URLs start with git@github.com: instead.
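After the upstream remote is configured, you can bring your local main branch up to date with the official repository. The helper below is hypothetical and is not part of skrub's tooling. It assumes the remotes are named origin and upstream as above, and that the default branch is called main.

```shell
# Hypothetical helper (not shipped with skrub): update your local "main"
# from the official repository, then mirror it to your fork.
# Assumes remotes named "origin" and "upstream" and a default branch "main".
# --ff-only refuses to create a merge commit, so the helper fails loudly
# if your local main has diverged from upstream/main.
sync_main() {
  git fetch upstream &&
    git checkout main &&
    git merge --ff-only upstream/main &&
    git push origin main
}
```

Run sync_main from the root of your clone before starting a new branch. Because each step is chained with &&, the helper stops at the first command that fails. Keeping your own work on feature branches, rather than on main, means the fast-forward merge always succeeds.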
3. Set up your environment
Now, set up a development environment.
You can create a virtual environment with conda or with Python's venv module:
With conda:
conda create -n env_skrub python=3.13
conda activate env_skrub
With venv:
python -m venv env_skrub
source env_skrub/bin/activate
Then, with the environment activated and at the root of your local copy of skrub, install the local package in editable mode with development dependencies:
pip install -e ".[dev, lint, test, doc]"
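To check that the editable install points at your clone, you can print the directory that Python imports skrub from. After an editable install, this path should be inside your local copy of the repository rather than in site-packages. This is a sketch: the function name skrub_location and the PYTHON variable are illustrative assumptions.

```shell
# Sketch: show which copy of skrub the active interpreter imports.
# After "pip install -e", the printed path should point into your local clone.
PYTHON="${PYTHON:-python}"

skrub_location() {
  "$PYTHON" -c "import os, skrub; print(os.path.dirname(skrub.__file__))" 2>/dev/null ||
    echo "skrub is not importable from this environment"
}

skrub_location
```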
Enable the pre-commit hooks to keep the code style consistent. The hooks run checks (mainly formatting) automatically every time you run git commit:
pre-commit install
Optionally, configure Git to ignore certain revisions in git blame and IDE integrations. These revisions are listed in .git-blame-ignore-revs:
git config blame.ignoreRevsFile .git-blame-ignore-revs
4. Run the tests
To ensure your environment is correctly set up, run the test suite:
pytest --pyargs skrub
Testing should take about 5 minutes.
You may see warnings such as:
UserWarning: Only pandas and polars DataFrames are supported, but input is a Numpy array. Please convert Numpy arrays to DataFrames before passing them to skrub transformers. Converting to pandas DataFrame with columns ['0', '1', …].
  warnings.warn(
These warnings are expected, and you can safely continue with the next steps. However, no test should fail at this point. If any test fails, please let us know.
After that, your environment is ready for development!
Deep learning dependencies
To work on deep-learning-based encoders such as TextEncoder, also install the optional dependencies torch, transformers, and sentence-transformers:
pip install -e ".[transformers]"
Now that you’re set up, you may return to writing your first pull request and start coding!