Contributing to skrub#
First off, thanks for taking the time to contribute!
The following is a set of guidelines for contributing to skrub.
We use GitHub Discussions for general chat and Q&As. Check it out!
To understand in more depth the incentives behind skrub, read our vision statement. Also, if scientific literature doesn’t scare you, we greatly encourage you to read the two following papers:
Using the library is the best way to discover new bugs and limitations.
If you find one, please check whether a similar issue already exists.
Issue is still open: leave a 👍 on the original message to let us know there are several users affected by this issue.
Issue has been closed:
By a merged pull request (1) update your skrub version, or (2) the fix has not been released yet.
Without pull request, there might be a
wontfixlabel, and/or a reason at the bottom of the conversation.
Otherwise, file a new issue.
To solve your issue, first explain the problem and include additional details to help maintainers easily reproduce the problem:
Use a clear and descriptive title which identifies the problem.
Describe the result you expected.
Add additional details to your description problem such as situations where the bug should have appeared but didn’t.
Include a snippet of code that reproduces the error, if any, as it allows maintainers to reproduce it in a matter of seconds!
Specify versions of Python, skrub, and other dependencies which might be linked to the issue (e.g., scikit-learn, numpy, pandas, etc.).
This section will guide you through submitting a new enhancement for skrub, whether it is a small fix or a new feature.
First, you should check whether the feature has not already been proposed or implemented.
If not, before writing any code, submit a new issue proposing the change.
Use a clear and descriptive title.
Provide a quick explanation of the goal of this enhancement.
Provide a step-by-step description of the suggested enhancement with as many details as possible.
If it exists elsewhere, link resources.
Let maintainers know whether:
You will write the code and submit a pull request (PR). Writing the feature yourself is the fastest way to getting it implemented in the library, and we’ll help in that process if guidance is needed! To go further, refer to the section Writing your first Pull Request.
You won’t write the code, in which case a developer can start working on it. Note however that we cannot guarantee how much time it will take to implement the change.
There are specific incentives behind skrub. While most enhancement ideas are good, they don’t always fit in the context of the library.
If you’d like to implement your idea regardless, we’d be very glad if you create a new package that builds on top of skrub! In some cases, we might even feature it on the official repository!
If not already done, first create an issue, and discuss the changes with the project’s maintainers.
See in the sections above for the right way to do this.
First, fork skrub on Github.
That will enable you to push your commits to a branch on your fork.
Then, clone the repo on your computer:
git clone https://github.com/<YOUR_NAME>/skrub
It is advised to create a new branch every time you work on a new issue, to avoid confusion:
git switch -c branch_name
Finally, install the dependencies by heading to the installation process, advanced usage section.
There are a few specific project goals to keep in mind:
Pure Python code - no binary extensions, Cython, etc.
Make production-friendly code.
Try to target the broadest range of versions (Python and dependencies).
Use the least amount of dependencies.
Make code as backward compatible as possible.
Prefer performance to readability.
Optimized code might be hard to read, so please comment it
Use explicit, borderline verbose variables / function names
Public functions / methods / variables / class signatures should be documented and type-hinted.
The public API describes the components users of the library will import and use. It’s everything that can be imported and does not start with an underscore.
After pushing your commits to your remote repository, you can use the Github “Compare & pull request” button to submit your branch code as a PR targeting the skrub repository.
Community consensus is key in the integration process. Expect a minimum of 1 to 3 reviews depending on the size of the change before we consider merging the PR.
Once again, remember that maintainers are volunteers and therefore cannot guarantee how much time it will take to review the changes.
Github Actions are used for various tasks including testing skrub on Linux, Mac and Windows, with different dependencies and settings.
CircleCI is used to build the documentation.
If any of the following markers appears in the commit message, the following actions are taken.
Commit Message Marker
Action Taken by CI
CI is skipped completely
CI is skipped completely
CI is skipped completely
CI is run with the nightly builds of dependencies
Docs are not built
Docs built, but excludes example gallery plots
Docs built including example gallery plots (longer)
Note that by default the documentation is built, but only the examples that are directly modified by the pull request are executed.
Before submitting a pull request, check if your modifications have introduced new sphinx warnings by building the documentation locally and try to fix them.
First, make sure you have properly installed the development version.
Building the documentation requires installing some additional packages:
pip install '.[doc]'
To build the documentation, you need to be in the
To generate the docs, including the example gallery, you can use:
The documentation will be generated in the
and are viewable in a web browser, for instance by opening the local
This will run all the examples, which takes a while. If you only want to generate a few examples, you can use:
EXAMPLES_PATTERN=your_regex_goes_here make html
This is particularly useful if you are modifying a few examples.