=============================== Vision: where is skrub heading? =============================== .. currentmodule:: skrub Vision statement ================ The goal of skrub is to facilitate machine learning on tables: `pandas `__ and `polars `__ dataframes, SQL databases... | Skrub is high-level, with a philosophy and an API matching that of `scikit-learn `_. It strives to bridge the world of databases to that of machine-learning, **enabling imperfect assembly and representations of the data when it is noisy**, using the downstream target to predict to guide assembly when possible (supervised learning for data assembly). In the long term, as skrub is built on higher-level APIs, it will make it easier for data-scientists to use efficient database patterns and backends. Skrub seeks tradeoffs in terms of flexibility: its high-level APIs are by construction restrictive compared to directly manipulating dataframes. This is by design, as skrub does not aim to replace tools such as `Pandas `__, `Ibis `__, `DuckDB `_. To make things simpler, skrub uses defaults that are chosen empirically to give good machine learning, even though these are sometimes heuristic, as in the :class:`TableVectorizer`. Roadmap ======= In an open-source project, roadmaps can be whishful thinking: things happen in an iterative way, often guided by the community. We however decided to communicate on what we would like to do in the next 6 months to give a better idea of the vision. From shorter term to longer term: - Better support for time series - Data namespaces, lazy data loading, out of core computing using database engines (e.g., duckdb) - Join discovery to work in data lakes where the tables are not in a clean relational database - Automatic feature synthesis in databases, building on the assembling features