#############
API reference
#############

This page lists all available functions and classes of `skrub`.

.. currentmodule:: skrub

.. raw:: html

   <h2>Joining tables</h2>

.. autosummary::
   :toctree: generated/
   :template: function.rst
   :nosignatures:
   :caption: Joining tables

   fuzzy_join

.. autosummary::
   :toctree: generated/
   :template: class.rst
   :nosignatures:

   Joiner
   AggJoiner
   AggTarget

.. autosummary::
   :toctree: generated/
   :template: class.rst
   :nosignatures:

   InterpolationJoiner

.. raw:: html

   <h2>Column selection in a pipeline</h2>

.. autosummary::
   :toctree: generated/
   :template: class.rst
   :nosignatures:
   :caption: Column selection in a pipeline

   SelectCols
   DropCols

.. raw:: html

   <h2>Vectorizing a dataframe</h2>

.. autosummary::
   :toctree: generated/
   :template: class.rst
   :nosignatures:
   :caption: Vectorizing a dataframe

   TableVectorizer

.. raw:: html

   <h2>Dirty category encoders</h2>

.. autosummary::
   :toctree: generated/
   :template: class.rst
   :nosignatures:
   :caption: Dirty category encoders

   GapEncoder
   MinHashEncoder
   SimilarityEncoder

.. raw:: html

   <h2>Dealing with dates</h2>

.. autosummary::
   :toctree: generated/
   :template: class.rst
   :nosignatures:
   :caption: Other encoders

   DatetimeEncoder

.. autosummary::
   :toctree: generated/
   :template: function.rst
   :nosignatures:
   :caption: Converting datetime columns in a table

   to_datetime

.. raw:: html

   <h2>Deduplication: merging variants of the same entry</h2>

.. autosummary::
   :toctree: generated/
   :template: function.rst
   :nosignatures:
   :caption: Deduplication: merging variants of the same entry

   deduplicate

.. raw:: html

   <h2>Data download and generation</h2>

.. autosummary::
   :toctree: generated/
   :template: function.rst
   :nosignatures:
   :caption: Data download and generation

   datasets.fetch_employee_salaries
   datasets.fetch_medical_charge
   datasets.fetch_midwest_survey
   datasets.fetch_open_payments
   datasets.fetch_road_safety
   datasets.fetch_traffic_violations
   datasets.fetch_drug_directory
   datasets.fetch_world_bank_indicator
   datasets.fetch_movielens
   datasets.fetch_ken_table_aliases
   datasets.fetch_ken_types
   datasets.fetch_ken_embeddings
   datasets.make_deduplication_data