fetch_ken_types
- skrub.datasets.fetch_ken_types(search=None, *, exclude=None, embedding_table_id='all_entities')
Helper function to search for KEN entity types.
The result can then be used with fetch_ken_embeddings.
- Parameters:
- search : str, optional
  Substring pattern that filters the types of entities.
- exclude : str, optional
  Substring pattern to exclude from the search.
- embedding_table_id : str, default='all_entities'
  Table of embedded entities from which to extract the embeddings. Get the supported tables with fetch_ken_table_aliases. It is NOT possible to pass a custom figshare ID.
- Returns:
DataFrame
The types of entities containing the substring.
See also
fetch_ken_embeddings
Download Wikipedia embeddings by type.
Notes
Best used in conjunction with fetch_ken_embeddings.
This function requires pyarrow to be installed.
References
For more details, see Cvetkov-Iliev, A., Allauzen, A. & Varoquaux, G.: Relational data embeddings for feature enrichment with background information.
Examples
To get all the existing KEN types of entities:
>>> embedding_types = fetch_ken_types()
>>> embedding_types.head()
                                                Type
0                 wikicat_italian_male_screenwriters
1  wikicat_21st-century_roman_catholic_archbishop...
2                 wikicat_2000s_romantic_drama_films
3                  wikicat_music_festivals_in_france
4        wikicat_20th-century_american_women_artists
Let’s search for all KEN types with the strings “dance” or “music”:
>>> embedding_filtered_types = fetch_ken_types(search="dance|music")
>>> embedding_filtered_types.head()
                                                Type
0                  wikicat_music_festivals_in_france
1  wikicat_films_scored_by_bharadwaj_(music_direc...
2                  wikicat_english_music_journalists
3       wikicat_20th-century_american_male_musicians
4  wikicat_alumni_of_the_london_academy_of_music_...
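To build intuition for how search and exclude might combine, here is a minimal offline sketch in plain Python. It is not skrub's implementation: the helper filter_types and the sample list KEN_TYPES are hypothetical, and the sketch assumes both patterns are applied as regular expressions (which the "dance|music" example above suggests, but which this page does not state explicitly).

```python
import re

# Hypothetical sample of type names, copied from the outputs shown above.
KEN_TYPES = [
    "wikicat_italian_male_screenwriters",
    "wikicat_2000s_romantic_drama_films",
    "wikicat_music_festivals_in_france",
    "wikicat_english_music_journalists",
    "wikicat_20th-century_american_male_musicians",
]


def filter_types(types, search=None, exclude=None):
    """Keep names matching `search`, then drop names matching `exclude`.

    Assumption: both arguments are treated as regex patterns matched
    anywhere in the name; this mirrors the documented parameters only
    loosely and is not taken from skrub's source.
    """
    kept = list(types)
    if search is not None:
        kept = [name for name in kept if re.search(search, name)]
    if exclude is not None:
        kept = [name for name in kept if not re.search(exclude, name)]
    return kept


# Names containing "dance" or "music".
music_types = filter_types(KEN_TYPES, search="dance|music")

# Same search, but without anything mentioning festivals.
no_festivals = filter_types(KEN_TYPES, search="dance|music", exclude="festival")
```

Under these assumptions, music_types keeps the three names that contain "music" (including "musicians"), and no_festivals drops the festival entry from that result.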
Gallery examples
Wikipedia embeddings to enrich the data