gt4sd.algorithms.conditional_generation.key_bert.implementation module

Implementation of the KeyBERT keyword extractor.

Summary

Classes:

KeyBERT

Keyword extractor based on [KeyBERT](https://github.com/MaartenGr/KeyBERT).

Reference

class KeyBERT(resources_path, minimum_keyphrase_ngram, maximum_keyphrase_ngram, stop_words, top_n, use_maxsum, use_mmr, diversity, number_of_candidates, model_name, device=None)[source]

Bases: object

Keyword extractor based on [KeyBERT](https://github.com/MaartenGr/KeyBERT).

__init__(resources_path, minimum_keyphrase_ngram, maximum_keyphrase_ngram, stop_words, top_n, use_maxsum, use_mmr, diversity, number_of_candidates, model_name, device=None)[source]

Initialize KeyBERT.

Parameters
  • resources_path (str) – path from which to load the hypothesis, the candidate labels and, optionally, the model.

  • minimum_keyphrase_ngram (int) – lower bound for phrase size.

  • maximum_keyphrase_ngram (int) – upper bound for phrase size.

  • stop_words (Optional[str]) – language used for stop word removal. If not provided, stop words are not removed.

  • top_n (int) – number of keywords to extract.

  • use_maxsum (bool) – whether to use max sum similarity when selecting the keywords.

  • use_mmr (bool) – whether to use maximal marginal relevance (MMR) when selecting the keywords.

  • diversity (float) – diversity of the results when use_mmr is enabled.

  • number_of_candidates (int) – number of candidates considered when use_maxsum is enabled.

  • model_name (str) – name of the model to load from the cache or download from SentenceTransformers.

  • device (Union[device, str, None]) – device on which inference runs, given either as a device object or as a string. If not provided, it is inferred.
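To make the interplay of top_n, use_mmr and diversity more concrete, the following is a minimal pure-Python sketch of maximal marginal relevance over toy embedding vectors. It is not the code used by this class or by the KeyBERT library. The function names (cosine, mmr), the two-dimensional vectors and the candidate labels are assumptions made for illustration only.

```python
from math import sqrt
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr(
    doc_vec: Sequence[float],
    cand_vecs: Sequence[Sequence[float]],
    candidates: Sequence[str],
    top_n: int,
    diversity: float,
) -> List[str]:
    """Illustrative MMR: trade relevance to the document against redundancy."""
    if not candidates or top_n <= 0:
        return []
    # Relevance of every candidate to the document.
    relevance = [cosine(doc_vec, v) for v in cand_vecs]
    # Start from the most relevant candidate.
    selected = [max(range(len(candidates)), key=relevance.__getitem__)]
    remaining = [i for i in range(len(candidates)) if i != selected[0]]
    while remaining and len(selected) < top_n:

        def score(i: int) -> float:
            # Redundancy: similarity to the closest already-selected keyword.
            redundancy = max(cosine(cand_vecs[i], cand_vecs[j]) for j in selected)
            return (1 - diversity) * relevance[i] - diversity * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]


# Toy data (hypothetical): "a" and "b" are near duplicates, "c" points elsewhere.
doc = [1.0, 0.0]
names = ["a", "b", "c"]
vecs = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]
low_diversity = mmr(doc, vecs, names, top_n=2, diversity=0.0)
high_diversity = mmr(doc, vecs, names, top_n=2, diversity=0.9)
```

With diversity set to 0.0 the selection reduces to a plain relevance ranking, so the two near duplicates are returned together. With a high diversity the second pick moves to the less relevant but less redundant candidate.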

load_model()[source]

Load KeyBERT model.

Return type

None

predict(text)[source]

Get keywords sorted by relevance.

Parameters

text (str) – text to extract keywords from.

Return type

List[str]

Returns

keywords sorted by score from highest to lowest.
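A usage sketch based only on the signatures documented above may help to tie the pieces together. All argument values are illustrative assumptions rather than library defaults: the resources path is hypothetical, and the model name is just one publicly available SentenceTransformers model. The helper names build_extractor and extract_keywords are likewise invented for this example. This page does not state whether load_model() has to be called explicitly before predict(), so the sketch leaves that call out.

```python
from typing import Any, Dict, List

# Keyword arguments mirroring the documented __init__ signature.
# Every value below is an assumption chosen for illustration.
KEYBERT_KWARGS: Dict[str, Any] = {
    "resources_path": "/tmp/key_bert_resources",  # hypothetical path
    "minimum_keyphrase_ngram": 1,
    "maximum_keyphrase_ngram": 2,
    "stop_words": "english",
    "top_n": 5,
    "use_maxsum": False,
    "use_mmr": True,
    "diversity": 0.5,
    "number_of_candidates": 20,
    "model_name": "all-MiniLM-L6-v2",  # assumed SentenceTransformers model
    "device": None,  # let the implementation infer the device
}


def build_extractor(kwargs: Dict[str, Any] = KEYBERT_KWARGS) -> Any:
    """Instantiate the extractor; gt4sd is imported lazily, only when called."""
    from gt4sd.algorithms.conditional_generation.key_bert.implementation import (
        KeyBERT,
    )

    return KeyBERT(**kwargs)


def extract_keywords(text: str) -> List[str]:
    """Return the keywords for ``text``, sorted from highest to lowest score."""
    return build_extractor().predict(text)
```

Calling extract_keywords("some text") requires gt4sd to be installed and the model to be available locally or downloadable.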
