gt4sd.algorithms.generation.pgt.implementation module

Implementation details for PGT algorithms.

Summary

Classes:

CoherenceCheckGenerator

Implementation of coherence check generator.

EditGenerator

Implementation of edit generator.

Generator

Implementation of a generator.

PartGenerator

Implementation of part generator.

Functions:

adjust_length_to_model

Adjust sequence length.

Reference

adjust_length_to_model(length, maximum_sequence_length)[source]

Adjust sequence length.

Parameters
  • length (int) – target length.

  • maximum_sequence_length (int) – maximum sequence length.

Returns

the adjusted length.
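The adjustment described above can be pictured as a simple clamp. The sketch below is not the gt4sd source: the name `clamp_length` is hypothetical, and the handling of negative targets is an assumption modeled on common text-generation scripts.

```python
def clamp_length(length: int, maximum_sequence_length: int) -> int:
    """Hypothetical stand-in for adjust_length_to_model (assumed behavior)."""
    # Assumption: a negative target means "use as much as the model allows".
    if length < 0 and maximum_sequence_length > 0:
        return maximum_sequence_length
    # Never request more tokens than the model can handle.
    if 0 < maximum_sequence_length < length:
        return maximum_sequence_length
    return length


print(clamp_length(50, 512))    # target fits within the model limit
print(clamp_length(2048, 512))  # target exceeds the model limit
```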

class Generator(resources_path, model_type, model_name, max_length, top_k, top_p, num_return_sequences, prompt='This is an interesting prompt', no_repeat_ngram_size=2, device=None)[source]

Bases: object

Implementation of a generator.

__init__(resources_path, model_type, model_name, max_length, top_k, top_p, num_return_sequences, prompt='This is an interesting prompt', no_repeat_ngram_size=2, device=None)[source]

PGT generation algorithm.

Parameters
  • resources_path (str) – path to the cache.

  • model_type (str) – type of the model.

  • model_name (str) – name of the model weights/version.

  • prompt (str) – prompt for text generation.

  • max_length (int) – maximum length of the generated text.

  • top_k (int) – number of highest-probability tokens to keep.

  • top_p (float) – only tokens with cumulative probabilities summing up to this value are kept.

  • num_return_sequences (int) – number of generated sequences.

  • no_repeat_ngram_size (int) – size of n-grams that must not appear twice.

  • device (Union[device, str, None]) – device where the inference runs, either as a dedicated class or a string. If not provided, it is inferred.
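To make the `top_k` and `top_p` parameters concrete, here is a minimal pure-Python sketch of the two filters applied to a toy next-token distribution. The function `filter_top_k_top_p` and the toy vocabulary are hypothetical; the underlying model library works on logits and tensors, so treat this only as an illustration of the idea.

```python
from typing import Dict, List, Tuple


def _normalize(items: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Rescale probabilities so they sum to one."""
    total = sum(prob for _, prob in items)
    return [(token, prob / total) for token, prob in items]


def filter_top_k_top_p(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Keep the top_k most likely tokens, then the smallest leading group
    whose cumulative probability reaches top_p, and renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    ranked = _normalize(ranked)

    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    return dict(_normalize(kept))


# Toy distribution over four candidate tokens (illustrative values only).
toy = {"a": 0.5, "method": 0.25, "device": 0.125, "system": 0.125}
filtered = filter_top_k_top_p(toy, top_k=3, top_p=0.7)
print(filtered)
```

Sampling then draws the next token from the filtered distribution, so lower `top_k` or `top_p` values make the output more conservative.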

load_model()[source]

Load a pretrained PGT model.

Return type

None

generate_case()[source]

Sample text snippets.

Return type

Union[List[str], List[Tuple[str, …]]]

Returns

generated text snippets.

format_output(input_text, generated_sequences)[source]

Format output. In the general case just return the generated sequences.

Parameters
  • input_text (Union[str, Tuple[str]]) – generation input.

  • generated_sequences (List[str]) – generated sequences.

Return type

Union[List[str], List[Tuple[str, …]]]

Returns

formatted generated sequences.
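The base class returns the sequences unchanged, and the task-specific subclasses override `format_output`. The pattern can be sketched as below; `BaseFormatter` and `PairingFormatter` are hypothetical stand-ins, and the paired output format is an assumption chosen to match the `List[Tuple[str, …]]` return type, not the library's actual formatting.

```python
from typing import List, Tuple, Union

Output = Union[List[str], List[Tuple[str, ...]]]


class BaseFormatter:
    """Hypothetical stand-in for Generator: passes sequences through."""

    def format_output(self, input_text: str, generated_sequences: List[str]) -> Output:
        return generated_sequences


class PairingFormatter(BaseFormatter):
    """Hypothetical subclass: pairs each sequence with its input (assumed format)."""

    def format_output(self, input_text: str, generated_sequences: List[str]) -> Output:
        return [(input_text, sequence) for sequence in generated_sequences]


sequences = ["first draft", "second draft"]
print(BaseFormatter().format_output("a prompt", sequences))
print(PairingFormatter().format_output("a prompt", sequences))
```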

class PartGenerator(resources_path, input_text, model_type, model_name, task, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None)[source]

Bases: Generator

Implementation of part generator.

__init__(resources_path, input_text, model_type, model_name, task, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None)[source]

PGT generation algorithm.

Parameters
  • resources_path (str) – path to the cache.

  • input_text (str) – input text for generation.

  • task (str) – generation task.

  • model_type (str) – type of the model.

  • model_name (str) – name of the model weights/version.

  • max_length (int) – maximum length of the generated text.

  • top_k (int) – number of highest-probability tokens to keep.

  • top_p (float) – only tokens with cumulative probabilities summing up to this value are kept.

  • num_return_sequences (int) – number of generated sequences.

  • no_repeat_ngram_size (int) – size of n-grams that must not appear twice.

  • device (Union[device, str, None]) – device where the inference runs, either as a dedicated class or a string. If not provided, it is inferred.
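The effect of `no_repeat_ngram_size` can be illustrated with a small sketch that finds which next tokens would repeat an n-gram already present in the text. The helper `banned_next_tokens` is hypothetical and works on word strings for readability; real decoders apply the same idea to token ids.

```python
from typing import List, Set


def banned_next_tokens(tokens: List[str], ngram_size: int) -> Set[str]:
    """Return the tokens that would complete an n-gram already seen in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size - 1:
        return set()
    # The last n-1 tokens are the prefix of the n-gram about to be completed.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned: Set[str] = set()
    for start in range(len(tokens) - ngram_size + 1):
        ngram = tokens[start:start + ngram_size]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


history = ["the", "device", "and", "the"]
# With bigrams forbidden from repeating, "device" may not follow "the" again.
print(banned_next_tokens(history, ngram_size=2))
```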

class EditGenerator(resources_path, input_text, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, input_type='abstract')[source]

Bases: Generator

Implementation of edit generator.

__init__(resources_path, input_text, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, input_type='abstract')[source]

PGT generation algorithm.

Parameters
  • resources_path (str) – path to the cache.

  • input_text (str) – input text for generation.

  • model_type (str) – type of the model.

  • model_name (str) – name of the model weights/version.

  • max_length (int) – maximum length of the generated text.

  • top_k (int) – number of highest-probability tokens to keep.

  • top_p (float) – only tokens with cumulative probabilities summing up to this value are kept.

  • num_return_sequences (int) – number of generated sequences.

  • no_repeat_ngram_size (int) – size of n-grams that must not appear twice.

  • device (Union[device, str, None]) – device where the inference runs, either as a dedicated class or a string. If not provided, it is inferred.

  • input_type (str) – part of a patent the input text belongs to.

format_output(input_text, generated_sequences)[source]

Format output for the patent editing task.

Parameters
  • input_text (Union[str, Tuple[str]]) – generation input.

  • generated_sequences (List[str]) – generated sequences.

Return type

Union[List[str], List[Tuple[str, …]]]

Returns

formatted generated sequences.

class CoherenceCheckGenerator(resources_path, input_a, input_b, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, coherence_type='title-abstract')[source]

Bases: Generator

Implementation of coherence check generator.

__init__(resources_path, input_a, input_b, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, coherence_type='title-abstract')[source]

PGT generation algorithm.

Parameters
  • resources_path (str) – path to the cache.

  • input_a (str) – first input for coherence check.

  • input_b (str) – second input for coherence check.

  • model_type (str) – type of the model.

  • model_name (str) – name of the model weights/version.

  • max_length (int) – maximum length of the generated text.

  • top_k (int) – number of highest-probability tokens to keep.

  • top_p (float) – only tokens with cumulative probabilities summing up to this value are kept.

  • num_return_sequences (int) – number of generated sequences.

  • no_repeat_ngram_size (int) – size of n-grams that must not appear twice.

  • device (Union[device, str, None]) – device where the inference runs, either as a dedicated class or a string. If not provided, it is inferred.

  • coherence_type (str) – input types for the check.

extract_coherence_types(coherence_type)[source]

Check the validity and extract coherence types of input text.

Parameters

coherence_type (str) – Input types of the coherence check.

Return type

Tuple[str, str]

Returns

tuple containing the types of the two inputs.
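Given the default `'title-abstract'`, one plausible reading is that the method splits the string on the hyphen and validates both halves. The sketch below follows that reading; `split_coherence_type` and the `KNOWN_PARTS` set are hypothetical, and the actual list of accepted types may differ.

```python
from typing import Tuple

# Assumption: illustrative set of patent parts, not taken from the library.
KNOWN_PARTS = {"title", "abstract", "claim"}


def split_coherence_type(coherence_type: str) -> Tuple[str, str]:
    """Validate a 'first-second' string and return its two input types."""
    parts = coherence_type.split("-")
    if len(parts) != 2:
        raise ValueError(f"expected 'first-second', got {coherence_type!r}")
    unknown = [part for part in parts if part not in KNOWN_PARTS]
    if unknown:
        raise ValueError(f"unknown input type(s): {unknown}")
    first, second = parts
    return first, second


print(split_coherence_type("title-abstract"))
```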

format_output(input_text, generated_sequences)[source]

Format output for the patent coherence task.

Parameters
  • input_text (Union[str, Tuple[str]]) – generation input.

  • generated_sequences (List[str]) – generated sequences.

Return type

Union[List[str], List[Tuple[str, …]]]

Returns

formatted generated sequences.
