gt4sd.algorithms.generation.pgt.implementation module

Implementation details for PGT algorithms.

Summary

Classes:

`CoherenceCheckGenerator`	Implementation of coherence check generator.
`EditGenerator`	Implementation of edit generator.
`Generator`	Implementation of a generator.
`PartGenerator`	Implementation of part generator.

Functions:

adjust_length_to_model

Adjust sequence length.

Reference

adjust_length_to_model(length, maximum_sequence_length)

Adjust sequence length.

Parameters

length (int) – target length.
maximum_sequence_length (int) – maximum sequence length.

Returns: the adjusted length.
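The docstring does not spell out how the length is adjusted. The sketch below is a minimal illustration that assumes the function follows the clamping rules of the Hugging Face `run_generation.py` example script. Under that assumption, a negative length or one above the model maximum is replaced by the model maximum. The fallback constant `MAXIMUM_LENGTH` is hypothetical.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical fallback cap, used when neither the requested length nor the
# model maximum is usable (assumption, not taken from the module).
MAXIMUM_LENGTH = 10000


def adjust_length_to_model(length: int, maximum_sequence_length: int) -> int:
    """Clamp a requested generation length to what the model supports (sketch)."""
    if length < 0 and maximum_sequence_length > 0:
        # Negative length: fall back to the model's maximum sequence length.
        logger.warning("negative length, using model maximum %d", maximum_sequence_length)
        return maximum_sequence_length
    if 0 < maximum_sequence_length < length:
        # Requested length exceeds what the model supports: clamp it.
        logger.warning(
            "length %d exceeds model maximum, clamping to %d", length, maximum_sequence_length
        )
        return maximum_sequence_length
    if length < 0:
        # No usable model maximum either: use a fixed cap to avoid unbounded generation.
        return MAXIMUM_LENGTH
    return length


print(adjust_length_to_model(2000, 1024))  # 1024
print(adjust_length_to_model(-1, 1024))    # 1024
print(adjust_length_to_model(256, 1024))   # 256
```

A non-positive `maximum_sequence_length` is treated as "unknown" here, so a valid positive request passes through unchanged.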

class Generator(resources_path, model_type, model_name, max_length, top_k, top_p, num_return_sequences, prompt='This is an interesting prompt', no_repeat_ngram_size=2, device=None)

Bases: object

Implementation of a generator.

__init__(resources_path, model_type, model_name, max_length, top_k, top_p, num_return_sequences, prompt='This is an interesting prompt', no_repeat_ngram_size=2, device=None)

PGT generation algorithm.

Parameters

resources_path (str) – path to the cache.
model_type (str) – type of the model.
model_name (str) – name of the model weights/version.
max_length (int) – maximum length of the generated text.
top_k (int) – number of highest-probability tokens kept for top-k sampling.
top_p (float) – nucleus sampling threshold: only the smallest set of most probable tokens whose cumulative probability reaches this value is kept.
num_return_sequences (int) – number of generated sequences.
prompt (str) – prompt for text generation.
no_repeat_ngram_size (int) – size of the n-grams that may not appear more than once.
device (Union[device, str, None]) – device on which inference runs. It can be given either as a dedicated device object or as a string. If not provided, it is inferred.
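The `top_k` and `top_p` parameters restrict the next-token distribution before a token is sampled. The pure-Python sketch below shows how the two thresholds combine on a single probability vector. It assumes the usual Hugging Face convention that a non-positive `top_k` disables top-k filtering. The helper `top_k_top_p_filter` is hypothetical and not part of this module.

```python
from typing import List


def top_k_top_p_filter(probs: List[float], top_k: int, top_p: float) -> List[float]:
    """Keep only the top-k / nucleus (top-p) tokens and renormalize (sketch)."""
    # Token indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # top_k <= 0 is treated as "no top-k filtering" (assumed convention).
    if top_k > 0:
        order = order[:top_k]
    kept = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        # Nucleus cut: stop once the kept tokens cover at least top_p of the mass.
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1; filtered tokens get 0.
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


probs = [0.5, 0.2, 0.15, 0.1, 0.05]
print(top_k_top_p_filter(probs, top_k=3, top_p=0.6))
```

In this example, top-k first keeps the three most probable tokens. The nucleus cut then stops after the first two, because 0.5 + 0.2 already reaches 0.6. Sampling would therefore pick only between those two tokens.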

load_model()

Load a pretrained PGT model.

Return type: None

generate_case()

Sample text snippets.

Return type: Union[List[str], List[Tuple[str, …]]]
Returns: generated text snippets.

format_output(input_text, generated_sequences)

Format output. In the general case, the generated sequences are returned unchanged.

Parameters

input_text (Union[str, Tuple[str]]) – generation input.
generated_sequences (List[str]) – generated sequences.

Return type

Union[List[str], List[Tuple[str, …]]]

Returns

formatted generated sequences.


class PartGenerator(resources_path, input_text, model_type, model_name, task, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None)

Bases: Generator

Implementation of part generator.

__init__(resources_path, input_text, model_type, model_name, task, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None)

PGT generation algorithm.

Parameters

resources_path (str) – path to the cache.
input_text (str) – input text for generation.
model_type (str) – type of the model.
model_name (str) – name of the model weights/version.
task (str) – generation task.
max_length (int) – maximum length of the generated text.
top_k (int) – number of highest-probability tokens kept for top-k sampling.
top_p (float) – nucleus sampling threshold: only the smallest set of most probable tokens whose cumulative probability reaches this value is kept.
num_return_sequences (int) – number of generated sequences.
no_repeat_ngram_size (int) – size of the n-grams that may not appear more than once.
device (Union[device, str, None]) – device on which inference runs. It can be given either as a dedicated device object or as a string. If not provided, it is inferred.
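The `no_repeat_ngram_size` parameter, shared by all generators here, prevents any n-gram of that size from occurring twice in a generated sequence. The sketch below shows the blocking rule at a single decoding step. It assumes the standard behavior of Hugging Face's generation utilities, where a size of 0 disables the constraint. The helper `banned_next_tokens` is hypothetical and not part of this module.

```python
from typing import List, Set


def banned_next_tokens(tokens: List[int], no_repeat_ngram_size: int) -> Set[int]:
    """Tokens that would repeat an n-gram already present in `tokens` (sketch)."""
    n = no_repeat_ngram_size
    if n <= 0 or len(tokens) < n - 1:
        return set()
    # The last n-1 tokens form the prefix of the n-gram being completed next.
    prefix = tuple(tokens[len(tokens) - n + 1:])
    banned: Set[int] = set()
    # Any earlier n-gram that starts with the same prefix fixes a token that must be blocked.
    for start in range(len(tokens) - n + 1):
        ngram = tuple(tokens[start:start + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# With no_repeat_ngram_size=2 (the default), the bigram (5, 7) already occurred,
# so 7 may not follow the trailing 5 again.
print(banned_next_tokens([5, 7, 5], 2))  # {7}
```

During decoding, the banned tokens would have their probability set to zero before top-k/top-p sampling.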


class EditGenerator(resources_path, input_text, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, input_type='abstract')

Bases: Generator

Implementation of edit generator.

__init__(resources_path, input_text, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, input_type='abstract')

PGT generation algorithm.

Parameters

resources_path (str) – path to the cache.
input_text (str) – input text for generation.
model_type (str) – type of the model.
model_name (str) – name of the model weights/version.
max_length (int) – maximum length of the generated text.
top_k (int) – number of highest-probability tokens kept for top-k sampling.
top_p (float) – nucleus sampling threshold: only the smallest set of most probable tokens whose cumulative probability reaches this value is kept.
num_return_sequences (int) – number of generated sequences.
no_repeat_ngram_size (int) – size of the n-grams that may not appear more than once.
device (Union[device, str, None]) – device on which inference runs. It can be given either as a dedicated device object or as a string. If not provided, it is inferred.
input_type (str) – part of the patent the input text belongs to (default 'abstract').

format_output(input_text, generated_sequences)

Format output for the patent editing task.

Parameters

input_text (Union[str, Tuple[str]]) – generation input.
generated_sequences (List[str]) – generated sequences.

Return type

Union[List[str], List[Tuple[str, …]]]

Returns

formatted generated sequences.


class CoherenceCheckGenerator(resources_path, input_a, input_b, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, coherence_type='title-abstract')

Bases: Generator

Implementation of coherence check generator.

__init__(resources_path, input_a, input_b, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, coherence_type='title-abstract')

PGT generation algorithm.

Parameters

resources_path (str) – path to the cache.
input_a (str) – first input for the coherence check.
input_b (str) – second input for the coherence check.
model_type (str) – type of the model.
model_name (str) – name of the model weights/version.
max_length (int) – maximum length of the generated text.
top_k (int) – number of highest-probability tokens kept for top-k sampling.
top_p (float) – nucleus sampling threshold: only the smallest set of most probable tokens whose cumulative probability reaches this value is kept.
num_return_sequences (int) – number of generated sequences.
no_repeat_ngram_size (int) – size of the n-grams that may not appear more than once.
device (Union[device, str, None]) – device on which inference runs. It can be given either as a dedicated device object or as a string. If not provided, it is inferred.
coherence_type (str) – types of the two inputs being checked, such as 'title-abstract' (the default).

extract_coherence_types(coherence_type)

Check the validity of the coherence type and extract the types of the two input texts.

Parameters: coherence_type (str) – input types of the coherence check.
Return type: Tuple[str, str]
Returns: tuple containing the types of the two inputs.
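The default value 'title-abstract' suggests that a coherence type is a hyphen-separated pair of patent parts. The sketch below validates such a string and splits it into the two types. The accepted set `COHERENCE_TYPES` is an assumption; only 'title-abstract' is confirmed by the signature above. This is not the module's own code.

```python
from typing import Tuple

# Assumed set of supported pairs; only "title-abstract" (the default) is
# confirmed by the documented signature.
COHERENCE_TYPES = {"title-abstract", "title-claim", "abstract-claim"}


def extract_coherence_types(coherence_type: str) -> Tuple[str, str]:
    """Validate a "<type_a>-<type_b>" string and split it into its two parts (sketch)."""
    if coherence_type not in COHERENCE_TYPES:
        raise ValueError(f"{coherence_type!r} is not a valid coherence type")
    type_a, type_b = coherence_type.split("-")
    return type_a, type_b


print(extract_coherence_types("title-abstract"))  # ('title', 'abstract')
```

Validating against an explicit set before splitting rejects malformed strings early, instead of failing later during generation.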

format_output(input_text, generated_sequences)

Format output for the patent coherence task.

Parameters

input_text (Union[str, Tuple[str]]) – generation input.
generated_sequences (List[str]) – generated sequences.

Return type

Union[List[str], List[Tuple[str, …]]]

Returns

formatted generated sequences.
