gt4sd.algorithms.generation.pgt.implementation module
Implementation details for PGT algorithms.
Summary
Classes:
- CoherenceCheckGenerator – Implementation of coherence check generator.
- EditGenerator – Implementation of edit generator.
- Generator – Implementation of a generator.
- PartGenerator – Implementation of edit generator.
Functions:
- adjust_length_to_model – Adjust sequence length.
Reference
- adjust_length_to_model(length, maximum_sequence_length)
Adjust sequence length.
- Parameters
length (int) – target length.
maximum_sequence_length (int) – maximum sequence length.
- Returns
the adjusted length.
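The docstring does not say how the length is adjusted. A minimal sketch, assuming the helper follows the clamping logic of the function of the same name in Hugging Face's `run_generation.py` text-generation example: a negative `length` means "no target", and a positive `maximum_sequence_length` caps the result. The fallback constant `DEFAULT_MAXIMUM_LENGTH` is hypothetical.

```python
# Hypothetical fallback used when neither a target nor a model limit is given;
# the constant actually used by gt4sd may differ.
DEFAULT_MAXIMUM_LENGTH = 10_000


def adjust_length_to_model(length: int, maximum_sequence_length: int) -> int:
    """Clamp a requested generation length to what the model supports (sketch)."""
    if length < 0 and maximum_sequence_length > 0:
        # No target given: use the model's maximum sequence length.
        return maximum_sequence_length
    if 0 < maximum_sequence_length < length:
        # Target exceeds the model limit: clamp it to the limit.
        return maximum_sequence_length
    if length < 0:
        # Neither a target nor a model limit is available: use the fallback.
        return DEFAULT_MAXIMUM_LENGTH
    return length


print(adjust_length_to_model(256, 1024))  # 256
print(adjust_length_to_model(4096, 1024))  # 1024
print(adjust_length_to_model(-1, 1024))  # 1024
```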
- class Generator(resources_path, model_type, model_name, max_length, top_k, top_p, num_return_sequences, prompt='This is an interesting prompt', no_repeat_ngram_size=2, device=None)
Bases: object
Implementation of a generator.
- __init__(resources_path, model_type, model_name, max_length, top_k, top_p, num_return_sequences, prompt='This is an interesting prompt', no_repeat_ngram_size=2, device=None)
PGT generation algorithm.
- Parameters
resources_path (str) – path to the cache.
model_type (str) – type of the model.
model_name (str) – name of the model weights/version.
max_length (int) – maximum length of the generated text.
top_k (int) – number of highest-probability tokens to keep (top-k sampling).
top_p (float) – only the most probable tokens whose cumulative probability sums up to this value are kept (nucleus sampling).
num_return_sequences (int) – number of generated sequences.
prompt (str) – prompt for text generation.
no_repeat_ngram_size (int) – size of the n-grams that may not appear twice.
device (Union[device, str, None]) – device where the inference is running, either as a dedicated device object or a string. If not provided, it is inferred.
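`top_k` and `top_p` are the usual sampling filters. As an illustration only (this is not the code PGT runs), the following pure-Python sketch applies both to a toy next-token distribution: it first keeps the `top_k` most probable tokens, then the smallest prefix of those whose cumulative probability reaches `top_p`, and renormalizes. The function name and the toy probabilities are hypothetical.

```python
from typing import Dict


def filter_top_k_top_p(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Apply top-k, then top-p (nucleus) filtering and renormalize (sketch)."""
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k > 0:
        # Top-k: keep only the k most probable tokens.
        ranked = ranked[:top_k]
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        # Top-p: keep tokens until the cumulative probability reaches top_p,
        # including the token that crosses the threshold.
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical toy distribution over the next token.
probs = {"the": 0.5, "a": 0.3, "patent": 0.15, "claim": 0.05}
print(filter_top_k_top_p(probs, top_k=3, top_p=0.6))
```

Sampling then draws the next token from the renormalized distribution; lowering either parameter makes the output more conservative.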
- generate_case()
Sample text snippets.
- Return type
Union[List[str], List[Tuple[str, ...]]]
- Returns
generated text snippets.
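While snippets are sampled, the constructor's `no_repeat_ngram_size` is assumed to apply standard n-gram blocking: a candidate token is banned if, together with the preceding `no_repeat_ngram_size - 1` tokens, it would repeat an n-gram already in the sequence. A minimal sketch of that rule on whitespace tokens; the helper `banned_next_tokens` is hypothetical and not part of this module.

```python
from typing import List, Set


def banned_next_tokens(tokens: List[str], no_repeat_ngram_size: int) -> Set[str]:
    """Return the tokens that would repeat an n-gram already in `tokens` (sketch)."""
    n = no_repeat_ngram_size
    if n <= 0:
        # Blocking disabled.
        return set()
    # The last n - 1 tokens are the prefix of the n-gram about to be completed.
    prefix = tuple(tokens[len(tokens) - n + 1:])
    banned: Set[str] = set()
    # Scan every n-gram seen so far; if its first n - 1 tokens match the
    # current prefix, its last token may not be generated next.
    for start in range(len(tokens) - n + 1):
        ngram = tuple(tokens[start:start + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


history = "the device and the".split()
print(banned_next_tokens(history, no_repeat_ngram_size=2))  # {'device'}
```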
- format_output(input_text, generated_sequences)
Format output. In the general case, the generated sequences are returned unchanged.
- Parameters
input_text (Union[str, Tuple[str]]) – generation input.
generated_sequences (List[str]) – generated sequences.
- Return type
Union[List[str], List[Tuple[str, ...]]]
- Returns
formatted generated sequences.
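Subclasses customize their results by overriding `format_output`, which explains the union return type: the base case yields plain strings, while a subclass may return tuples. A sketch of that override pattern with hypothetical stand-in classes (they load no model and are not the PGT classes):

```python
from typing import List, Tuple, Union

# Mirrors the return type documented for format_output.
Output = Union[List[str], List[Tuple[str, ...]]]


class SketchGenerator:
    """Hypothetical stand-in for Generator; only the formatting hook is modeled."""

    def format_output(
        self, input_text: Union[str, Tuple[str]], generated_sequences: List[str]
    ) -> Output:
        # General case: return the generated sequences unchanged.
        return generated_sequences


class PairedSketchGenerator(SketchGenerator):
    """Hypothetical subclass that pairs every generated sequence with its input."""

    def format_output(
        self, input_text: Union[str, Tuple[str]], generated_sequences: List[str]
    ) -> Output:
        # Flatten a tuple input into a single string before pairing.
        source = input_text if isinstance(input_text, str) else " ".join(input_text)
        return [(source, sequence) for sequence in generated_sequences]


print(PairedSketchGenerator().format_output("A method for", ["A method for sorting."]))
# [('A method for', 'A method for sorting.')]
```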
- __dict__ = mappingproxy({'__module__': 'gt4sd.algorithms.generation.pgt.implementation', '__doc__': 'Implementation of a generator.', '__init__': <function Generator.__init__>, 'load_model': <function Generator.load_model>, 'generate_case': <function Generator.generate_case>, 'format_output': <function Generator.format_output>, '__dict__': <attribute '__dict__' of 'Generator' objects>, '__weakref__': <attribute '__weakref__' of 'Generator' objects>, '__annotations__': {}})
- class PartGenerator(resources_path, input_text, model_type, model_name, task, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None)
Bases: Generator
Implementation of edit generator.
- __init__(resources_path, input_text, model_type, model_name, task, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None)
PGT generation algorithm.
- Parameters
resources_path (str) – path to the cache.
input_text (str) – input text for generation.
model_type (str) – type of the model.
model_name (str) – name of the model weights/version.
task (str) – generation task.
max_length (int) – maximum length of the generated text.
top_k (int) – number of highest-probability tokens to keep (top-k sampling).
top_p (float) – only the most probable tokens whose cumulative probability sums up to this value are kept (nucleus sampling).
num_return_sequences (int) – number of generated sequences.
no_repeat_ngram_size (int) – size of the n-grams that may not appear twice.
device (Union[device, str, None]) – device where the inference is running, either as a dedicated device object or a string. If not provided, it is inferred.
- class EditGenerator(resources_path, input_text, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, input_type='abstract')
Bases: Generator
Implementation of edit generator.
- __init__(resources_path, input_text, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, input_type='abstract')
PGT generation algorithm.
- Parameters
resources_path (str) – path to the cache.
input_text (str) – input text for generation.
model_type (str) – type of the model.
model_name (str) – name of the model weights/version.
max_length (int) – maximum length of the generated text.
top_k (int) – number of highest-probability tokens to keep (top-k sampling).
top_p (float) – only the most probable tokens whose cumulative probability sums up to this value are kept (nucleus sampling).
num_return_sequences (int) – number of generated sequences.
no_repeat_ngram_size (int) – size of the n-grams that may not appear twice.
device (Union[device, str, None]) – device where the inference is running, either as a dedicated device object or a string. If not provided, it is inferred.
input_type (str) – part of the patent that the input text belongs to.
- format_output(input_text, generated_sequences)
Format output for the patent editing task.
- Parameters
input_text (Union[str, Tuple[str]]) – generation input.
generated_sequences (List[str]) – generated sequences.
- Return type
Union[List[str], List[Tuple[str, ...]]]
- Returns
formatted generated sequences.
- class CoherenceCheckGenerator(resources_path, input_a, input_b, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, coherence_type='title-abstract')
Bases: Generator
Implementation of coherence check generator.
- __init__(resources_path, input_a, input_b, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, coherence_type='title-abstract')
PGT generation algorithm.
- Parameters
resources_path (str) – path to the cache.
input_a (str) – first input for the coherence check.
input_b (str) – second input for the coherence check.
model_type (str) – type of the model.
model_name (str) – name of the model weights/version.
max_length (int) – maximum length of the generated text.
top_k (int) – number of highest-probability tokens to keep (top-k sampling).
top_p (float) – only the most probable tokens whose cumulative probability sums up to this value are kept (nucleus sampling).
num_return_sequences (int) – number of generated sequences.
no_repeat_ngram_size (int) – size of the n-grams that may not appear twice.
device (Union[device, str, None]) – device where the inference is running, either as a dedicated device object or a string. If not provided, it is inferred.
coherence_type (str) – types of the two inputs to check, e.g. 'title-abstract' (the default).
- extract_coherence_types(coherence_type)
Check the validity of the coherence type and extract the types of the two input texts.
- Parameters
coherence_type (str) – input types of the coherence check.
- Return type
Tuple[str, str]
- Returns
tuple containing the types of the two inputs.
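The default `coherence_type='title-abstract'` suggests that the argument names two patent parts joined by a hyphen. A minimal sketch of such parsing and validation, assuming the `-` separator; the set of accepted parts (including `claim`) is a hypothetical placeholder, not the list PGT actually supports.

```python
from typing import Tuple

# Assumption: hypothetical set of patent parts; the values accepted by PGT may differ.
SUPPORTED_PARTS = {"title", "abstract", "claim"}


def extract_coherence_types(coherence_type: str) -> Tuple[str, str]:
    """Split a 'first-second' coherence type and validate both parts (sketch)."""
    # Assumption: the two input types are separated by a single hyphen.
    parts = coherence_type.split("-")
    if len(parts) != 2 or not all(part in SUPPORTED_PARTS for part in parts):
        raise ValueError(f"Invalid coherence type: {coherence_type!r}")
    return parts[0], parts[1]


print(extract_coherence_types("title-abstract"))  # ('title', 'abstract')
```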
- format_output(input_text, generated_sequences)
Format output for the patent coherence task.
- Parameters
input_text (Union[str, Tuple[str]]) – generation input.
generated_sequences (List[str]) – generated sequences.
- Return type
Union[List[str], List[Tuple[str, ...]]]
- Returns
formatted generated sequences.